Creating Vector Databases

Learn how to create and configure vector databases for storing embeddings generated by data pipelines.

Overview

Vector databases store embeddings (numerical vector representations of text) that enable semantic search capabilities in FoundationaLLM. Before you can index data through a pipeline, you must create and configure a vector database resource.

Prerequisites

  • Access to FoundationaLLM Management Portal
  • Required permissions: FoundationaLLM.VectorDatabase/vectorDatabases/write
  • An Azure AI Search instance or other supported vector database service deployed
  • API endpoint configuration for the vector database service

What is a Vector Database?

A vector database in FoundationaLLM is a resource that:

  • Represents a connection to a vector storage service (e.g., Azure AI Search)
  • Defines the index schema and configuration
  • Stores vector embeddings generated by data pipelines
  • Enables semantic search by agents

Creating a Vector Database

Step 1: Navigate to Vector Databases

  1. Open the FoundationaLLM Management Portal
  2. Navigate to Data > Vector Databases
  3. Click Create New Vector Database

Step 2: Configure Basic Settings

Name: A unique identifier for the vector database

  • Use descriptive names (e.g., product-docs-search, customer-support-kb)
  • Follow naming convention: lowercase with hyphens
  • Example: company-knowledge-base
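
The naming convention above can be checked before a name is submitted. The following is a minimal sketch, assuming that "lowercase with hyphens" means lowercase letters and digits separated by single hyphens; the pattern and the function name `is_valid_resource_name` are illustrative and are not taken from FoundationaLLM itself.

```python
import re

# Assumed reading of "lowercase with hyphens": one or more lowercase
# alphanumeric segments joined by single hyphens. This is an illustrative
# pattern, not the validation rule enforced by the portal.
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_resource_name(name: str) -> bool:
    """Return True if the whole name matches the assumed convention."""
    return NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["company-knowledge-base", "Company KB", "product_docs"]:
        print(candidate, is_valid_resource_name(candidate))
```

The portal may accept or reject names under different rules, so treat this as a local pre-check only.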

Display Name: Human-readable name shown in the UI

  • Example: "Company Knowledge Base"

Description: Purpose and contents of the vector database

  • Document what data will be stored
  • Note any specific use cases

Step 3: Configure Vector Database Type

Azure AI Search (Primary supported option):

{
  "name": "product-docs-search",
  "display_name": "Product Documentation Search",
  "description": "Vector database for product documentation",
  "type": "AzureAISearch",
  "configuration": {
    "api_endpoint_configuration_object_id": "instances/{instanceId}/providers/FoundationaLLM.Configuration/apiEndpointConfigurations/AzureAISearch",
    "index_name": "product-docs",
    "index_partition_name": "main"
  }
}
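
If the definition is generated by a script rather than typed by hand, it can be assembled programmatically. The sketch below mirrors the field names of the JSON example above; the helper name `build_vector_database_definition` and the placeholder instance ID are hypothetical, and how the resulting JSON is submitted (portal or API) is not covered here.

```python
import json


def build_vector_database_definition(
    instance_id: str,
    name: str,
    display_name: str,
    description: str,
    index_name: str,
    index_partition_name: str = "main",
) -> dict:
    """Assemble a definition with the same shape as the JSON example above."""
    # The object ID follows the pattern shown in the example; only the
    # instance ID is substituted.
    endpoint_object_id = (
        f"instances/{instance_id}/providers/FoundationaLLM.Configuration/"
        "apiEndpointConfigurations/AzureAISearch"
    )
    return {
        "name": name,
        "display_name": display_name,
        "description": description,
        "type": "AzureAISearch",
        "configuration": {
            "api_endpoint_configuration_object_id": endpoint_object_id,
            "index_name": index_name,
            "index_partition_name": index_partition_name,
        },
    }


if __name__ == "__main__":
    definition = build_vector_database_definition(
        instance_id="00000000-0000-0000-0000-000000000000",  # placeholder value
        name="product-docs-search",
        display_name="Product Documentation Search",
        description="Vector database for product documentation",
        index_name="product-docs",
    )
    print(json.dumps(definition, indent=2))
```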

Step 4: Configure API Endpoint

The api_endpoint_configuration_object_id references an API Endpoint Configuration resource that contains:

  • Service URL
  • Authentication credentials
  • Connection settings

Create an API Endpoint Configuration (if one does not already exist):

  1. Navigate to FLLM Platform > Configuration > API Endpoints
  2. Create a new Azure AI Search endpoint
  3. Provide the service URL and authentication key

Step 5: Configure Index Settings

Index Name: The name of the search index in the vector database service

  • Must be unique within the service
  • Use lowercase letters, numbers, and hyphens
  • Example: product-docs, support-kb

Index Partition Name: Logical partition within the index

  • Allows multiple datasets in one physical index
  • Enables filtering by partition during search
  • Default: main or default
  • Use different partition names for multi-tenant scenarios
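
Filtering by partition in Azure AI Search is typically expressed as an OData filter on a filterable field. The sketch below builds such an expression, assuming the partition identifier is stored in a filterable string field named `partition` (as in the example schema in Step 6). The helper name is hypothetical, and whether FoundationaLLM builds the filter this way internally is not stated in this guide.

```python
def partition_filter(partition_name: str, field: str = "partition") -> str:
    """Build an OData equality filter that restricts results to one partition.

    Single quotes inside an OData string literal are escaped by doubling them.
    """
    escaped = partition_name.replace("'", "''")
    return f"{field} eq '{escaped}'"


if __name__ == "__main__":
    print(partition_filter("tenant-a"))
```

Because every query for a tenant carries its own filter value, several datasets can share one physical index without their results mixing.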

Step 6: Configure Index Schema

Define the fields and their properties:

Common Field Types:

  • content: The text content
  • content_vector: The embedding vector
  • metadata: Document metadata (title, author, date, etc.)
  • source: Origin of the content
  • partition: Partition identifier

Vector Field Configuration:

  • Dimensions: Must match the embedding model dimensions (e.g., 1536, 2048, 3072)
  • Algorithm: HNSW (Hierarchical Navigable Small World) is recommended
  • Distance Metric: Cosine similarity is most common
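
To illustrate why the dimensions must match and what the cosine metric measures, here is a minimal pure-Python sketch. The search service computes similarity internally, so this function is for illustration only and is not part of any FoundationaLLM or Azure API.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        # Vectors from models with different dimensions cannot be compared.
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # same direction
    print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal
```

Cosine similarity depends only on direction, not magnitude, which is one reason it is a common choice for text embeddings.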

Example schema configuration:

{
  "fields": [
    {
      "name": "id",
      "type": "string",
      "key": true
    },
    {
      "name": "content",
      "type": "string",
      "searchable": true
    },
    {
      "name": "content_vector",
      "type": "Collection(Edm.Single)",
      "dimensions": 2048,
      "vectorSearchConfiguration": "default"
    },
    {
      "name": "partition",
      "type": "string",
      "filterable": true
    }
  ]
}
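
A document can be checked against a schema of this shape before indexing. The following sketch assumes the schema is available as a dictionary with the same structure as the example above; the function `validate_document` is hypothetical and only checks for a missing key field and for vector length.

```python
from typing import Any, Dict, List


def validate_document(schema: Dict[str, Any], document: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means no problem was detected."""
    problems: List[str] = []
    for field in schema["fields"]:
        name = field["name"]
        if name not in document:
            # Only the key field is treated as mandatory in this sketch.
            if field.get("key"):
                problems.append(f"missing key field '{name}'")
            continue
        dimensions = field.get("dimensions")
        if dimensions is not None and len(document[name]) != dimensions:
            problems.append(
                f"field '{name}' has {len(document[name])} values, expected {dimensions}"
            )
    return problems


if __name__ == "__main__":
    schema = {
        "fields": [
            {"name": "id", "type": "string", "key": True},
            {"name": "content", "type": "string", "searchable": True},
            {"name": "content_vector", "type": "Collection(Edm.Single)", "dimensions": 2048},
            {"name": "partition", "type": "string", "filterable": True},
        ]
    }
    document = {"id": "doc-1", "content": "hello", "content_vector": [0.0] * 1536}
    print(validate_document(schema, document))
```

Running a check like this in the pipeline surfaces a dimension mismatch before the indexing request is sent.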

Step 7: Review and Create

  1. Review all settings
  2. Click Create
  3. Wait for the vector database resource to be provisioned

Best Practices

Naming Conventions

  • Environment-specific names: dev-product-docs, prod-product-docs
  • Purpose-based names: support-kb, technical-docs, product-catalog
  • Avoid generic names like index1 or database

Index Design

Single vs. Multiple Indexes:

  • Single index with partitions: Simpler management, shared capacity
  • Multiple indexes: Better isolation, independent scaling

Partition Strategy:

  • Per-tenant: Use one partition per tenant for multi-tenant isolation
  • Per-dataset: Separate different types of content
  • Per-environment: Dev, staging, production in different partitions

Performance Considerations

Vector Dimensions:

  • 1536 dimensions: Good balance for most use cases (text-embedding-ada-002)
  • 2048 dimensions: Higher quality, more storage (for example, text-embedding-3-large with a reduced dimensions setting)
  • 3072 dimensions: Maximum quality, requires more resources (text-embedding-3-large default)

Index Size Planning:

  • Plan for document count and growth rate
  • Consider retention policies
  • Monitor index size and performance
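
For a rough sense of how dimensions and document count drive storage, the raw size of the vectors can be estimated as document count × dimensions × 4 bytes, since each `Edm.Single` value is a 32-bit float. The sketch below computes only that lower bound; it ignores the HNSW graph, text fields, and any service-side overhead or compression, so actual index size will differ.

```python
BYTES_PER_FLOAT32 = 4  # each Edm.Single value is a 32-bit float


def estimate_raw_vector_bytes(document_count: int, dimensions: int) -> int:
    """Estimate the bytes needed for the raw vector values only."""
    if document_count < 0 or dimensions < 0:
        raise ValueError("document_count and dimensions must be non-negative")
    return document_count * dimensions * BYTES_PER_FLOAT32


def to_gib(size_in_bytes: int) -> float:
    """Convert bytes to gibibytes."""
    return size_in_bytes / (1024 ** 3)


if __name__ == "__main__":
    for dims in (1536, 2048, 3072):
        size = estimate_raw_vector_bytes(1_000_000, dims)
        print(f"{dims} dimensions, 1,000,000 chunks: {to_gib(size):.2f} GiB")
```

Under these assumptions, one million chunks at 1536 dimensions need about 5.7 GiB for the vectors alone, and doubling the dimensions doubles that figure.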

Security

Access Control:

  • Use managed identities when possible
  • Rotate API keys regularly
  • Limit network access with firewall rules

Data Protection:

  • Enable encryption at rest
  • Use private endpoints for sensitive data
  • Implement data retention policies

Common Scenarios

Scenario 1: Simple Knowledge Base

Single index, single partition:

  • Index: company-kb
  • Partition: main
  • Use case: General company knowledge base

Scenario 2: Multi-Tenant Application

Single index, partition per tenant:

  • Index: app-data
  • Partitions: tenant-a, tenant-b, tenant-c
  • Use case: SaaS application with tenant isolation

Scenario 3: Multiple Data Types

Multiple indexes:

  • Index 1: product-docs (technical documentation)
  • Index 2: support-kb (support articles)
  • Index 3: company-policies (internal policies)
  • Use case: Different search experiences per data type

Troubleshooting

Vector Database Creation Fails

Problem: Unable to create vector database resource

Solutions:

  • Verify you have required permissions
  • Check that the name is unique
  • Ensure API endpoint configuration exists and is valid

Connection Issues

Problem: Cannot connect to vector database service

Solutions:

  • Verify API endpoint configuration credentials
  • Check network connectivity
  • Ensure firewall rules allow access
  • Verify the service URL is correct
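
A quick format check on the service URL can rule out simple typos before investigating credentials or networking. This is a sketch that assumes an Azure AI Search service in the public Azure cloud, where service URLs have the form `https://<service-name>.search.windows.net`; other clouds use different host suffixes, so the suffix is a parameter. The function name is hypothetical, and the check does not contact the service.

```python
from typing import List
from urllib.parse import urlparse


def check_service_url(url: str, expected_suffix: str = ".search.windows.net") -> List[str]:
    """Return a list of formatting problems found in the service URL."""
    problems: List[str] = []
    parsed = urlparse(url.strip())
    if parsed.scheme != "https":
        problems.append("URL should start with https://")
    if not parsed.hostname:
        problems.append("URL has no host name")
    elif not parsed.hostname.endswith(expected_suffix):
        problems.append(f"host name does not end with {expected_suffix}")
    return problems


if __name__ == "__main__":
    print(check_service_url("https://my-search.search.windows.net"))
    print(check_service_url("my-search.search.windows.net"))
```

An empty list only means the URL is well-formed; it does not confirm that the service exists or is reachable.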

Index Not Found

Problem: Index doesn't exist in the service

Solutions:

  • Manually create the index in Azure AI Search
  • Or configure auto-index creation in the vector database settings
  • Verify index name matches exactly (case-sensitive)

Schema Mismatch

Problem: Data doesn't match index schema

Solutions:

  • Verify vector dimensions match embedding model
  • Check all required fields are defined
  • Ensure field types are correct