Creating Vector Databases

Learn how to create and configure vector databases for storing embeddings generated by data pipelines.

Overview

Vector databases store embeddings (numerical vector representations of text) that enable semantic search capabilities in FoundationaLLM. Before you can index data through a pipeline, you must create and configure a vector database resource.

Prerequisites

Access to the FoundationaLLM Management Portal
Required permission: FoundationaLLM.VectorDatabase/vectorDatabases/write
A deployed Azure AI Search instance or another supported vector database service
An API endpoint configuration for the vector database service

What is a Vector Database?

A vector database in FoundationaLLM is a resource that:

Represents a connection to a vector storage service (e.g., Azure AI Search)
Defines the index schema and configuration
Stores vector embeddings generated by data pipelines
Enables semantic search by agents

Creating a Vector Database

Step 1: Navigate to Vector Databases

Open the FoundationaLLM Management Portal
Navigate to Data → Vector Databases
Click Create New Vector Database

Step 2: Configure Basic Settings

Name: A unique identifier for the vector database

Use descriptive names (e.g., product-docs-search, customer-support-kb)
Follow the naming convention: lowercase words separated by hyphens
Example: company-knowledge-base

Display Name: Human-readable name shown in the UI

Example: "Company Knowledge Base"

Description: Purpose and contents of the vector database

Document what data will be stored
Note any specific use cases
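If you start from a display name, you can derive a Name that follows the lowercase-with-hyphens convention mechanically. The sketch below is a hypothetical helper (`to_resource_name` is not part of FoundationaLLM); it assumes ASCII-only names are wanted and drops accents and punctuation.

```python
import re
import unicodedata


def to_resource_name(display_name: str) -> str:
    """Derive a lowercase, hyphen-separated name from a display name.

    Hypothetical helper that applies the naming convention described above;
    it is not part of FoundationaLLM.
    """
    # Decompose accented characters, then drop anything that is not ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", display_name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace every run of non-alphanumeric characters with a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower())
    # Trim any hyphens left at either end.
    return slug.strip("-")


print(to_resource_name("Company Knowledge Base"))  # company-knowledge-base
```

Always review the generated name before use: two different display names can collapse to the same name, and the Name must be unique.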

Step 3: Configure Vector Database Type

Azure AI Search (Primary supported option):

{
  "name": "product-docs-search",
  "display_name": "Product Documentation Search",
  "description": "Vector database for product documentation",
  "type": "AzureAISearch",
  "configuration": {
    "api_endpoint_configuration_object_id": "instances/{instanceId}/providers/FoundationaLLM.Configuration/apiEndpointConfigurations/AzureAISearch",
    "index_name": "product-docs",
    "index_partition_name": "main"
  }
}
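For repeatable setups (for example, one definition per environment), the same definition can be generated in code. This is a minimal sketch: `build_vector_database` is a hypothetical helper, the instance ID is a placeholder, the field names are copied from the example above, and the code only builds the JSON; it does not call any FoundationaLLM API.

```python
import json


def build_vector_database(instance_id: str, name: str, display_name: str,
                          description: str, index_name: str,
                          partition: str = "main") -> dict:
    """Assemble an Azure AI Search vector database definition.

    Hypothetical helper: keys mirror the example JSON, and an API endpoint
    configuration named "AzureAISearch" is assumed to exist.
    """
    endpoint_id = (
        f"instances/{instance_id}/providers/FoundationaLLM.Configuration/"
        "apiEndpointConfigurations/AzureAISearch"
    )
    return {
        "name": name,
        "display_name": display_name,
        "description": description,
        "type": "AzureAISearch",
        "configuration": {
            "api_endpoint_configuration_object_id": endpoint_id,
            "index_name": index_name,
            "index_partition_name": partition,
        },
    }


definition = build_vector_database(
    instance_id="00000000-0000-0000-0000-000000000000",  # placeholder ID
    name="product-docs-search",
    display_name="Product Documentation Search",
    description="Vector database for product documentation",
    index_name="product-docs",
)
print(json.dumps(definition, indent=2))
```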

Step 4: Configure API Endpoint

The api_endpoint_configuration_object_id references an API Endpoint Configuration resource that contains:

Service URL
Authentication credentials
Connection settings

Create an API Endpoint Configuration (if one does not exist):

Navigate to FLLM Platform → Configuration → API Endpoints
Create a new Azure AI Search endpoint
Provide the service URL and authentication key

Step 5: Configure Index Settings

Index Name: The name of the search index in the vector database service

Must be unique within the service
Use lowercase letters, numbers, and hyphens
Example: product-docs, support-kb
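Azure AI Search applies its own naming rules to indexes, so it can help to check a candidate name before creating the index. In this sketch the rules (lowercase letters, digits and hyphens only, a leading letter or digit, no consecutive hyphens, at most 128 characters) are assumptions based on the published Azure AI Search naming rules; confirm them against the current service documentation. `index_name_problems` is a hypothetical helper.

```python
import re


def index_name_problems(name: str) -> list[str]:
    """Return the naming rules a candidate index name breaks (empty if none).

    The rules below are assumed Azure AI Search index-name rules.
    """
    problems = []
    if not 1 <= len(name) <= 128:
        problems.append("must be 1-128 characters long")
    if not re.fullmatch(r"[a-z0-9-]*", name):
        problems.append("may contain only lowercase letters, digits and hyphens")
    if name and not re.match(r"[a-z0-9]", name):
        problems.append("must start with a letter or digit")
    if "--" in name:
        problems.append("must not contain consecutive hyphens")
    return problems


print(index_name_problems("product-docs"))  # []
print(index_name_problems("Product_Docs"))  # invalid characters and first character
```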

Index Partition Name: Logical partition within the index

Allows multiple datasets in one physical index
Enables filtering by partition during search
Default: main or default
Use different partition names for multi-tenant scenarios
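How FoundationaLLM applies the partition name internally is not covered here. As an illustration of what filtering by partition looks like at the Azure AI Search level, the sketch below builds an OData filter string, assuming a filterable string field named `partition` (as in the schema example in Step 6). `partition_filter` is a hypothetical helper; the `eq` comparison and the `search.in` function are standard Azure AI Search OData filter syntax.

```python
def partition_filter(*partitions: str, field: str = "partition") -> str:
    """Build an Azure AI Search OData filter limiting results to partitions.

    Sketch only: assumes partitions are stored in a filterable string field.
    """
    if not partitions:
        raise ValueError("at least one partition is required")
    if len(partitions) == 1:
        # OData string literals escape a single quote by doubling it.
        literal = partitions[0].replace("'", "''")
        return f"{field} eq '{literal}'"
    # search.in takes one string of values plus the delimiter that separates
    # them, so the delimiter must not occur inside any partition name.
    if any("|" in p for p in partitions):
        raise ValueError("partition names must not contain '|'")
    values = "|".join(partitions).replace("'", "''")
    return f"search.in({field}, '{values}', '|')"


print(partition_filter("main"))                  # partition eq 'main'
print(partition_filter("tenant-a", "tenant-b"))  # search.in(partition, 'tenant-a|tenant-b', '|')
```

The resulting string would be passed as the filter of a search request; for tenant isolation, the filter should be set by trusted application code, never taken from user input.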

Step 6: Configure Index Schema

Define the fields and their properties:

Common Fields:

content: The text content
content_vector: The embedding vector
metadata: Document metadata (title, author, date, etc.)
source: Origin of the content
partition: Partition identifier
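Data pipelines assemble these documents for you; the sketch below only illustrates the shape of one indexed chunk. `make_document` is a hypothetical helper, `metadata` is omitted for brevity, and the key scheme (URL-safe Base64 of the source plus chunk number) is one common choice, assuming Azure AI Search only allows letters, digits, `_`, `-` and `=` in document keys.

```python
import base64


def make_document(source: str, chunk_index: int, content: str,
                  embedding: list[float], partition: str = "main") -> dict:
    """Assemble one chunk as a document using the common fields above.

    Hypothetical helper for illustration; not the pipeline's implementation.
    """
    # Paths contain characters such as "/" that are not allowed in keys,
    # so encode source + chunk number with URL-safe Base64.
    raw_key = f"{source}#{chunk_index}".encode("utf-8")
    key = base64.urlsafe_b64encode(raw_key).decode("ascii")
    return {
        "id": key,
        "content": content,
        "content_vector": embedding,
        "source": source,
        "partition": partition,
    }


doc = make_document("docs/guide.md", 0, "Install the CLI first.", [0.1, 0.2])
print(doc["id"])
```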

Vector Field Configuration:

Dimensions: Must match the embedding model dimensions (e.g., 1536, 2048, 3072)
Algorithm: HNSW (Hierarchical Navigable Small World) is recommended
Distance Metric: Cosine similarity is most common
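Cosine similarity compares the direction of two vectors and ignores their length: it is the dot product divided by the product of the vector norms, ranging from -1 to 1. The sketch below computes it in plain Python to show what the metric measures; it is not how the search service evaluates it internally. It also enforces the same constraint as the index: both vectors must have the same number of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return dot(a, b) / (|a| * |b|), a value between -1 and 1."""
    if len(a) != len(b):
        # Query and document vectors must share the field's dimensions.
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```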

Example schema configuration:

{
  "fields": [
    {
      "name": "id",
      "type": "Edm.String",
      "key": true
    },
    {
      "name": "content",
      "type": "Edm.String",
      "searchable": true
    },
    {
      "name": "content_vector",
      "type": "Collection(Edm.Single)",
      "searchable": true,
      "dimensions": 2048,
      "vectorSearchProfile": "default"
    },
    {
      "name": "partition",
      "type": "Edm.String",
      "filterable": true
    }
  ]
}
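A dimension mismatch between the schema and the embedding model only surfaces when indexing fails, so a quick check before running a pipeline can save a failed run. `check_vector_dimensions` is a hypothetical helper; it assumes vector fields are the `Collection(Edm.Single)` fields carrying a `dimensions` value, as in the example above, and that you know how many dimensions your embedding model produces.

```python
def check_vector_dimensions(schema: dict, embedding_dimensions: int) -> list[str]:
    """Return names of vector fields whose dimensions differ from the model's."""
    mismatched = []
    for field in schema.get("fields", []):
        # Treat Collection(Edm.Single) fields as vector fields.
        if field.get("type") == "Collection(Edm.Single)":
            if field.get("dimensions") != embedding_dimensions:
                mismatched.append(field["name"])
    return mismatched


schema = {
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "content_vector", "type": "Collection(Edm.Single)", "dimensions": 2048},
    ]
}
print(check_vector_dimensions(schema, 2048))  # []
print(check_vector_dimensions(schema, 3072))  # ['content_vector']
```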

Step 7: Review and Create

Review all settings
Click Create
Wait for the vector database resource to be provisioned

Best Practices

Naming Conventions

Environment-specific names: dev-product-docs, prod-product-docs
Purpose-based names: support-kb, technical-docs, product-catalog
Avoid generic names like index1 or database

Index Design

Single vs. Multiple Indexes:

Single index with partitions: Simpler management, shared capacity
Multiple indexes: Better isolation, independent scaling

Partition Strategy:

Per-tenant: Use partition for multi-tenant isolation
Per-dataset: Separate different types of content
Per-environment: Dev, staging, production in different partitions

Performance Considerations

Vector Dimensions:

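1536 dimensions: Output size of text-embedding-ada-002 and text-embedding-3-small; a good balance of quality and storage for most use cases
2048 dimensions: text-embedding-3-large shortened through its dimensions parameter; more storage per vector than 1536
3072 dimensions: Full-size text-embedding-3-large output (its default); typically the best retrieval quality of these options, but the most storage and compute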

Index Size Planning:

Plan for document count and growth rate
Consider retention policies
Monitor index size and performance
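A rough lower bound for planning is the raw size of the vectors themselves. `Collection(Edm.Single)` stores 32-bit floats, so each vector takes 4 bytes per dimension. The sketch below is a back-of-the-envelope estimate only: it ignores the HNSW graph, text and metadata fields, and any service-side overhead, so the real index will be larger.

```python
def raw_vector_storage_bytes(document_count: int, dimensions: int,
                             bytes_per_value: int = 4) -> int:
    """Estimate raw vector storage (4 bytes per Edm.Single value).

    Excludes HNSW graph overhead, other fields and service overhead.
    """
    return document_count * dimensions * bytes_per_value


for dims in (1536, 2048, 3072):
    gib = raw_vector_storage_bytes(1_000_000, dims) / 2**30
    print(f"{dims} dimensions, 1M chunks: {gib:.1f} GiB")
```

For one million chunks this gives about 5.7 GiB at 1536 dimensions, 7.6 GiB at 2048 and 11.4 GiB at 3072, which makes the storage cost of higher-dimensional models concrete.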

Security

Access Control:

Use managed identities when possible
Rotate API keys regularly
Limit network access with firewall rules

Data Protection:

Enable encryption at rest
Use private endpoints for sensitive data
Implement data retention policies

Common Scenarios

Scenario 1: Simple Knowledge Base

Single index, single partition:

Index: company-kb
Partition: main
Use case: General company knowledge base

Scenario 2: Multi-Tenant Application

Single index, partition per tenant:

Index: app-data
Partitions: tenant-a, tenant-b, tenant-c
Use case: SaaS application with tenant isolation

Scenario 3: Multiple Data Types

Multiple indexes:

Index 1: product-docs (technical documentation)
Index 2: support-kb (support articles)
Index 3: company-policies (internal policies)
Use case: Different search experiences per data type

Troubleshooting

Vector Database Creation Fails

Problem: Unable to create the vector database resource.

Solutions:

Verify that you have the required permission (FoundationaLLM.VectorDatabase/vectorDatabases/write)
Check that the name is unique
Ensure that the API endpoint configuration exists and is valid

Connection Issues

Problem: Cannot connect to the vector database service.

Solutions:

Verify API endpoint configuration credentials
Check network connectivity
Ensure firewall rules allow access
Verify the service URL is correct

Index Not Found

Problem: The index doesn't exist in the service.

Solutions:

Create the index manually in Azure AI Search
Alternatively, enable automatic index creation in the vector database settings
Verify that the index name matches exactly (names are case-sensitive)

Schema Mismatch

Problem: Data doesn't match the index schema.

Solutions:

Verify vector dimensions match embedding model
Check all required fields are defined
Ensure field types are correct
