Creating Vector Databases
Learn how to create and configure vector databases for storing embeddings generated by data pipelines.
Overview
Vector databases store embeddings (numerical vector representations of text) that enable semantic search capabilities in FoundationaLLM. Before you can index data through a pipeline, you must create and configure a vector database resource.
Prerequisites
- Access to FoundationaLLM Management Portal
- Required permissions: FoundationaLLM.VectorDatabase/vectorDatabases/write
- A deployed Azure AI Search instance or other supported vector database service
- API endpoint configuration for the vector database service
What is a Vector Database?
A vector database in FoundationaLLM is a resource that:
- Represents a connection to a vector storage service (e.g., Azure AI Search)
- Defines the index schema and configuration
- Stores vector embeddings generated by data pipelines
- Enables semantic search by agents
Creating a Vector Database
Step 1: Navigate to Vector Databases
- Open the FoundationaLLM Management Portal
- Navigate to Data → Vector Databases
- Click Create New Vector Database
Step 2: Configure Basic Settings
Name: A unique identifier for the vector database
- Use descriptive names (e.g., product-docs-search, customer-support-kb)
- Follow naming convention: lowercase with hyphens
- Example: company-knowledge-base
Display Name: Human-readable name shown in the UI
- Example: "Company Knowledge Base"
Description: Purpose and contents of the vector database
- Document what data will be stored
- Note any specific use cases
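The naming convention above can be checked before a name is submitted. The following is a minimal sketch, assuming the convention means lowercase letters and digits separated by single hyphens. The pattern and the helper name is_valid_resource_name are illustrative and not part of FoundationaLLM, and the portal may enforce different rules.

```python
import re

# Assumed reading of "lowercase with hyphens": groups of lowercase letters
# or digits, separated by single hyphens, with no leading or trailing hyphen.
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_resource_name(name: str) -> bool:
    """Return True if the whole name matches the assumed convention."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["company-knowledge-base", "Company_KB", "product-docs-search"]:
    print(candidate, is_valid_resource_name(candidate))
```

Running a check like this in a deployment script catches names with uppercase letters or underscores before they reach the portal.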
Step 3: Configure Vector Database Type
Azure AI Search (Primary supported option):
{
  "name": "product-docs-search",
  "display_name": "Product Documentation Search",
  "description": "Vector database for product documentation",
  "type": "AzureAISearch",
  "configuration": {
    "api_endpoint_configuration_object_id": "instances/{instanceId}/providers/FoundationaLLM.Configuration/apiEndpointConfigurations/AzureAISearch",
    "index_name": "product-docs",
    "index_partition_name": "main"
  }
}
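When several vector databases are defined, it can help to generate this document from code instead of editing it by hand. The sketch below assembles the same structure as the example above. The function name build_vector_database_definition and the all-zero instance ID are placeholders for illustration. How the resulting JSON is submitted (portal or API) is not covered here.

```python
import json


def build_vector_database_definition(name, display_name, description,
                                     instance_id, index_name,
                                     partition_name="main"):
    """Assemble a definition shaped like the Azure AI Search example above."""
    # The object ID format is copied from the example; instance_id fills in
    # the {instanceId} placeholder.
    endpoint_object_id = (
        f"instances/{instance_id}/providers/FoundationaLLM.Configuration"
        "/apiEndpointConfigurations/AzureAISearch"
    )
    return {
        "name": name,
        "display_name": display_name,
        "description": description,
        "type": "AzureAISearch",
        "configuration": {
            "api_endpoint_configuration_object_id": endpoint_object_id,
            "index_name": index_name,
            "index_partition_name": partition_name,
        },
    }


definition = build_vector_database_definition(
    name="product-docs-search",
    display_name="Product Documentation Search",
    description="Vector database for product documentation",
    instance_id="00000000-0000-0000-0000-000000000000",  # placeholder value
    index_name="product-docs",
)
print(json.dumps(definition, indent=2))
```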
Step 4: Configure API Endpoint
The api_endpoint_configuration_object_id references an API Endpoint Configuration resource that contains:
- Service URL
- Authentication credentials
- Connection settings
Create an API Endpoint Configuration (if one does not already exist):
- Navigate to FLLM Platform → Configuration → API Endpoints
- Create a new Azure AI Search endpoint
- Provide the service URL and authentication key
Step 5: Configure Index Settings
Index Name: The name of the search index in the vector database service
- Must be unique within the service
- Use lowercase letters, numbers, and hyphens
- Example: product-docs, support-kb
Index Partition Name: Logical partition within the index
- Allows multiple datasets in one physical index
- Enables filtering by partition during search
- Default: main or default
- Use different partition names for multi-tenant scenarios
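Filtering by partition at search time can be pictured as an OData filter expression, which is the filter syntax Azure AI Search accepts. The sketch below is illustrative only. It assumes the partition value is stored in a filterable string field named partition, as in the schema shown in this guide. The helper name partition_filter is hypothetical, and FoundationaLLM may build its filters differently.

```python
def partition_filter(partition_name: str, field: str = "partition") -> str:
    """Build an OData equality filter restricting results to one partition."""
    # OData string literals escape a single quote by doubling it.
    escaped = partition_name.replace("'", "''")
    return f"{field} eq '{escaped}'"


print(partition_filter("tenant-a"))  # partition eq 'tenant-a'
```

Escaping the quote matters when partition names come from user-controlled values such as tenant identifiers.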
Step 6: Configure Index Schema
Define the fields and their properties:
Common Field Types:
- content: The text content
- content_vector: The embedding vector
- metadata: Document metadata (title, author, date, etc.)
- source: Origin of the content
- partition: Partition identifier
Vector Field Configuration:
- Dimensions: Must match the embedding model dimensions (e.g., 1536, 2048, 3072)
- Algorithm: HNSW (Hierarchical Navigable Small World) is recommended
- Distance Metric: Cosine similarity is most common
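To make the distance metric concrete, the sketch below computes cosine similarity between two small vectors in plain Python. It illustrates the metric only. The search service computes it internally over the indexed vectors, and the three-element vectors here are toy values, not real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


query = [0.1, 0.3, 0.5]
same_direction = [0.2, 0.6, 1.0]  # query scaled by 2
print(cosine_similarity(query, same_direction))
```

Because the metric depends only on direction, a vector and a scaled copy of it score as maximally similar. The length check mirrors the requirement that query and index vectors share the same dimensions.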
Example schema configuration:
{
  "fields": [
    {
      "name": "id",
      "type": "string",
      "key": true
    },
    {
      "name": "content",
      "type": "string",
      "searchable": true
    },
    {
      "name": "content_vector",
      "type": "Collection(Edm.Single)",
      "dimensions": 2048,
      "vectorSearchConfiguration": "default"
    },
    {
      "name": "partition",
      "type": "string",
      "filterable": true
    }
  ]
}
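Since the vector field's dimensions must match the embedding model, a quick check before provisioning can catch a mismatch early. The sketch below assumes the schema is available as a Python dictionary shaped like the example above. The helper name find_dimension_mismatches is hypothetical, and the expected dimension count is supplied by the caller.

```python
def find_dimension_mismatches(schema: dict, expected_dimensions: int) -> list:
    """Return names of vector fields whose dimensions differ from the model's."""
    return [
        field["name"]
        for field in schema.get("fields", [])
        # Only fields that declare "dimensions" are treated as vector fields.
        if "dimensions" in field and field["dimensions"] != expected_dimensions
    ]


schema = {
    "fields": [
        {"name": "id", "type": "string", "key": True},
        {"name": "content", "type": "string", "searchable": True},
        {
            "name": "content_vector",
            "type": "Collection(Edm.Single)",
            "dimensions": 2048,
            "vectorSearchConfiguration": "default",
        },
        {"name": "partition", "type": "string", "filterable": True},
    ]
}

print(find_dimension_mismatches(schema, expected_dimensions=2048))  # []
print(find_dimension_mismatches(schema, expected_dimensions=1536))  # ['content_vector']
```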
Step 7: Review and Create
- Review all settings
- Click Create
- Wait for the vector database resource to be provisioned
Best Practices
Naming Conventions
- Environment-specific names: dev-product-docs, prod-product-docs
- Purpose-based names: support-kb, technical-docs, product-catalog
- Avoid generic names like index1 or database
Index Design
Single vs. Multiple Indexes:
- Single index with partitions: Simpler management, shared capacity
- Multiple indexes: Better isolation, independent scaling
Partition Strategy:
- Per-tenant: Use partition for multi-tenant isolation
- Per-dataset: Separate different types of content
- Per-environment: Dev, staging, production in different partitions
Performance Considerations
Vector Dimensions:
- 1536 dimensions: Good balance for most use cases (text-embedding-ada-002)
- 2048 dimensions: Higher quality, more storage (for example, text-embedding-3-large with its output reduced through the dimensions parameter)
- 3072 dimensions: Maximum quality, requires more resources (text-embedding-3-large default)
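The storage impact of each dimension count can be estimated with simple arithmetic. The sketch below assumes vectors are stored as 32-bit floats (4 bytes per dimension, matching the Collection(Edm.Single) field type) and counts only the raw vector data. Index structures such as the HNSW graph, plus text and metadata fields, add overhead that is not modeled here. The document count is an arbitrary example.

```python
BYTES_PER_FLOAT32 = 4  # Edm.Single is a 32-bit floating point value


def raw_vector_bytes(document_count: int, dimensions: int) -> int:
    """Estimate bytes needed for the raw vector values alone."""
    return document_count * dimensions * BYTES_PER_FLOAT32


document_count = 1_000_000  # arbitrary example size
for dimensions in (1536, 2048, 3072):
    size_gib = raw_vector_bytes(document_count, dimensions) / 1024**3
    print(f"{dimensions} dimensions: {size_gib:.2f} GiB of raw vector data")
```

Raw vector size grows linearly with dimensions, so moving from 1536 to 3072 doubles this part of the index.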
Index Size Planning:
- Plan for document count and growth rate
- Consider retention policies
- Monitor index size and performance
Security
Access Control:
- Use managed identities when possible
- Rotate API keys regularly
- Limit network access with firewall rules
Data Protection:
- Enable encryption at rest
- Use private endpoints for sensitive data
- Implement data retention policies
Common Scenarios
Scenario 1: Simple Knowledge Base
Single index, single partition:
- Index: company-kb
- Partition: main
- Use case: General company knowledge base
Scenario 2: Multi-Tenant Application
Single index, partition per tenant:
- Index: app-data
- Partitions: tenant-a, tenant-b, tenant-c
- Use case: SaaS application with tenant isolation
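One way to picture this scenario is a set of configurations that share an index name and differ only in partition name. The sketch below is an assumption about how such definitions might be generated. It supposes one vector database definition per tenant, and both the helper tenant_partition_definitions and the resource name scheme are hypothetical. The api_endpoint_configuration_object_id is omitted for brevity.

```python
def tenant_partition_definitions(index_name, tenants):
    """Build one definition per tenant, all pointing at the same index."""
    return [
        {
            "name": f"{index_name}-{tenant}",  # hypothetical naming scheme
            "type": "AzureAISearch",
            "configuration": {
                "index_name": index_name,        # shared physical index
                "index_partition_name": tenant,  # per-tenant logical partition
            },
        }
        for tenant in tenants
    ]


definitions = tenant_partition_definitions(
    "app-data", ["tenant-a", "tenant-b", "tenant-c"]
)
for definition in definitions:
    print(definition["name"], definition["configuration"]["index_partition_name"])
```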
Scenario 3: Multiple Data Types
Multiple indexes:
- Index 1: product-docs (technical documentation)
- Index 2: support-kb (support articles)
- Index 3: company-policies (internal policies)
- Use case: Different search experiences per data type
Troubleshooting
Vector Database Creation Fails
Problem: Unable to create vector database resource
Solutions:
- Verify you have required permissions
- Check that the name is unique
- Ensure API endpoint configuration exists and is valid
Connection Issues
Problem: Cannot connect to vector database service
Solutions:
- Verify API endpoint configuration credentials
- Check network connectivity
- Ensure firewall rules allow access
- Verify the service URL is correct
Index Not Found
Problem: Index doesn't exist in the service
Solutions:
- Manually create the index in Azure AI Search
- Or configure auto-index creation in the vector database settings
- Verify index name matches exactly (case-sensitive)
Schema Mismatch
Problem: Data doesn't match index schema
Solutions:
- Verify vector dimensions match embedding model
- Check all required fields are defined
- Ensure field types are correct