# Optimizing Pipeline Performance
Learn how to tune and optimize data pipeline performance for faster processing and lower costs.
## Overview
Performance optimization helps you:
- Reduce pipeline execution time
- Lower cloud resource costs
- Process more data in less time
- Improve user experience
- Scale efficiently
## Performance Fundamentals

### Key Metrics
| Metric | Description | Target |
|---|---|---|
| Throughput | Items processed per minute | Maximize |
| Latency | Time from ingestion start until content is searchable | Minimize |
| Resource Usage | CPU, memory, storage consumed | Optimize |
| Cost | Total cloud resource costs | Minimize |
| Success Rate | Percentage of successful runs | >99% |
### Performance Bottlenecks
Common bottlenecks in data pipelines:
- I/O Bound: Reading/writing data
- CPU Bound: Text processing, embedding
- Network Bound: API calls, remote services
- Memory Bound: Large documents, batching
- Concurrency: Limited parallelism
## Optimizing Each Stage

### Data Source Stage
Bottleneck: Reading files from storage
Optimization Strategies:
Use Efficient Filters:

```json
{
  "Folders": ["/specific-folder"],   // Not "/"
  "FilePattern": "*.pdf"             // If supported
}
```

Incremental Processing:
- Process only changed files
- Track last processed timestamp
- Use frequent scheduled triggers for near-real-time updates
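The incremental pattern above boils down to a timestamp filter. Here is a minimal sketch, assuming each file record carries a `modified` timestamp (the field names and record shape are illustrative, not the pipeline's actual schema):

```python
from datetime import datetime, timezone

def select_changed_files(files, last_run):
    """Keep only files modified after the previous pipeline run."""
    return [f for f in files if f["modified"] > last_run]

last_run = datetime(2024, 1, 15, tzinfo=timezone.utc)
files = [
    {"path": "/docs/a.pdf", "modified": datetime(2024, 1, 10, tzinfo=timezone.utc)},
    {"path": "/docs/b.pdf", "modified": datetime(2024, 1, 20, tzinfo=timezone.utc)},
]
changed = select_changed_files(files, last_run)
# only /docs/b.pdf is reprocessed; persist the new timestamp after the run
```

After each run, store the run's start time so the next trigger only picks up files modified since then.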
Regional Proximity:
- Collocate storage with pipeline workers
- Use same Azure region
- Reduce network latency
Before/After Example:
Before: Processing 10,000 files from all folders
Time: 120 minutes
After: Processing only /recent folder (500 files)
Time: 6 minutes
Improvement: 95% faster
### Extraction Stage
Bottleneck: Converting documents to text
Optimization Strategies:
Batch Size Tuning:

- Small batches (10): more overhead, predictable memory
- Large batches (100): less overhead, more memory
- Optimal: 50-100 documents per batch

File Size Handling:
- Skip or split files >100MB
- Process large files separately
- Use streaming for huge files
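The size rules above amount to a routing decision. A minimal sketch, using the 100MB threshold from the list (the function name and labels are illustrative):

```python
ONE_MB = 1024 * 1024

def route_by_size(size_bytes, large_threshold=100 * ONE_MB):
    """Send oversized files down a separate split/stream path."""
    return "split-or-stream" if size_bytes > large_threshold else "standard"

# small files stay on the standard path; huge files never enter the main batch
decision = route_by_size(250 * ONE_MB)
```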
Format-Specific:

- PDF: pre-optimize with reduced DPI
- DOCX: remove embedded objects
- Images: resize before processing
| Format | Typical Speed | Optimization |
|---|---|---|
| Plain text | 1000 docs/min | Already fast |
| DOCX | 200 docs/min | Good |
| PDF (text) | 100 docs/min | Consider parallel |
| PDF (OCR) | 10 docs/min | Most intensive |
| Images | 20 docs/min | Resize first |
### Partitioning Stage
Bottleneck: Tokenization and chunking
Optimization Strategies:
Reduce Chunk Count:
Strategy 1: Larger chunks
- Before: 400 tokens → 1000 chunks
- After: 600 tokens → 667 chunks
- Reduction: 33% fewer chunks to process
- Trade-off: may reduce search precision

Optimize Overlap:
- Minimal overlap: 50 tokens (faster, less context)
- Standard overlap: 100 tokens (balanced)
- High overlap: 150 tokens (better context, slower)

Choose the Right Strategy:
- Token: faster, good for most cases
- Semantic: slower, better for complex documents
- Recommendation: use Token unless quality issues appear
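As a sketch of the Token strategy, one common convention treats the configured size as the stride of new tokens per chunk and prepends the overlap as context from the previous chunk; under that convention a 10,000-token document with size 400 and overlap 100 yields the 25 chunks used in the comparison below. This is an illustrative implementation, not the pipeline's actual chunker:

```python
def chunk_tokens(tokens, size=400, overlap=100):
    """Each chunk holds `size` new tokens plus `overlap` trailing
    tokens of context carried over from the previous chunk."""
    chunks = []
    for start in range(0, len(tokens), size):
        begin = max(0, start - overlap)  # first chunk has no prior context
        chunks.append(tokens[begin:start + size])
    return chunks

chunks = chunk_tokens(list(range(10_000)))
# 25 chunks; consecutive chunks share 100 tokens of context
```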
Performance Comparison:
Document: 10,000 tokens
Token Strategy (400 tokens, 100 overlap):
- Chunks: 25
- Time: 0.5 seconds
Semantic Strategy:
- Chunks: 28 (variable size)
- Time: 2.0 seconds
### Embedding Stage
Bottleneck: API calls to embedding model
Optimization Strategies:
Model Selection:
text-embedding-3-small:
- Speed: 200 chunks/min
- Dimensions: 1536
- Quality: good
- Cost: $

text-embedding-3-large:
- Speed: 150 chunks/min
- Dimensions: 3072
- Quality: better
- Cost: $$$

Recommendation: start with small, upgrade if needed

Dimension Optimization:
- Higher dimensions: better accuracy, slower, more storage
- Lower dimensions: faster, less storage, lower accuracy
- Sweet spot: 1024 or 1536 dimensions

Batch Size Tuning:
- Max tokens: 8191 per request (API limit)
- Optimal batch: 50-100 chunks
- Balance: throughput vs. failure risk

Parallel Requests:
- Sequential (1 request at a time): 100 seconds for 1000 chunks
- Parallel (10 concurrent): 15 seconds for 1000 chunks
- Improvement: 85% faster
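The parallel-request gain can be sketched with a thread pool, since embedding calls are network-bound; `embed_batch` below is a stand-in for the real API call, not an actual client:

```python
from concurrent.futures import ThreadPoolExecutor

def embed_batch(batch):
    """Stand-in for one embedding API request over a batch of chunks."""
    return [[0.0, 0.0, 0.0] for _ in batch]  # pretend vectors

def embed_all(batches, concurrency=10):
    """Run up to `concurrency` requests at once; results keep input order."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(embed_batch, batches))

batches = [[f"chunk-{i}-{j}" for j in range(5)] for i in range(20)]
vectors = embed_all(batches)
```

`pool.map` preserves input order, so results can be zipped back to their source chunks without extra bookkeeping.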
Cost Optimization:
Processing 1M tokens:
text-embedding-3-small: $0.02
text-embedding-3-large: $0.13
Monthly cost for 100B tokens:
small: $2,000
large: $13,000
Potential savings: $11,000/month by using the small model
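The per-million prices above make the comparison easy to recompute; the monthly volume here is an assumed example, not a measured workload:

```python
def embedding_cost(tokens, price_per_million_tokens):
    """Dollar cost of embedding `tokens` tokens at a per-1M-token price."""
    return tokens / 1_000_000 * price_per_million_tokens

# 1M tokens, as in the figures above
small = embedding_cost(1_000_000, 0.02)   # $0.02
large = embedding_cost(1_000_000, 0.13)   # $0.13

# at roughly 100B tokens/month, the gap reaches about $11,000
monthly_savings = (embedding_cost(100_000_000_000, 0.13)
                   - embedding_cost(100_000_000_000, 0.02))
```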
### Indexing Stage
Bottleneck: Writing to vector database
Optimization Strategies:
Batch Uploads:
- Single uploads: 100 items/min
- Batch uploads (100 per call): 1000 items/min
- Improvement: 10x faster

Index Partitioning:
- Single large index: slower writes, complex management
- Partitioned indexes: faster writes, easier management
- Example: customer-data-2024-01, customer-data-2024-02, etc.

Optimize Field Count:
- Minimal fields: id, content, vector, metadata
- Excessive fields: every possible attribute
- More fields = slower indexing and more storage
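The batch-upload pattern reduces to a simple grouping helper; the vector-store client call itself is omitted here, only the batching is shown:

```python
def batched(items, batch_size=100):
    """Yield successive fixed-size batches for bulk index writes."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

docs = [{"id": str(n), "content": "..."} for n in range(250)]
batches = list(batched(docs))
# 3 bulk writes instead of 250 single uploads
```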
## Parallel Processing

### Worker Scaling
Vertical Scaling (more powerful workers):
Before: 2 CPU, 4GB RAM
After: 4 CPU, 8GB RAM
Throughput: 1.5-2x improvement
Cost: 2x
Horizontal Scaling (more workers):
Before: 1 worker
After: 4 workers
Throughput: 3-4x improvement
Cost: 4x
Note: Near-linear scaling for independent work items
### Concurrent Stages
Enable parallel execution where possible:
Sequential:
Extract → Partition → Embed → Index
Total: 100 minutes
Pipelined (stages overlap across items):
Extract → Partition → Embed → Index, with each stage consuming output as soon as it arrives
Total: 70 minutes
### Queue Management
Optimize work item distribution:
Single queue: Simple, potential bottleneck
Multiple queues: Complex, better throughput
Recommendation: Use priority queues
- High: Small files, quick processing
- Normal: Standard documents
- Low: Large files, batch jobs
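The three-level priority queue above can be sketched with `heapq`; the insertion counter keeps FIFO order within a priority level (file names are illustrative):

```python
import heapq

PRIORITY = {"high": 0, "normal": 1, "low": 2}

def enqueue(queue, item, priority, order):
    """Push with (priority, insertion order) so ties stay FIFO."""
    heapq.heappush(queue, (PRIORITY[priority], order, item))

queue = []
work = [("big.pdf", "low"), ("note.txt", "high"), ("doc.docx", "normal")]
for order, (item, prio) in enumerate(work):
    enqueue(queue, item, prio, order)

processed = [heapq.heappop(queue)[2] for _ in range(len(queue))]
# → ["note.txt", "doc.docx", "big.pdf"]
```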
## Resource Optimization

### Memory Management
Symptoms of Memory Issues:
- Out of memory errors
- Slow processing
- Worker crashes
Solutions:
Reduce Batch Sizes:
- Current: 100 items/batch
- Reduced: 50 items/batch
- Memory usage: 50% reduction

Process Large Files Separately:
- Standard pipeline: files <10MB
- Large-file pipeline: files >10MB
- Keep the processing paths separate

Enable Garbage Collection:
- Periodically clean up unused objects
- Reduces memory pressure
### CPU Optimization
High CPU Scenarios:
- Text extraction (OCR)
- Semantic partitioning
- Complex transformations
Solutions:
Schedule During Off-Peak:
- Peak hours: 9 AM - 5 PM
- Off-peak: nights and weekends
- Schedule heavy pipelines for 2 AM - 6 AM

Throttle Processing:
- Max concurrency: 10 items
- Reduces CPU spikes
- Smoother resource usage
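Capping concurrency is a one-line change with a bounded thread pool; `process` below is a stand-in for the real per-item work:

```python
from concurrent.futures import ThreadPoolExecutor

def process(item):
    return item * 2  # stand-in for the real per-item processing

# at most 10 items are in flight at once, smoothing CPU usage
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(process, range(100)))
```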
### Storage Optimization

Minimize Storage Costs:

Cleanup Strategies:
- Delete old pipeline runs after 30 days
- Archive logs to cold storage
- Remove failed-run artifacts

Compress Data:
- Before: raw text stored
- After: compressed text
- Savings: 60-80% storage reduction
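Text compresses well; a quick check with the standard library's `gzip` (the sample here is highly repetitive, so it compresses even better than the typical 60-80% range):

```python
import gzip

raw = ("Extracted document text for the search index. " * 500).encode("utf-8")
compressed = gzip.compress(raw)
saved = 1 - len(compressed) / len(raw)
# repetitive text shrinks by well over 90%
```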
## Network Optimization

### Reduce API Calls
Caching:
Before: Embed same text multiple times
After: Cache embeddings, reuse
Reduction: 50% fewer API calls
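When the chunk text itself is the cache key, caching is a decorator away; `embed` below is a stand-in that counts how many paid calls would be made:

```python
from functools import lru_cache

calls = 0

@lru_cache(maxsize=None)
def embed(text):
    global calls
    calls += 1                   # every cache miss is one paid API call
    return (float(len(text)),)   # stand-in for a real embedding vector

for text in ["hello", "world", "hello", "world", "hello"]:
    embed(text)
# 5 lookups, only 2 API calls
```

In a real pipeline the cache would live in persistent storage keyed by a hash of the chunk text, so reruns over unchanged documents pay nothing.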
Batching:
Before: 1 API call per item
After: 1 API call per 100 items
Reduction: 99% fewer calls
### Connection Pooling
Without pooling: Create new connection each time
With pooling: Reuse connections
Latency reduction: 50-100ms per request
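The idea can be sketched with a toy pool: connections are created once and reused, so per-request setup latency disappears. Real HTTP clients typically pool for you; this sketch only shows the mechanism:

```python
from queue import Queue

class ConnectionPool:
    """Minimal reusable-resource pool: create once, reuse many times."""
    def __init__(self, factory, size):
        self.created = 0
        self._pool = Queue()
        for _ in range(size):
            self._pool.put(factory())
            self.created += 1

    def acquire(self):
        return self._pool.get()   # blocks until a connection is free

    def release(self, conn):
        self._pool.put(conn)

pool = ConnectionPool(factory=object, size=4)
for _ in range(100):              # 100 requests, only 4 connections ever created
    conn = pool.acquire()
    pool.release(conn)
```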
### Regional Deployment
Cross-region latency: 100-200ms
Same-region latency: 5-10ms
Improvement: 95% latency reduction
Critical for high-volume processing
## Configuration Tuning

### Recommended Profiles
High Speed (lower quality):
```json
{
  "partitioning": {
    "strategy": "Token",
    "size": 600,
    "overlap": 50
  },
  "embedding": {
    "model": "text-embedding-3-small",
    "dimensions": 1024
  },
  "batch_size": 100
}
```
Balanced (recommended):
```json
{
  "partitioning": {
    "strategy": "Token",
    "size": 400,
    "overlap": 100
  },
  "embedding": {
    "model": "text-embedding-3-large",
    "dimensions": 1536
  },
  "batch_size": 50
}
```
High Quality (slower):
```json
{
  "partitioning": {
    "strategy": "Semantic",
    "size": 500,
    "overlap": 150
  },
  "embedding": {
    "model": "text-embedding-3-large",
    "dimensions": 3072
  },
  "batch_size": 25
}
```
## Monitoring Performance

### Key Indicators
Track These Metrics:
- Documents processed per minute
- Average chunk processing time
- API response times
- Error rates
- Resource utilization
### Benchmarking
Establish Baselines:
1. Run pipeline with known dataset
2. Record all metrics
3. Make single change
4. Compare results
5. Keep if improved, revert if worse
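The steps above can be captured with a tiny benchmarking helper; `run` stands in for the pipeline under test, and the metric names are illustrative:

```python
import time

def benchmark(run, dataset):
    """Time one run over a known dataset and derive throughput."""
    start = time.perf_counter()
    processed = run(dataset)
    elapsed = time.perf_counter() - start
    rate = processed / elapsed if elapsed > 0 else float("inf")
    return {"items": processed, "seconds": elapsed, "items_per_sec": rate}

baseline = benchmark(lambda data: len(data), list(range(1_000)))
# record this, change one setting, run again, and compare items_per_sec
```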
### A/B Testing
Pipeline A: Current configuration
Pipeline B: Optimized configuration
Run both with same data
Compare:
- Speed
- Quality
- Cost
- Success rate
Choose best overall performer
## Cost Optimization

### Cost Breakdown
Typical pipeline costs:
Embedding API: 60%
Compute (workers): 25%
Storage: 10%
Networking: 5%
### Reduction Strategies

Optimize Embedding:
- Use a smaller model: 50% cost reduction
- Reduce dimensions: 30% cost reduction
- Cache results: 20-50% cost reduction

Right-Size Workers:
- Overprovisioned: wasted resources
- Underprovisioned: poor performance
- Monitor utilization and adjust monthly

Schedule Efficiently:
- Run during off-peak pricing periods
- Batch similar work together
- Avoid unnecessary runs
## Best Practices

### Development Phase
- Start Small: Test with 100 documents
- Measure Baseline: Establish performance metrics
- Optimize Incrementally: Change one thing at a time
- Validate Quality: Ensure optimizations don't hurt results
### Production Phase
- Monitor Continuously: Track key metrics
- Set Alerts: Notify on degradation
- Regular Reviews: Monthly performance analysis
- Capacity Planning: Anticipate growth
### Iteration Cycle
1. Measure current performance
2. Identify bottleneck
3. Implement optimization
4. Measure improvement
5. Repeat
## Common Pitfalls

### Over-Optimization

Problem: Spending too much time on marginal gains

Solution: Follow the 80/20 rule: optimize the biggest bottlenecks first
### Quality Sacrifice

Problem: Aggressive speed tuning can degrade search results

Solution: Always validate search quality after optimization
### Premature Scaling
Problem: Scaling before identifying bottleneck
Solution: Measure first, scale second
### Ignoring Costs
Problem: Fast but expensive
Solution: Balance performance with cost