Optimizing Pipeline Performance

Learn how to tune and optimize data pipeline performance for faster processing and lower costs.

Overview

Performance optimization helps you:

  • Reduce pipeline execution time
  • Lower cloud resource costs
  • Process more data in less time
  • Improve user experience
  • Scale efficiently

Performance Fundamentals

Key Metrics

Metric          Description                     Target
--------------  ------------------------------  --------
Throughput      Items processed per minute      Maximize
Latency         Time from start to searchable   Minimize
Resource Usage  CPU, memory, storage consumed   Optimize
Cost            Total cloud resource costs      Minimize
Success Rate    Percentage of successful runs   >99%

Performance Bottlenecks

Common bottlenecks in data pipelines:

  1. I/O Bound: Reading/writing data
  2. CPU Bound: Text processing, embedding
  3. Network Bound: API calls, remote services
  4. Memory Bound: Large documents, batching
  5. Concurrency: Limited parallelism

Optimizing Each Stage

Data Source Stage

Bottleneck: Reading files from storage

Optimization Strategies:

  1. Use Efficient Filters:

    {
      "Folders": ["/specific-folder"],  // Not "/"
      "FilePattern": "*.pdf"  // If supported
    }
    
  2. Incremental Processing:

    • Process only changed files
    • Track last processed timestamp
    • Use frequent scheduled triggers for near-real-time updates
  3. Regional Proximity:

    • Collocate storage with pipeline workers
    • Use same Azure region
    • Reduce network latency
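
The incremental approach above can be sketched in Python. The file-listing shape (path, modified-at pairs) and all names here are illustrative, not a specific connector API:

```python
from datetime import datetime, timezone

def filter_new_files(files, last_run):
    """Keep only files modified since the last successful run.

    `files` is a list of (path, modified_at) tuples with
    timezone-aware datetimes; both names are illustrative.
    """
    return [path for path, modified_at in files if modified_at > last_run]

last_run = datetime(2024, 6, 1, tzinfo=timezone.utc)
files = [
    ("/docs/old.pdf", datetime(2024, 5, 20, tzinfo=timezone.utc)),
    ("/docs/new.pdf", datetime(2024, 6, 2, tzinfo=timezone.utc)),
]
print(filter_new_files(files, last_run))  # ['/docs/new.pdf']
```

After each successful run, persist the new high-water-mark timestamp so the next run starts from it.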

Before/After Example:

Before: Processing 10,000 files from all folders
Time: 120 minutes

After: Processing only /recent folder (500 files)
Time: 6 minutes
Improvement: 95% faster

Extraction Stage

Bottleneck: Converting documents to text

Optimization Strategies:

  1. Batch Size Tuning:

    Small batches (10): More overhead, predictable memory
    Large batches (100): Less overhead, more memory
    Optimal: 50-100 documents per batch
    
  2. File Size Handling:

    • Skip or split files >100MB
    • Process large files separately
    • Use streaming for huge files
  3. Format-Specific:

    PDF: Pre-optimize with reduced DPI
    DOCX: Remove embedded objects
    Images: Resize before processing
    
Format          Typical Speed    Optimization
--------------  ---------------  -----------------
Plain text      1000 docs/min    Already fast
DOCX            200 docs/min     Good
PDF (text)      100 docs/min     Consider parallel
PDF (OCR)       10 docs/min      Most intensive
Images          20 docs/min      Resize first
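
Routing files by size, as suggested above, can be sketched like this. The function name is illustrative, and the 100 MB cut-off follows the guideline above:

```python
def route_by_size(files, large_threshold_mb=100):
    """Split work items into standard and large-file paths.

    `files` maps path -> size in bytes; the 100 MB threshold
    matches the guideline above and is tunable.
    """
    standard, large = [], []
    threshold = large_threshold_mb * 1024 * 1024
    for path, size_bytes in files.items():
        (large if size_bytes > threshold else standard).append(path)
    return standard, large

standard, large = route_by_size({
    "small.docx": 2 * 1024 * 1024,     # 2 MB -> standard pipeline
    "scan.pdf": 250 * 1024 * 1024,     # 250 MB -> large-file pipeline
})
```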

Partitioning Stage

Bottleneck: Tokenization and chunking

Optimization Strategies:

  1. Reduce Chunk Count:

    Strategy 1: Larger chunks
    Before: 400 tokens → 1000 chunks
    After: 600 tokens → 667 chunks
    Reduction: 33% fewer chunks to process
    
    Trade-off: May reduce search precision
    
  2. Optimize Overlap:

    Minimal overlap: 50 tokens (faster, less context)
    Standard overlap: 100 tokens (balanced)
    High overlap: 150 tokens (better context, slower)
    
  3. Choose Right Strategy:

    Token: Faster, good for most cases
    Semantic: Slower, better for complex documents
    
    Recommendation: Use Token unless you see quality issues
    

Performance Comparison:

Document: 10,000 tokens

Token Strategy (400 tokens, 100 overlap):
- Chunks: ~33 (each new chunk advances 300 tokens)
- Time: 0.5 seconds

Semantic Strategy:
- Chunks: 28 (variable size)
- Time: 2.0 seconds
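
The token strategy's chunk arithmetic can be checked with a minimal chunker. Tokens are represented as a plain list, and the 400/100 defaults match the comparison above:

```python
def chunk_tokens(tokens, size=400, overlap=100):
    """Split a token list into fixed-size chunks with overlap.

    Each chunk after the first starts size - overlap tokens past
    the previous one, so the effective stride is 300 by default.
    """
    stride = size - overlap
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
    return chunks

tokens = list(range(10_000))       # stand-in for a tokenized document
chunks = chunk_tokens(tokens)
print(len(chunks))                 # 33 chunks, not the naive 10,000 / 400 = 25
```

Overlap raises the chunk count above the naive size-only estimate, which is worth remembering when budgeting embedding calls.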

Embedding Stage

Bottleneck: API calls to embedding model

Optimization Strategies:

  1. Model Selection:

    text-embedding-3-small:
    - Speed: 200 chunks/min
    - Dimensions: 1536
    - Quality: Good
    - Cost: $
    
    text-embedding-3-large:
    - Speed: 150 chunks/min
    - Dimensions: 3072
    - Quality: Better
    - Cost: $$$
    
    Recommendation: Start with small, upgrade if needed
    
  2. Dimension Optimization:

    Higher dimensions = Better accuracy + Slower + More storage
    Lower dimensions = Faster + Less storage + Lower accuracy
    
    Sweet spot: 1024 or 1536 dimensions
    
  3. Batch Size Tuning:

    API limits per request:
    - Max tokens: 8191 per input (model context limit)
    - Optimal batch: 50-100 chunks
    - Balance: Throughput vs. failure risk
    
  4. Parallel Requests:

    Sequential: 1 request at a time
    Time: 100 seconds for 1000 chunks
    
    Parallel (10 concurrent):
    Time: 15 seconds for 1000 chunks
    Improvement: 85% faster
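
A minimal sketch of parallel embedding requests, assuming a hypothetical `embed_batch` function in place of a real embedding API call:

```python
from concurrent.futures import ThreadPoolExecutor

def embed_batch(batch):
    """Placeholder for a real embedding API call."""
    return [[0.0] * 4 for _ in batch]      # one fake vector per chunk

def embed_parallel(chunks, batch_size=100, max_workers=10):
    """Embed chunks with up to max_workers concurrent requests."""
    batches = [chunks[i:i + batch_size] for i in range(0, len(chunks), batch_size)]
    vectors = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for result in pool.map(embed_batch, batches):  # map preserves order
            vectors.extend(result)
    return vectors

vectors = embed_parallel([f"chunk {i}" for i in range(1000)])
print(len(vectors))  # 1000
```

In production, pair this with retries and backoff: more concurrency raises throughput but also the chance of hitting rate limits.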
    

Cost Optimization:

Processing 1M tokens:

text-embedding-3-small: $0.02
text-embedding-3-large: $0.13

Monthly cost for 100B tokens:
small: $2,000
large: $13,000

Potential savings: $11,000/month by using small model
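
A quick check of the arithmetic, using the per-1M-token prices above; the 100B-token monthly volume is an assumed workload:

```python
def embedding_cost(tokens, price_per_million):
    """Dollar cost of embedding `tokens` tokens at a per-1M-token price."""
    return tokens / 1_000_000 * price_per_million

monthly_tokens = 100_000_000_000               # assumed 100B tokens/month
small = embedding_cost(monthly_tokens, 0.02)   # ~$2,000
large = embedding_cost(monthly_tokens, 0.13)   # ~$13,000
print(f"${large - small:,.0f}/month saved")    # $11,000/month
```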

Indexing Stage

Bottleneck: Writing to vector database

Optimization Strategies:

  1. Batch Uploads:

    Single uploads: 100 items/min
    Batch uploads (100): 1000 items/min
    Improvement: 10x faster
    
  2. Index Partitioning:

    Single large index: Slower writes, complex management
    Partitioned indexes: Faster writes, easier management
    
    Example:
    - customer-data-2024-01
    - customer-data-2024-02
    - etc.
    
  3. Optimize Field Count:

    Minimal fields: id, content, vector, metadata
    Excessive fields: Every possible attribute
    
    More fields = Slower indexing + More storage
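
Batch uploading can be sketched generically; `upload_fn` stands in for whatever bulk-upload call your index client provides:

```python
def upload_in_batches(items, upload_fn, batch_size=100):
    """Upload items in fixed-size batches instead of one at a time.

    `upload_fn` stands in for your index client's bulk-upload call.
    Returns the number of API calls made.
    """
    calls = 0
    for i in range(0, len(items), batch_size):
        upload_fn(items[i:i + batch_size])
        calls += 1
    return calls

uploaded = []
calls = upload_in_batches(list(range(1000)), uploaded.extend)
print(calls)  # 10 calls instead of 1000 single uploads
```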
    

Parallel Processing

Worker Scaling

Vertical Scaling (more powerful workers):

Before: 2 CPU, 4GB RAM
After: 4 CPU, 8GB RAM
Throughput: 1.5-2x improvement
Cost: 2x

Horizontal Scaling (more workers):

Before: 1 worker
After: 4 workers
Throughput: 3-4x improvement
Cost: 4x

Note: Near-linear scaling for independent work items

Concurrent Stages

Enable parallel execution where possible:

Sequential:
Extract → Partition → Embed → Index
Total: 100 minutes

Pipelined (overlapping stages):
Extract → Partition → Embed → Index, overlapping across batches
(batch N is embedded while batch N+1 is partitioned)
Total: 70 minutes

Queue Management

Optimize work item distribution:

Single queue: Simple, potential bottleneck
Multiple queues: Complex, better throughput

Recommendation: Use priority queues
- High: Small files, quick processing
- Normal: Standard documents
- Low: Large files, batch jobs
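
The priority tiers above map naturally onto a heap-based queue; the 1 MB / 50 MB size cut-offs here are illustrative:

```python
import heapq

PRIORITY = {"high": 0, "normal": 1, "low": 2}   # lower number pops first

def make_queue(files):
    """Build a priority queue of (priority, size, path) work items.

    Small files land in the high tier so quick wins finish first;
    the 1 MB / 50 MB cut-offs are illustrative.
    """
    queue = []
    for path, size_mb in files:
        tier = "high" if size_mb < 1 else "low" if size_mb > 50 else "normal"
        heapq.heappush(queue, (PRIORITY[tier], size_mb, path))
    return queue

q = make_queue([("big.pdf", 120), ("note.txt", 0.1), ("report.docx", 5)])
print(heapq.heappop(q)[2])  # note.txt is processed first
```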

Resource Optimization

Memory Management

Symptoms of Memory Issues:

  • Out of memory errors
  • Slow processing
  • Worker crashes

Solutions:

  1. Reduce Batch Sizes:

    Current: 100 items/batch
    Reduced: 50 items/batch
    Memory usage: 50% reduction
    
  2. Process Large Files Separately:

    Standard pipeline: Files <10MB
    Large file pipeline: Files >10MB
    Separate processing paths
    
  3. Release Memory Promptly:

    Drop references to large objects after use
    Lets garbage collection reclaim memory, reducing pressure
    

CPU Optimization

High CPU Scenarios:

  • Text extraction (OCR)
  • Semantic partitioning
  • Complex transformations

Solutions:

  1. Schedule During Off-Peak:

    Peak hours: 9 AM - 5 PM
    Off-peak: Nights and weekends
    
    Schedule heavy pipelines: 2 AM - 6 AM
    
  2. Throttle Processing:

    Max concurrency: 10 items
    Reduces CPU spikes
    Smoother resource usage
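
Throttling can be sketched with a counting semaphore; `MAX_CONCURRENCY` mirrors the guideline above and the worker body is a placeholder:

```python
import threading

MAX_CONCURRENCY = 10                       # cap from the guideline above
semaphore = threading.Semaphore(MAX_CONCURRENCY)
lock = threading.Lock()
active = peak = 0

def process(item):
    """Placeholder for CPU-heavy work, gated so at most 10 run at once."""
    global active, peak
    with semaphore:
        with lock:
            active += 1
            peak = max(peak, active)
        # ... heavy work (OCR, semantic partitioning) would go here ...
        with lock:
            active -= 1

threads = [threading.Thread(target=process, args=(i,)) for i in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(peak <= MAX_CONCURRENCY)  # True
```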
    

Storage Optimization

Minimize Storage Costs:

  1. Cleanup Strategies:

    - Delete old pipeline runs after 30 days
    - Archive logs to cold storage
    - Remove failed run artifacts
    
  2. Compress Data:

    Before: Raw text stored
    After: Compressed text
    Savings: 60-80% storage reduction
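
Compression of extracted text can be sketched with the standard `zlib` module; actual savings depend on how repetitive the text is:

```python
import zlib

text = ("Extracted document text is often repetitive, "
        "so it compresses well. ") * 200
raw = text.encode("utf-8")
packed = zlib.compress(raw, level=6)

print(f"{1 - len(packed) / len(raw):.0%} smaller")      # repetitive text shrinks a lot
assert zlib.decompress(packed).decode("utf-8") == text  # lossless round trip
```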
    

Network Optimization

Reduce API Calls

Caching:

Before: Embed same text multiple times
After: Cache embeddings, reuse
Reduction: 50% fewer API calls
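
A minimal embedding cache, keyed by a hash of the chunk text; `embed` is a stand-in for a real API call:

```python
import hashlib

cache = {}

def embed(text):
    """Stand-in for a real embedding call; counts invocations."""
    embed.calls += 1
    return [float(len(text))]
embed.calls = 0

def cached_embed(text):
    """Reuse the stored vector when the exact same text reappears."""
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key not in cache:
        cache[key] = embed(text)
    return cache[key]

for chunk in ["hello", "world", "hello", "hello"]:
    cached_embed(chunk)
print(embed.calls)  # 2 API calls instead of 4
```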

Batching:

Before: 1 API call per item
After: 1 API call per 100 items
Reduction: 99% fewer calls

Connection Pooling

Without pooling: Create new connection each time
With pooling: Reuse connections
Latency reduction: 50-100ms per request

Regional Deployment

Cross-region latency: 100-200ms
Same-region latency: 5-10ms

Improvement: 95% latency reduction
Critical for high-volume processing

Configuration Tuning

High Speed (lower quality):

{
  "partitioning": {
    "strategy": "Token",
    "size": 600,
    "overlap": 50
  },
  "embedding": {
    "model": "text-embedding-3-small",
    "dimensions": 1024
  },
  "batch_size": 100
}

Balanced (recommended):

{
  "partitioning": {
    "strategy": "Token",
    "size": 400,
    "overlap": 100
  },
  "embedding": {
    "model": "text-embedding-3-large",
    "dimensions": 1536
  },
  "batch_size": 50
}

High Quality (slower):

{
  "partitioning": {
    "strategy": "Semantic",
    "size": 500,
    "overlap": 150
  },
  "embedding": {
    "model": "text-embedding-3-large",
    "dimensions": 3072
  },
  "batch_size": 25
}

Monitoring Performance

Key Indicators

Track These Metrics:

- Documents processed per minute
- Average chunk processing time
- API response times
- Error rates
- Resource utilization

Benchmarking

Establish Baselines:

1. Run pipeline with known dataset
2. Record all metrics
3. Make single change
4. Compare results
5. Keep if improved, revert if worse
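
Steps 1-4 can be wrapped in a small timing harness; `run_pipeline` is whatever entry point your pipeline exposes:

```python
import time

def benchmark(run_pipeline, dataset, repeats=3):
    """Time a pipeline entry point over a fixed dataset.

    Reporting the best of several runs reduces noise, which matters
    when comparing a single configuration change.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_pipeline(dataset)
        timings.append(time.perf_counter() - start)
    return min(timings)

# Toy pipeline standing in for a real run over a known dataset.
baseline = benchmark(lambda docs: [d.upper() for d in docs], ["doc"] * 1000)
```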

A/B Testing

Pipeline A: Current configuration
Pipeline B: Optimized configuration

Run both with same data
Compare:
- Speed
- Quality
- Cost
- Success rate

Choose best overall performer

Cost Optimization

Cost Breakdown

Typical pipeline costs:

Embedding API: 60%
Compute (workers): 25%
Storage: 10%
Networking: 5%

Reduction Strategies

  1. Optimize Embedding:

    Use smaller model: up to ~85% cost reduction (per the prices above)
    Reduce dimensions: 30% cost reduction
    Cache results: 20-50% cost reduction
    
  2. Right-Size Workers:

    Overprovisioned: Wasted resources
    Underprovisioned: Poor performance
    
    Monitor utilization, adjust monthly
    
  3. Schedule Efficiently:

    Run during off-peak pricing periods
    Batch similar work together
    Avoid unnecessary runs
    

Best Practices

Development Phase

  1. Start Small: Test with 100 documents
  2. Measure Baseline: Establish performance metrics
  3. Optimize Incrementally: Change one thing at a time
  4. Validate Quality: Ensure optimizations don't hurt results

Production Phase

  1. Monitor Continuously: Track key metrics
  2. Set Alerts: Notify on degradation
  3. Regular Reviews: Monthly performance analysis
  4. Capacity Planning: Anticipate growth

Iteration Cycle

1. Measure current performance
2. Identify bottleneck
3. Implement optimization
4. Measure improvement
5. Repeat

Common Pitfalls

Over-Optimization

Problem: Spending too much time on marginal gains

Solution: Focus on 80/20 rule - optimize biggest bottlenecks first

Quality Sacrifice

Problem: Too fast = poor search results

Solution: Always validate search quality after optimization

Premature Scaling

Problem: Scaling before identifying bottleneck

Solution: Measure first, scale second

Ignoring Costs

Problem: Fast but expensive

Solution: Balance performance with cost