Optimizing Pipeline Performance

Learn how to tune and optimize data pipeline performance for faster processing and lower costs.

Overview

Performance optimization helps you:

  • Reduce pipeline execution time
  • Lower cloud resource costs
  • Process more data in less time
  • Improve user experience
  • Scale efficiently

Performance Fundamentals

Key Metrics

Metric          Description                     Target
--------------  ------------------------------  --------
Throughput      Items processed per minute      Maximize
Latency         Time from start to searchable   Minimize
Resource Usage  CPU, memory, storage consumed   Optimize
Cost            Total cloud resource costs      Minimize
Success Rate    Percentage of successful runs   >99%

Performance Bottlenecks

Common bottlenecks in data pipelines:

  1. I/O Bound: Reading/writing data
  2. CPU Bound: Text processing, embedding
  3. Network Bound: API calls, remote services
  4. Memory Bound: Large documents, batching
  5. Concurrency: Limited parallelism

Optimizing Each Stage

Data Source Stage

Bottleneck: Reading files from storage

Optimization Strategies:

  1. Use Efficient Filters:

    {
      "Folders": ["/specific-folder"],  // Not "/"
      "FilePattern": "*.pdf"  // If supported
    }
    
  2. Incremental Processing:

    • Process only changed files
    • Track last processed timestamp
    • Use frequent scheduled triggers for near-real-time updates
  3. Regional Proximity:

    • Collocate storage with pipeline workers
    • Use same Azure region
    • Reduce network latency
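
The incremental approach above can be sketched in Python. The file-listing shape (path, modified-at pairs) and all names here are illustrative, not a specific connector API:

```python
from datetime import datetime, timezone

def filter_new_files(files, last_run):
    """Keep only files modified since the last successful run.

    `files` is a list of (path, modified_at) tuples with
    timezone-aware datetimes; both names are illustrative.
    """
    return [path for path, modified_at in files if modified_at > last_run]

last_run = datetime(2024, 6, 1, tzinfo=timezone.utc)
files = [
    ("/docs/old.pdf", datetime(2024, 5, 20, tzinfo=timezone.utc)),
    ("/docs/new.pdf", datetime(2024, 6, 2, tzinfo=timezone.utc)),
]
print(filter_new_files(files, last_run))  # ['/docs/new.pdf']
```

After each successful run, persist the new high-water-mark timestamp so the next run starts from it.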

Before/After Example:

Before: Processing 10,000 files from all folders
Time: 120 minutes

After: Processing only /recent folder (500 files)
Time: 6 minutes
Improvement: 95% faster

Extraction Stage

Bottleneck: Converting documents to text

Optimization Strategies:

  1. Batch Size Tuning:

    Small batches (10): More overhead, predictable memory
    Large batches (100): Less overhead, more memory
    Optimal: 50-100 documents per batch
    
  2. File Size Handling:

    • Skip or split files >100MB
    • Process large files separately
    • Use streaming for huge files
  3. Format-Specific:

    PDF: Pre-optimize with reduced DPI
    DOCX: Remove embedded objects
    Images: Resize before processing
    
Format          Typical Speed    Optimization
--------------  ---------------  -----------------
Plain text      1000 docs/min    Already fast
DOCX            200 docs/min     Good
PDF (text)      100 docs/min     Consider parallel
PDF (OCR)       10 docs/min      Most intensive
Images          20 docs/min      Resize first
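
Routing files by size, as suggested above, can be sketched like this. The function name is illustrative, and the 100 MB cut-off follows the guideline above:

```python
def route_by_size(files, large_threshold_mb=100):
    """Split work items into standard and large-file paths.

    `files` maps path -> size in bytes; the 100 MB threshold
    matches the guideline above and is tunable.
    """
    standard, large = [], []
    threshold = large_threshold_mb * 1024 * 1024
    for path, size_bytes in files.items():
        (large if size_bytes > threshold else standard).append(path)
    return standard, large

standard, large = route_by_size({
    "small.docx": 2 * 1024 * 1024,     # 2 MB -> standard pipeline
    "scan.pdf": 250 * 1024 * 1024,     # 250 MB -> large-file pipeline
})
```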

Partitioning Stage

Bottleneck: Tokenization and chunking

Optimization Strategies:

  1. Reduce Chunk Count:

    Strategy 1: Larger chunks
    Before: 400 tokens → 1000 chunks
    After: 600 tokens → 667 chunks
    Reduction: 33% fewer chunks to process
    
    Trade-off: May reduce search precision
    
  2. Optimize Overlap:

    Minimal overlap: 50 tokens (faster, less context)
    Standard overlap: 100 tokens (balanced)
    High overlap: 150 tokens (better context, slower)
    
  3. Choose Right Strategy:

    Token: Faster, good for most cases
    Semantic: Slower, better for complex documents
    
    Recommendation: Use Token unless you see quality issues
    

Performance Comparison:

Document: 10,000 tokens

Token Strategy (400 tokens, 100 overlap):
- Chunks: ~33 (each new chunk advances 300 tokens)
- Time: 0.5 seconds

Semantic Strategy:
- Chunks: 28 (variable size)
- Time: 2.0 seconds
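
The token strategy's chunk arithmetic can be checked with a minimal chunker. Tokens are represented as a plain list, and the 400/100 defaults match the comparison above:

```python
def chunk_tokens(tokens, size=400, overlap=100):
    """Split a token list into fixed-size chunks with overlap.

    Each chunk after the first starts size - overlap tokens past
    the previous one, so the effective stride is 300 by default.
    """
    stride = size - overlap
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
    return chunks

tokens = list(range(10_000))       # stand-in for a tokenized document
chunks = chunk_tokens(tokens)
print(len(chunks))                 # 33 chunks, not the naive 10,000 / 400 = 25
```

Overlap raises the chunk count above the naive size-only estimate, which is worth remembering when budgeting embedding calls.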

Embedding Stage

Bottleneck: API calls to embedding model

Optimization Strategies:

  1. Model Selection:

    text-embedding-3-small:
    - Speed: 200 chunks/min
    - Dimensions: 1536
    - Quality: Good
    - Cost: $
    
    text-embedding-3-large:
    - Speed: 150 chunks/min
    - Dimensions: 3072
    - Quality: Better
    - Cost: $$$
    
    Recommendation: Start with small, upgrade if needed
    
  2. Dimension Optimization:

    Higher dimensions = Better accuracy + Slower + More storage
    Lower dimensions = Faster + Less storage + Lower accuracy
    
    Sweet spot: 1024 or 1536 dimensions
    
  3. Batch Size Tuning:

    API limits per request:
    - Max tokens: 8191 per input (model context limit)
    - Optimal batch: 50-100 chunks
    - Balance: Throughput vs. failure risk
    
  4. Parallel Requests:

    Sequential: 1 request at a time
    Time: 100 seconds for 1000 chunks
    
    Parallel (10 concurrent):
    Time: 15 seconds for 1000 chunks
    Improvement: 85% faster
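
A minimal sketch of parallel embedding requests, assuming a hypothetical `embed_batch` function in place of a real embedding API call:

```python
from concurrent.futures import ThreadPoolExecutor

def embed_batch(batch):
    """Placeholder for a real embedding API call."""
    return [[0.0] * 4 for _ in batch]      # one fake vector per chunk

def embed_parallel(chunks, batch_size=100, max_workers=10):
    """Embed chunks with up to max_workers concurrent requests."""
    batches = [chunks[i:i + batch_size] for i in range(0, len(chunks), batch_size)]
    vectors = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for result in pool.map(embed_batch, batches):  # map preserves order
            vectors.extend(result)
    return vectors

vectors = embed_parallel([f"chunk {i}" for i in range(1000)])
print(len(vectors))  # 1000
```

In production, pair this with retries and backoff: more concurrency raises throughput but also the chance of hitting rate limits.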
    

Cost Optimization:

Processing 1M tokens:

text-embedding-3-small: $0.02
text-embedding-3-large: $0.13

Monthly cost for 100B tokens:
small: $2,000
large: $13,000

Potential savings: $11,000/month by using small model
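
A quick check of the arithmetic, using the per-1M-token prices above; the 100B-token monthly volume is an assumed workload:

```python
def embedding_cost(tokens, price_per_million):
    """Dollar cost of embedding `tokens` tokens at a per-1M-token price."""
    return tokens / 1_000_000 * price_per_million

monthly_tokens = 100_000_000_000               # assumed 100B tokens/month
small = embedding_cost(monthly_tokens, 0.02)   # ~$2,000
large = embedding_cost(monthly_tokens, 0.13)   # ~$13,000
print(f"${large - small:,.0f}/month saved")    # $11,000/month
```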

Indexing Stage

Bottleneck: Writing to vector database

Optimization Strategies:

  1. Batch Uploads:

    Single uploads: 100 items/min
    Batch uploads (100): 1000 items/min
    Improvement: 10x faster
    
  2. Index Partitioning:

    Single large index: Slower writes, complex management
    Partitioned indexes: Faster writes, easier management
    
    Example:
    - customer-data-2024-01
    - customer-data-2024-02
    - etc.
    
  3. Optimize Field Count:

    Minimal fields: id, content, vector, metadata
    Excessive fields: Every possible attribute
    
    More fields = Slower indexing + More storage
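
Batch uploading can be sketched generically; `upload_fn` stands in for whatever bulk-upload call your index client provides:

```python
def upload_in_batches(items, upload_fn, batch_size=100):
    """Upload items in fixed-size batches instead of one at a time.

    `upload_fn` stands in for your index client's bulk-upload call.
    Returns the number of API calls made.
    """
    calls = 0
    for i in range(0, len(items), batch_size):
        upload_fn(items[i:i + batch_size])
        calls += 1
    return calls

uploaded = []
calls = upload_in_batches(list(range(1000)), uploaded.extend)
print(calls)  # 10 calls instead of 1000 single uploads
```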
    

Parallel Processing

Worker Scaling

Vertical Scaling (more powerful workers):

Before: 2 CPU, 4GB RAM
After: 4 CPU, 8GB RAM
Throughput: 1.5-2x improvement
Cost: 2x

Horizontal Scaling (more workers):

Before: 1 worker
After: 4 workers
Throughput: 3-4x improvement
Cost: 4x

Note: Near-linear scaling for independent work items

Concurrent Stages

Enable parallel execution where possible:

Sequential:
Extract → Partition → Embed → Index
Total: 100 minutes

Pipelined (overlapping stages):
Extract → Partition → Embed → Index, overlapping across batches
(batch N is embedded while batch N+1 is partitioned)
Total: 70 minutes

Queue Management

Optimize work item distribution:

Single queue: Simple, potential bottleneck
Multiple queues: Complex, better throughput

Recommendation: Use priority queues
- High: Small files, quick processing
- Normal: Standard documents
- Low: Large files, batch jobs
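
The priority tiers above map naturally onto a heap-based queue; the 1 MB / 50 MB size cut-offs here are illustrative:

```python
import heapq

PRIORITY = {"high": 0, "normal": 1, "low": 2}   # lower number pops first

def make_queue(files):
    """Build a priority queue of (priority, size, path) work items.

    Small files land in the high tier so quick wins finish first;
    the 1 MB / 50 MB cut-offs are illustrative.
    """
    queue = []
    for path, size_mb in files:
        tier = "high" if size_mb < 1 else "low" if size_mb > 50 else "normal"
        heapq.heappush(queue, (PRIORITY[tier], size_mb, path))
    return queue

q = make_queue([("big.pdf", 120), ("note.txt", 0.1), ("report.docx", 5)])
print(heapq.heappop(q)[2])  # note.txt is processed first
```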

Resource Optimization

Memory Management

Symptoms of Memory Issues:

  • Out of memory errors
  • Slow processing
  • Worker crashes

Solutions:

  1. Reduce Batch Sizes:

    Current: 100 items/batch
    Reduced: 50 items/batch
    Memory usage: 50% reduction
    
  2. Process Large Files Separately:

    Standard pipeline: Files <10MB
    Large file pipeline: Files >10MB
    Separate processing paths
    
  3. Release Memory Promptly:

    Drop references to large objects after use
    Lets garbage collection reclaim memory, reducing pressure
    

CPU Optimization

High CPU Scenarios:

  • Text extraction (OCR)
  • Semantic partitioning
  • Complex transformations

Solutions:

  1. Schedule During Off-Peak:

    Peak hours: 9 AM - 5 PM
    Off-peak: Nights and weekends
    
    Schedule heavy pipelines: 2 AM - 6 AM
    
  2. Throttle Processing:

    Max concurrency: 10 items
    Reduces CPU spikes
    Smoother resource usage
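
Throttling can be sketched with a counting semaphore; `MAX_CONCURRENCY` mirrors the guideline above and the worker body is a placeholder:

```python
import threading

MAX_CONCURRENCY = 10                       # cap from the guideline above
semaphore = threading.Semaphore(MAX_CONCURRENCY)
lock = threading.Lock()
active = peak = 0

def process(item):
    """Placeholder for CPU-heavy work, gated so at most 10 run at once."""
    global active, peak
    with semaphore:
        with lock:
            active += 1
            peak = max(peak, active)
        # ... heavy work (OCR, semantic partitioning) would go here ...
        with lock:
            active -= 1

threads = [threading.Thread(target=process, args=(i,)) for i in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(peak <= MAX_CONCURRENCY)  # True
```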
    

Storage Optimization

Minimize Storage Costs:

  1. Cleanup Strategies:

    - Delete old pipeline runs after 30 days
    - Archive logs to cold storage
    - Remove failed run artifacts
    
  2. Compress Data:

    Before: Raw text stored
    After: Compressed text
    Savings: 60-80% storage reduction
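
Compression of extracted text can be sketched with the standard `zlib` module; actual savings depend on how repetitive the text is:

```python
import zlib

text = ("Extracted document text is often repetitive, "
        "so it compresses well. ") * 200
raw = text.encode("utf-8")
packed = zlib.compress(raw, level=6)

print(f"{1 - len(packed) / len(raw):.0%} smaller")      # repetitive text shrinks a lot
assert zlib.decompress(packed).decode("utf-8") == text  # lossless round trip
```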
    

Network Optimization

Reduce API Calls

Caching:

Before: Embed same text multiple times
After: Cache embeddings, reuse
Reduction: 50% fewer API calls
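
A minimal embedding cache, keyed by a hash of the chunk text; `embed` is a stand-in for a real API call:

```python
import hashlib

cache = {}

def embed(text):
    """Stand-in for a real embedding call; counts invocations."""
    embed.calls += 1
    return [float(len(text))]
embed.calls = 0

def cached_embed(text):
    """Reuse the stored vector when the exact same text reappears."""
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key not in cache:
        cache[key] = embed(text)
    return cache[key]

for chunk in ["hello", "world", "hello", "hello"]:
    cached_embed(chunk)
print(embed.calls)  # 2 API calls instead of 4
```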

Batching:

Before: 1 API call per item
After: 1 API call per 100 items
Reduction: 99% fewer calls

Connection Pooling

Without pooling: Create new connection each time
With pooling: Reuse connections
Latency reduction: 50-100ms per request

Regional Deployment

Cross-region latency: 100-200ms
Same-region latency: 5-10ms

Improvement: 95% latency reduction
Critical for high-volume processing

Configuration Tuning

High Speed (lower quality):

{
  "partitioning": {
    "strategy": "Token",
    "size": 600,
    "overlap": 50
  },
  "embedding": {
    "model": "text-embedding-3-small",
    "dimensions": 1024
  },
  "batch_size": 100
}

Balanced (recommended):

{
  "partitioning": {
    "strategy": "Token",
    "size": 400,
    "overlap": 100
  },
  "embedding": {
    "model": "text-embedding-3-large",
    "dimensions": 1536
  },
  "batch_size": 50
}

High Quality (slower):

{
  "partitioning": {
    "strategy": "Semantic",
    "size": 500,
    "overlap": 150
  },
  "embedding": {
    "model": "text-embedding-3-large",
    "dimensions": 3072
  },
  "batch_size": 25
}

Monitoring Performance

Key Indicators

Track These Metrics:

- Documents processed per minute
- Average chunk processing time
- API response times
- Error rates
- Resource utilization

Benchmarking

Establish Baselines:

1. Run pipeline with known dataset
2. Record all metrics
3. Make single change
4. Compare results
5. Keep if improved, revert if worse
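
Steps 1-4 can be wrapped in a small timing harness; `run_pipeline` is whatever entry point your pipeline exposes:

```python
import time

def benchmark(run_pipeline, dataset, repeats=3):
    """Time a pipeline entry point over a fixed dataset.

    Reporting the best of several runs reduces noise, which matters
    when comparing a single configuration change.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_pipeline(dataset)
        timings.append(time.perf_counter() - start)
    return min(timings)

# Toy pipeline standing in for a real run over a known dataset.
baseline = benchmark(lambda docs: [d.upper() for d in docs], ["doc"] * 1000)
```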

A/B Testing

Pipeline A: Current configuration
Pipeline B: Optimized configuration

Run both with same data
Compare:
- Speed
- Quality
- Cost
- Success rate

Choose best overall performer

Cost Optimization

Cost Breakdown

Typical pipeline costs:

Embedding API: 60%
Compute (workers): 25%
Storage: 10%
Networking: 5%

Reduction Strategies

  1. Optimize Embedding:

    Use smaller model: up to ~85% cost reduction (per the prices above)
    Reduce dimensions: 30% cost reduction
    Cache results: 20-50% cost reduction
    
  2. Right-Size Workers:

    Overprovisioned: Wasted resources
    Underprovisioned: Poor performance
    
    Monitor utilization, adjust monthly
    
  3. Schedule Efficiently:

    Run during off-peak pricing periods
    Batch similar work together
    Avoid unnecessary runs
    

Best Practices

Development Phase

  1. Start Small: Test with 100 documents
  2. Measure Baseline: Establish performance metrics
  3. Optimize Incrementally: Change one thing at a time
  4. Validate Quality: Ensure optimizations don't hurt results

Production Phase

  1. Monitor Continuously: Track key metrics
  2. Set Alerts: Notify on degradation
  3. Regular Reviews: Monthly performance analysis
  4. Capacity Planning: Anticipate growth

Iteration Cycle

1. Measure current performance
2. Identify bottleneck
3. Implement optimization
4. Measure improvement
5. Repeat

Common Pitfalls

Over-Optimization

Problem: Spending too much time on marginal gains

Solution: Focus on 80/20 rule - optimize biggest bottlenecks first

Quality Sacrifice

Problem: Too fast = poor search results

Solution: Always validate search quality after optimization

Premature Scaling

Problem: Scaling before identifying bottleneck

Solution: Measure first, scale second

Ignoring Costs

Problem: Fast but expensive

Solution: Balance performance with cost