Troubleshooting Data Pipelines
Learn how to diagnose and resolve common data pipeline issues.
Overview
This guide helps you troubleshoot data pipeline problems through systematic diagnosis and resolution. Follow the troubleshooting workflow to identify and fix issues efficiently.
Troubleshooting Workflow
- Identify the Problem: Determine what's not working
- Gather Information: Collect logs, error messages, and context
- Analyze Symptoms: Match symptoms to known issues
- Apply Solutions: Follow recommended fixes
- Verify Resolution: Confirm the issue is resolved
- Document: Record the problem and solution
Common Issues by Category
Pipeline Execution Issues
Pipeline Won't Start
Symptoms:
- Run button disabled or does nothing
- Error: "Cannot start pipeline"
- Pipeline stays in "Pending" status
Common Causes:
| Cause | Check | Solution |
|---|---|---|
| Pipeline inactive | Pipeline status | Activate the pipeline |
| Insufficient permissions | User role | Request Operator or Administrator role |
| Another run active | Pipeline runs list | Wait for completion or cancel |
| Missing trigger parameters | Trigger configuration | Add required parameters |
| System maintenance | Platform status | Wait and retry |
Detailed Diagnosis:
Check Pipeline Status:
- Navigate to Data Pipelines → Find your pipeline
- Status should show "Active", not "Inactive"
Verify Permissions:
- Check your assigned roles
- Need: PipelineOperator or PipelineAdministrator
Review Active Runs:
- Go to Data Pipeline Runs
- Filter by your pipeline name
- Check for "Running" status
Pipeline Fails Immediately
Symptoms:
- Pipeline starts but fails within seconds
- Status changes directly to "Failed"
- No data processed
Common Causes:
1. Data Source Connection Issues
Symptoms:
- Error: "Unable to connect to data source"
- Error: "Access denied"
Solutions:
- Verify data source configuration
- Check authentication credentials
- Test network connectivity
- Verify permissions on source
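To test network connectivity from the machine that runs the pipeline, a minimal sketch like the one below may help. It only checks that a TCP connection can be opened and says nothing about credentials or permissions. The host and port in the usage section are placeholders, not values defined by this platform.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        # create_connection resolves the host name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    # Hypothetical endpoint: replace with your data source's host and port.
    print(can_connect("example.com", 443))
```

A `False` result points to DNS, firewall, or routing problems. A `True` result followed by "Access denied" points to credentials or permissions instead.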
2. Missing Configuration
Symptoms:
- Error: "Required parameter not found"
- Error: "Invalid configuration"
Solutions:
- Review trigger parameter values
- Verify all required parameters provided
- Check parameter naming convention
- Validate parameter types
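The parameter checks above can be run before a pipeline starts. The sketch below is illustrative only: the required-parameter table is an assumption built from the two partitioning parameter names that appear in this guide, and `validate_parameters` is a hypothetical helper, not part of the platform.

```python
from typing import Any

# Assumed schema: these two names come from the partitioning example in this
# guide; substitute the parameters your own trigger actually requires.
REQUIRED_PARAMETERS = {
    "PartitionSizeTokens": int,
    "PartitionOverlapTokens": int,
}


def validate_parameters(values: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the values look valid."""
    problems = []
    for name, expected_type in REQUIRED_PARAMETERS.items():
        if name not in values:
            problems.append(f"missing required parameter: {name}")
        elif type(values[name]) is not expected_type:
            actual = type(values[name]).__name__
            problems.append(
                f"{name} should be {expected_type.__name__}, got {actual}"
            )
    # Flag names that differ from a required name only by letter case.
    by_lowercase = {name.lower(): name for name in REQUIRED_PARAMETERS}
    for name in values:
        expected = by_lowercase.get(name.lower())
        if expected is not None and expected != name:
            problems.append(f"parameter {name} should be spelled {expected}")
    return problems
```

Each of the three checks corresponds to one solution above: a missing parameter, a wrong type, and a naming-convention mismatch.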
3. Plugin Not Found
Symptoms:
- Error: "Plugin not registered"
- Error: "Unknown plugin"
Solutions:
- Verify plugin object ID
- Check plugin deployment
- Confirm plugin version compatibility
Pipeline Hangs or Times Out
Symptoms:
- Pipeline runs for extended period
- No progress updates
- Eventually times out
Common Causes:
| Cause | Typical Duration | Solution |
|---|---|---|
| Large dataset | Hours | Normal, monitor progress |
| Network issues | Varies | Check connectivity |
| Resource contention | Varies | Check system load |
| Infinite loop | Forever | Cancel and review config |
Diagnostic Steps:
Check Expected Duration:
- Review similar past runs
- Calculate based on data volume
- Compare against baselines
Monitor Progress:
- Check items processed
- Review stage completion
- Look for incremental updates
Inspect Logs:
- Look for repeated errors
- Check for warnings
- Identify stuck stage
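One way to compare a run against baselines is to take the median duration of similar past runs and flag the current run once it exceeds some multiple of it. The sketch below assumes you have collected past durations yourself. The tolerance factor of 2.0 is an arbitrary starting point, not a platform default.

```python
from statistics import median


def is_run_overdue(
    elapsed_minutes: float,
    past_durations_minutes: list[float],
    tolerance: float = 2.0,
) -> bool:
    """Flag a run whose elapsed time exceeds tolerance x the median past run."""
    if not past_durations_minutes:
        raise ValueError("need at least one past run to form a baseline")
    baseline = median(past_durations_minutes)
    return elapsed_minutes > tolerance * baseline
```

With past runs of 50, 60, and 70 minutes, the baseline is 60 minutes, so a run is flagged once it passes 120 minutes. Scale the baseline by data volume if the current run processes noticeably more data than earlier ones.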
Stage-Specific Issues
Text Extraction Fails
Symptoms:
- Error: "Unable to extract text"
- Files skipped
- Empty content output
Common Causes:
1. Unsupported File Format
Solution:
Verify file extension matches supported types:
- PDF: .pdf
- Word: .docx (not .doc)
- Excel: .xlsx (not .xls)
- PowerPoint: .pptx (not .ppt)
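A quick pre-check of file names against the list above can catch legacy formats before a run. This is a sketch: `check_extension` is a hypothetical helper, and treating extensions as case-insensitive is an assumption about how the extraction stage behaves.

```python
from pathlib import Path

# Extensions listed as supported in this guide.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".xlsx", ".pptx"}
# Legacy Office formats and the supported format to convert them to.
LEGACY_EXTENSIONS = {".doc": ".docx", ".xls": ".xlsx", ".ppt": ".pptx"}


def check_extension(filename: str) -> str:
    """Classify a file name as supported, legacy, or unsupported."""
    suffix = Path(filename).suffix.lower()
    if suffix in SUPPORTED_EXTENSIONS:
        return "supported"
    if suffix in LEGACY_EXTENSIONS:
        return f"legacy format: convert to {LEGACY_EXTENSIONS[suffix]}"
    return "unsupported"
```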
2. Corrupted Files
Solution:
- Test file opens normally
- Try re-downloading source file
- Check file size isn't zero
- Verify file isn't password-protected
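Some of these checks can be scripted. The sketch below relies on the fact that .docx, .xlsx, and .pptx files are ZIP containers, so a file with one of those extensions that does not open as a ZIP archive is likely corrupted or, possibly, encrypted. It does not inspect PDFs beyond the size check, and `basic_file_checks` is a hypothetical helper.

```python
import zipfile
from pathlib import Path

ZIP_BASED_EXTENSIONS = {".docx", ".xlsx", ".pptx"}


def basic_file_checks(path: str) -> list[str]:
    """Return a list of problems found; an empty list means no problem seen."""
    file = Path(path)
    if not file.is_file():
        return ["file not found"]
    if file.stat().st_size == 0:
        return ["file is empty (zero bytes)"]
    problems = []
    if file.suffix.lower() in ZIP_BASED_EXTENSIONS:
        if not zipfile.is_zipfile(str(file)):
            problems.append("not a ZIP container: corrupted or possibly encrypted")
        else:
            with zipfile.ZipFile(str(file)) as archive:
                # testzip() returns the first member with a bad CRC, or None.
                damaged = archive.testzip()
                if damaged is not None:
                    problems.append(f"damaged archive member: {damaged}")
    return problems
```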
3. Large File Issues
Solution:
- Check file size limits
- Increase memory allocation
- Split large files
- Use streaming extraction if available
Partitioning Produces Wrong Results
Symptoms:
- Chunks too large or too small
- Text cut mid-sentence
- Overlaps incorrect
Diagnosis:
Review Parameters:
    {
      "PartitionSizeTokens": 400,     // Target size
      "PartitionOverlapTokens": 100   // Overlap
    }
Check Token Counting:
- Different models count tokens differently
- Verify tokenizer matches embedding model
Solutions:
| Problem | Current Setting | Recommended Setting |
|---|---|---|
| Chunks too large | 800+ tokens | 300-500 tokens |
| Chunks too small | <200 tokens | 400-600 tokens |
| Poor overlap | <50 tokens | 100-150 tokens |
| Excessive overlap | >200 tokens | 50-100 tokens |
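To see how size and overlap interact, it can help to reproduce the arithmetic outside the pipeline. The sketch below is a generic sliding-window partitioner, not the platform's implementation. It works on a pre-tokenized list, so any whitespace split you feed it is only a stand-in for the real tokenizer.

```python
def partition_tokens(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
    """Split tokens into chunks of at most `size`, sharing `overlap` tokens."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be at least 0 and smaller than size")
    step = size - overlap  # Each new chunk starts this many tokens later.
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # The last chunk already reaches the end of the input.
    return chunks


if __name__ == "__main__":
    sample = [f"t{i}" for i in range(1000)]
    chunks = partition_tokens(sample, size=400, overlap=100)
    print(len(chunks), [len(chunk) for chunk in chunks])
```

With 1000 tokens, a size of 400, and an overlap of 100, the window advances 300 tokens at a time and produces three chunks of 400 tokens each. Raising the overlap shrinks the step, which increases the chunk count.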
Embedding Failures
Symptoms:
- Error: "Embedding failed"
- Error: "Rate limit exceeded"
- Timeout errors
Common Causes:
1. API Rate Limits
Symptoms:
- Error: "429 Too Many Requests"
- Intermittent failures
Solutions:
- Reduce batch size
- Add retry logic
- Check quota limits
- Throttle requests to a slower, steady rate
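Retry logic for rate limits is commonly written as exponential backoff with jitter. The sketch below shows that pattern in isolation. `RateLimitError` is a hypothetical exception standing in for whatever your embedding client raises on a 429 response, and the attempt count and delays are illustrative values.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a client's "429 Too Many Requests" error."""


def call_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying on RateLimitError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            # Delays grow as 1s, 2s, 4s, ... plus up to 10% random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

The `sleep` parameter exists so the function can be exercised without real waiting. If the service returns a Retry-After header, honoring it is usually preferable to a computed delay.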
2. Invalid Model Name
Symptoms:
- Error: "Model not found"
- All embedding requests fail
Solution:
Verify model name exactly matches:
- text-embedding-3-large
- text-embedding-3-small
- text-embedding-ada-002
Common typos:
✗ text-embedding-3-Large (wrong case)
✗ text-embedding-large (missing version)
✓ text-embedding-3-large (correct)
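A small check can catch these typos before a run. The sketch below compares a configured name against the three model names listed in this guide and suggests the closest one. `check_model_name` is a hypothetical helper, and the 0.8 similarity cutoff is an arbitrary choice.

```python
import difflib

# Model names listed in this guide.
KNOWN_MODELS = [
    "text-embedding-3-large",
    "text-embedding-3-small",
    "text-embedding-ada-002",
]


def check_model_name(name: str) -> str:
    """Return "ok", a suggested correction, or "unknown model"."""
    if name in KNOWN_MODELS:
        return "ok"
    # Catch wrong case and stray whitespace first.
    normalized = name.strip().lower()
    if normalized in KNOWN_MODELS:
        return f"did you mean '{normalized}'?"
    # Fall back to fuzzy matching for missing or mistyped characters.
    matches = difflib.get_close_matches(normalized, KNOWN_MODELS, n=1, cutoff=0.8)
    if matches:
        return f"did you mean '{matches[0]}'?"
    return "unknown model"
```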
3. Text Too Long
Symptoms:
- Error: "Input too long"
- Specific chunks fail
Solutions:
- Reduce partition size
- Check for malformed chunks
- Verify token limits for model
Indexing Issues
Symptoms:
- Data doesn't appear in index
- Partial data indexed
- Search doesn't find content
Common Causes:
1. Index Not Found
Symptoms:
- Error: "Index does not exist"
Solutions:
1. Verify index name spelling
2. Check index exists in Azure AI Search
3. Confirm API endpoint configuration
4. Verify permissions on index
2. Index Full
Symptoms:
- Error: "Insufficient storage"
- Partial success
Solutions:
1. Check Azure AI Search quota
2. Review index size
3. Consider index partitioning
4. Clean up old data
3. Schema Mismatch
Symptoms:
- Error: "Field not found"
- Data transformation fails
Solutions:
1. Review index schema
2. Verify field names match
3. Check data types
4. Update schema if needed
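Field-name and type checks can be done on a sample document before indexing. The sketch below is a simplification: it describes the schema as a mapping from field name to Python type, whereas a real Azure AI Search index declares its own field types. The field names in the usage example are hypothetical.

```python
from typing import Any


def find_schema_mismatches(
    document: dict[str, Any], schema: dict[str, type]
) -> list[str]:
    """List fields that are absent from the schema or have the wrong type."""
    problems = []
    for field, value in document.items():
        if field not in schema:
            problems.append(f"field not in index schema: {field}")
        elif not isinstance(value, schema[field]):
            problems.append(
                f"{field} should be {schema[field].__name__}, "
                f"got {type(value).__name__}"
            )
    return problems


if __name__ == "__main__":
    # Hypothetical schema and document, for illustration only.
    schema = {"id": str, "content": str, "embedding": list}
    document = {"id": "doc-1", "Content": "hello", "embedding": "0.1,0.2"}
    for problem in find_schema_mismatches(document, schema):
        print(problem)
```

In the usage example, `Content` is reported because field names are compared case-sensitively, and `embedding` is reported because a string was supplied where a list was expected.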
Data Quality Issues
Missing Data
Symptoms:
- Expected files not processed
- Gaps in indexed content
- Fewer items than source
Diagnosis Checklist:
- [ ] Check data source filter/path
- [ ] Verify file permissions
- [ ] Review error logs for skipped files
- [ ] Confirm trigger parameter values
- [ ] Check for duplicate detection
Investigation:
Compare Source vs. Processed:
- Source: 1000 files
- Processed: 950 files
- Missing: 50 files → Check logs for reasons
Review Pipeline Logs:
- Look for "skipped" messages
- Check for unsupported formats
- Identify permission errors
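If you can export the list of source files and the list of processed files, the comparison reduces to a set difference. The sketch below assumes both lists use the same path format. How you obtain them depends on your data source and is not shown.

```python
from collections import Counter
from pathlib import PurePosixPath


def find_missing(source_files: list[str], processed_files: list[str]) -> list[str]:
    """Return source files that do not appear among the processed files."""
    return sorted(set(source_files) - set(processed_files))


def count_by_extension(paths: list[str]) -> Counter:
    """Count paths per lowercased extension to spot unsupported formats."""
    return Counter(PurePosixPath(path).suffix.lower() or "(none)" for path in paths)
```

Grouping the missing files by extension often shows the cause directly. For example, if every missing file ends in .doc or .xls, the files were skipped as unsupported formats, not lost to permission errors.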
Incorrect Data
Symptoms:
- Text doesn't match source
- Garbled characters
- Wrong encoding
Common Causes:
1. Encoding Issues
Symptoms:
- Special characters display wrong
- Non-English text garbled
Solutions:
- Verify source file encoding
- Check UTF-8 support
- Test with sample files
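To verify whether a sample file is valid UTF-8, try decoding its bytes strictly and report where decoding fails. This is a minimal sketch. It confirms or rules out UTF-8 but does not guess which other encoding a file uses.

```python
from typing import Optional


def check_utf8(data: bytes) -> Optional[str]:
    """Return None if data is valid UTF-8, else a description of the failure."""
    try:
        data.decode("utf-8")  # Strict decoding raises on any invalid byte.
        return None
    except UnicodeDecodeError as error:
        return f"invalid UTF-8 at byte offset {error.start}"


if __name__ == "__main__":
    print(check_utf8("café".encode("utf-8")))    # None
    print(check_utf8("café".encode("latin-1")))  # invalid UTF-8 at byte offset 3
```

The second call fails because Latin-1 encodes "é" as a single byte that is not a complete UTF-8 sequence. This is the typical cause of garbled accented characters.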
2. Extraction Errors
Symptoms:
- PDF text out of order
- Tables not parsed correctly
- Images missing
Solutions:
- Review PDF structure
- Check extraction plugin version
- Consider OCR for images
- Validate output samples
Performance Issues
Slow Processing
Symptoms:
- Pipeline takes much longer than expected
- Processing rate decreases over time
- Timeouts occur
Performance Benchmarks:
| Stage | Typical Rate | Slow If |
|---|---|---|
| Extraction | 10-50 docs/min | <5 docs/min |
| Partitioning | 100-500 chunks/min | <50 chunks/min |
| Embedding | 50-200 chunks/min | <25 chunks/min |
| Indexing | 100-1000 chunks/min | <50 chunks/min |
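The comparison against this table is simple arithmetic: divide items processed by elapsed minutes and compare the result with the "Slow If" threshold. The sketch below encodes the thresholds from the table. `is_stage_slow` is a hypothetical helper.

```python
# "Slow If" thresholds from the benchmark table, in items per minute.
SLOW_THRESHOLDS = {
    "Extraction": 5,
    "Partitioning": 50,
    "Embedding": 25,
    "Indexing": 50,
}


def is_stage_slow(stage: str, items_processed: int, elapsed_minutes: float) -> bool:
    """Return True if the stage's observed rate is below its slow threshold."""
    if elapsed_minutes <= 0:
        raise ValueError("elapsed_minutes must be positive")
    rate = items_processed / elapsed_minutes
    return rate < SLOW_THRESHOLDS[stage]
```

For example, an embedding stage that handled 200 chunks in 10 minutes runs at 20 chunks/min, which is below the 25 chunks/min threshold and counts as slow.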
Optimization Strategies:
Increase Batch Sizes (if memory allows):
- Current: 10 items/batch
- Recommended: 50-100 items/batch
Optimize Partitioning:
- Increase chunk size: 300 → 400 tokens
- Fewer, larger chunks reduce the total number of chunks processed
Use Faster Embedding Model:
- Current: text-embedding-3-large
- Faster: text-embedding-3-small (trade-off: slightly lower quality)
Parallel Processing:
- Increase worker instances
- Enable parallel stages
- Use multiple pipelines for different data sets
High Resource Usage
Symptoms:
- High memory consumption
- CPU at 100%
- Storage filling quickly
Solutions:
Memory:
- Reduce batch sizes
- Process smaller files
- Enable streaming if available
- Increase worker memory limits
CPU:
- Reduce parallelism
- Optimize chunk sizes
- Schedule during off-peak
- Consider async processing
Storage:
- Clean up old runs
- Archive completed data
- Implement retention policies
- Monitor growth trends
Log Analysis
Finding Logs
- Navigate to Data Pipeline Runs
- Click on specific run
- View Execution Log or Details
Reading Log Messages
Log Levels:
- ERROR: Failures requiring attention
- WARNING: Issues that may cause problems
- INFO: Normal operations
- DEBUG: Detailed diagnostic information
Common Error Patterns:
ERROR: Connection timeout
→ Network or firewall issue
ERROR: Unauthorized access
→ Permission or authentication problem
ERROR: Resource not found
→ Configuration error (index, model, etc.)
ERROR: Invalid parameter value
→ Configuration or data format issue
WARNING: Retrying operation
→ Transient error, may resolve
WARNING: Deprecated feature
→ Update configuration
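When a log is long, the patterns above can be applied automatically. The sketch below matches each line against the message fragments listed here, ignoring case. It assumes logs are available as plain text lines, and the sample lines in the usage section are invented for illustration.

```python
from collections import Counter
from typing import Optional

# Message fragments and likely causes, taken from the list above.
KNOWN_PATTERNS = [
    ("connection timeout", "Network or firewall issue"),
    ("unauthorized access", "Permission or authentication problem"),
    ("resource not found", "Configuration error (index, model, etc.)"),
    ("invalid parameter value", "Configuration or data format issue"),
    ("retrying operation", "Transient error, may resolve"),
    ("deprecated feature", "Update configuration"),
]


def likely_cause(log_line: str) -> Optional[str]:
    """Return the likely cause for a log line, or None if nothing matches."""
    lowered = log_line.lower()
    for fragment, cause in KNOWN_PATTERNS:
        if fragment in lowered:
            return cause
    return None


def summarize(log_lines: list[str]) -> Counter:
    """Count how often each likely cause appears across the log."""
    causes = (likely_cause(line) for line in log_lines)
    return Counter(cause for cause in causes if cause is not None)


if __name__ == "__main__":
    # Invented sample lines; real log formats may differ.
    sample = [
        "ERROR: Connection timeout while reading source",
        "WARNING: Retrying operation (attempt 2)",
        "INFO: Stage completed",
    ]
    print(summarize(sample))
```

A cause that dominates the summary is usually the one to investigate first.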
Error Message Reference
| Error Code | Meaning | Solution |
|---|---|---|
| DPS-001 | Data source unreachable | Check connectivity |
| DPS-002 | Authentication failed | Verify credentials |
| DPS-003 | Plugin error | Check plugin configuration |
| DPS-004 | Timeout | Increase timeout or reduce load |
| DPS-005 | Invalid configuration | Review settings |
State Inspection
Checking Pipeline State
Configuration State:
- Review pipeline definition
- Verify all stages configured
- Check trigger parameters
Execution State:
- Check run status
- Review stage completion
- Inspect work item progress
Data State:
- Verify source data availability
- Check index contents
- Validate embedding quality
Recovery Procedures
Restarting Failed Pipeline
- Review failure reason in logs
- Fix underlying issue
- Run pipeline again (full or incremental)
- Monitor for successful completion
Cleaning Up Partial Runs
If the pipeline failed mid-execution:
1. Identify what was processed
2. Determine whether partial data is acceptable
3. Clean up or complete processing
4. Document for audit trail
Rollback Procedures
If incorrect data was indexed:
- Stop any active runs
- Remove incorrect data from index
- Reprocess from source
- Verify correct data loaded
Prevention Best Practices
Pre-Flight Checks
Before running the pipeline:
- [ ] Verify data source accessible
- [ ] Check target has capacity
- [ ] Review configuration
- [ ] Test with small sample
- [ ] Monitor first few minutes
Monitoring
- Set up alerts for failures
- Monitor run durations
- Track success rates
- Review logs regularly
Documentation
- Document pipeline configurations
- Record common issues and solutions
- Maintain runbook
- Keep change log
Getting Help
Information to Gather
When seeking support, provide:
- Pipeline name and ID
- Run ID and timestamp
- Error messages (complete text)
- Recent changes made
- What you've tried
Escalation Path
- Self-Service: Use this guide
- Team Lead: Escalate if unresolved
- Platform Team: For infrastructure issues
- Vendor Support: For third-party components