Troubleshooting Data Pipelines

Learn how to diagnose and resolve common data pipeline issues.

Overview

This guide helps you troubleshoot data pipeline problems through systematic diagnosis and resolution. Follow the troubleshooting workflow to identify and fix issues efficiently.

Troubleshooting Workflow

Identify the Problem: Determine what's not working
Gather Information: Collect logs, error messages, and context
Analyze Symptoms: Match symptoms to known issues
Apply Solutions: Follow recommended fixes
Verify Resolution: Confirm the issue is resolved
Document: Record the problem and solution

Common Issues by Category

Pipeline Execution Issues

Pipeline Won't Start

Symptoms:

Run button disabled or does nothing
Error: "Cannot start pipeline"
Pipeline stays in "Pending" status

Common Causes:

Cause	Check	Solution
Pipeline inactive	Pipeline status	Activate the pipeline
Insufficient permissions	User role	Request the PipelineOperator or PipelineAdministrator role
Another run active	Pipeline runs list	Wait for completion or cancel
Missing trigger parameters	Trigger configuration	Add required parameters
System maintenance	Platform status	Wait and retry

Detailed Diagnosis:

Check Pipeline Status:

Navigate to Data Pipelines → Find your pipeline
Status should show "Active", not "Inactive"

Verify Permissions:

Check your assigned roles
Need: PipelineOperator or PipelineAdministrator

Review Active Runs:

Go to Data Pipeline Runs
Filter by your pipeline name
Check for "Running" status
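
The three checks above can be combined into one pre-start test. The sketch below is illustrative only: `PipelineInfo` and its fields are hypothetical stand-ins for whatever your platform's UI or API reports, not a real client library. Only the role names come from this guide.

```python
from dataclasses import dataclass, field

# Role names taken from this guide; everything else here is a hypothetical model.
ALLOWED_ROLES = {"PipelineOperator", "PipelineAdministrator"}


@dataclass
class PipelineInfo:
    """Hypothetical snapshot of the facts checked in the diagnosis steps."""
    status: str                                        # e.g. "Active" or "Inactive"
    user_roles: set = field(default_factory=set)       # roles assigned to the user
    run_statuses: list = field(default_factory=list)   # statuses of existing runs


def start_blockers(info: PipelineInfo) -> list:
    """Return the reasons a pipeline cannot start (empty list means none found)."""
    blockers = []
    if info.status != "Active":
        blockers.append("pipeline is not active")
    if not (info.user_roles & ALLOWED_ROLES):
        blockers.append("user lacks PipelineOperator or PipelineAdministrator role")
    if "Running" in info.run_statuses:
        blockers.append("another run is still active")
    return blockers
```

An empty result does not rule out the other causes in the table, such as missing trigger parameters or system maintenance.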

Pipeline Fails Immediately

Symptoms:

Pipeline starts but fails within seconds
Status changes directly to "Failed"
No data processed

Common Causes:

1. Data Source Connection Issues

Symptoms:

Error: "Unable to connect to data source"
Error: "Access denied"

Solutions:

Verify data source configuration
Check authentication credentials
Test network connectivity
Verify permissions on source
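
To test network connectivity from the machine that runs the pipeline, a plain TCP connection attempt is often enough. This is a minimal sketch using only the Python standard library. The host and port you pass are your own data source's values, and a successful connection says nothing about credentials or permissions.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures, and timeouts.
        return False
```

If this returns False, look at firewalls, DNS, and private endpoint configuration before revisiting credentials.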

2. Missing Configuration

Symptoms:

Error: "Required parameter not found"
Error: "Invalid configuration"

Solutions:

Review trigger parameter values
Verify all required parameters provided
Check parameter naming convention
Validate parameter types
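
The checks above (presence, naming, and types) can be run before a pipeline is triggered. The following sketch assumes parameters are available as a dictionary and that you maintain your own specification of required names and types. The function name and the spec format are hypothetical, not part of the platform.

```python
def validate_parameters(params: dict, required: dict) -> list:
    """Check trigger parameters against a {name: expected_type} spec.

    Returns a list of human-readable problems; an empty list means valid.
    """
    problems = []
    for name, expected_type in required.items():
        if name not in params:
            problems.append(f"missing required parameter: {name}")
        elif not isinstance(params[name], expected_type):
            actual = type(params[name]).__name__
            problems.append(f"{name}: expected {expected_type.__name__}, got {actual}")
    # Names are compared case-sensitively, so flag near-misses such as wrong case.
    lowered = {name.lower(): name for name in required}
    for name in params:
        if name not in required and name.lower() in lowered:
            problems.append(f"{name}: did you mean {lowered[name.lower()]}?")
    return problems
```

For example, passing `"400"` (a string) for `PartitionSizeTokens` when an integer is expected is reported as a type problem, and `partitionoverlaptokens` is reported as a likely misspelling of `PartitionOverlapTokens`.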

3. Plugin Not Found

Symptoms:

Error: "Plugin not registered"
Error: "Unknown plugin"

Solutions:

Verify plugin object ID
Check plugin deployment
Confirm plugin version compatibility

Pipeline Hangs or Times Out

Symptoms:

Pipeline runs for extended period
No progress updates
Eventually times out

Common Causes:

Cause	Typical Duration	Solution
Large dataset	Hours	Normal, monitor progress
Network issues	Varies	Check connectivity
Resource contention	Varies	Check system load
Infinite loop	Until timeout	Cancel the run and review the configuration

Diagnostic Steps:

Check Expected Duration:
- Review similar past runs
- Calculate based on data volume
- Compare against baselines

Monitor Progress:
- Check items processed
- Review stage completion
- Look for incremental updates

Inspect Logs:
- Look for repeated errors
- Check for warnings
- Identify the stuck stage
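
Two of the diagnostic steps above lend themselves to simple arithmetic: estimating expected duration from a baseline rate, and detecting a stall from successive progress readings. The sketch below assumes you can sample a cumulative "items processed" count at regular intervals. How you obtain those samples depends on your platform and is not shown.

```python
def expected_minutes(item_count: int, baseline_items_per_min: float) -> float:
    """Estimate run duration from data volume and a rate observed in past runs."""
    if baseline_items_per_min <= 0:
        raise ValueError("baseline rate must be positive")
    return item_count / baseline_items_per_min


def is_stalled(samples: list, window: int = 3) -> bool:
    """Return True if the processed-item count is unchanged over the last `window` samples.

    `samples` is a chronological list of cumulative "items processed" readings.
    """
    if len(samples) < window:
        return False  # not enough data to judge
    recent = samples[-window:]
    return max(recent) == min(recent)
```

A run that is well past its expected duration and also reports a stall is a stronger candidate for cancellation than a run that is slow but still advancing.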

Stage-Specific Issues

Text Extraction Fails

Symptoms:

Error: "Unable to extract text"
Files skipped
Empty content output

Common Causes:

1. Unsupported File Format

Solution:

Verify file extension matches supported types:
- PDF: .pdf
- Word: .docx (not .doc)
- Excel: .xlsx (not .xls)
- PowerPoint: .pptx (not .ppt)
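
A quick way to find unsupported files before a run is to filter the source listing by extension. This sketch uses the extensions listed above. Treating extensions case-insensitively is an assumption here, so confirm how your extraction plugin behaves.

```python
from pathlib import Path

# Extensions listed in this guide; adjust to match your deployment.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".xlsx", ".pptx"}


def unsupported_files(paths: list) -> list:
    """Return the paths whose extension is not in the supported set."""
    return [p for p in paths if Path(p).suffix.lower() not in SUPPORTED_EXTENSIONS]
```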

2. Corrupted Files

Solution:

- Test file opens normally
- Try re-downloading source file
- Check file size isn't zero
- Verify file isn't password-protected
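
Some of these checks can be automated with a look at file size and leading bytes. PDF files normally begin with `%PDF-`, and .docx, .xlsx, and .pptx files are ZIP archives that begin with `PK`. The sketch below is a coarse filter under those assumptions. It does not detect password protection or damage deeper in the file, and some PDF readers tolerate a header that is not at byte zero.

```python
import os

# Leading bytes for the formats in this guide.
SIGNATURES = {".pdf": b"%PDF-", ".docx": b"PK", ".xlsx": b"PK", ".pptx": b"PK"}


def quick_file_check(path: str) -> str:
    """Return "ok", "empty", "bad-signature", or "unknown-type" for a file."""
    ext = os.path.splitext(path)[1].lower()
    if os.path.getsize(path) == 0:
        return "empty"
    expected = SIGNATURES.get(ext)
    if expected is None:
        return "unknown-type"
    with open(path, "rb") as handle:
        header = handle.read(len(expected))
    return "ok" if header == expected else "bad-signature"
```

A "bad-signature" result often means the file was renamed to a supported extension or the download returned an error page instead of the document.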

3. Large File Issues

Solution:

- Check file size limits
- Increase memory allocation
- Split large files
- Use streaming extraction if available

Partitioning Produces Wrong Results

Symptoms:

Chunks too large or too small
Text cut mid-sentence
Overlaps incorrect

Diagnosis:

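Review Parameters:

PartitionSizeTokens sets the target chunk size and PartitionOverlapTokens sets the overlap between consecutive chunks, both measured in tokens:

{
  "PartitionSizeTokens": 400,
  "PartitionOverlapTokens": 100
}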

Check Token Counting:
- Different models count tokens differently
- Verify tokenizer matches embedding model
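
To make the size and overlap parameters concrete, here is a generic sliding-window partitioner over an already tokenized document. It is a sketch of the general technique, not the platform's partitioning plugin. It ignores sentence boundaries, and the tokens are assumed to come from the tokenizer that matches your embedding model.

```python
def partition_tokens(tokens: list, size: int, overlap: int) -> list:
    """Split a token list into chunks of up to `size` tokens, each sharing
    `overlap` tokens with the previous chunk."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last chunk already reaches the end of the input
    return chunks
```

With ten tokens, a size of 4, and an overlap of 1, this yields three chunks, and the last token of each chunk is repeated as the first token of the next.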

Solutions:

Problem	Current Setting	Recommended Setting
Chunks too large	800+ tokens	300-500 tokens
Chunks too small	<200 tokens	400-600 tokens
Poor overlap	<50 tokens	100-150 tokens
Excessive overlap	>200 tokens	50-100 tokens
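
Before changing these settings, it can help to estimate how many chunks a document will produce, since chunk count drives embedding and indexing cost. The estimate below assumes a simple sliding window in which each chunk after the first advances by size minus overlap tokens. Actual partitioners may differ, for example by respecting sentence boundaries.

```python
def estimate_chunk_count(total_tokens: int, size: int, overlap: int) -> int:
    """Estimate how many chunks a sliding window of `size` tokens with
    `overlap` shared tokens produces for a document of `total_tokens`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    if total_tokens <= 0:
        return 0
    if total_tokens <= size:
        return 1
    step = size - overlap
    # The first chunk covers `size` tokens; each later chunk adds `step` new ones.
    remaining = total_tokens - size
    return 1 + (remaining + step - 1) // step  # integer ceiling division
```

For a 100,000-token corpus with a 100-token overlap, 400-token chunks give an estimate of 333 chunks and 300-token chunks give 500. Smaller chunks and larger overlaps both increase the number of chunks to embed and index.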

Embedding Failures

Symptoms:

Error: "Embedding failed"
Error: "Rate limit exceeded"
Timeout errors

Common Causes:

1. API Rate Limits

Symptoms:

Error: "429 Too Many Requests"
Intermittent failures

Solutions:

Reduce batch size
Add retry logic
Check quota limits
Throttle requests to stay under the rate limit
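
Retry logic for rate limits is usually implemented as exponential backoff with jitter. The sketch below shows the pattern in isolation. `RateLimitError` is a hypothetical placeholder for whatever exception your embedding client raises on a 429 response, and the attempt count and delays are illustrative defaults.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error your embedding client raises on HTTP 429."""


def call_with_backoff(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on RateLimitError with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            # Delay doubles each attempt: base, 2*base, 4*base, ... plus jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

If the service returns a Retry-After value, prefer that over a computed delay. Backoff handles bursts, but a quota that is consistently too low still requires a smaller batch size or a higher limit.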

2. Invalid Model Name

Symptoms:

Error: "Model not found"
All embedding requests fail

Solution:

Verify model name exactly matches:
- text-embedding-3-large
- text-embedding-3-small
- text-embedding-ada-002

Common typos:
✗ text-embedding-3-Large (wrong case)
✗ text-embedding-large (missing version)
✓ text-embedding-3-large (correct)
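
A configuration check can catch these typos before a run. This sketch compares a configured name against the three names listed above and suggests the closest one using `difflib` from the Python standard library. The list reflects this guide only, so your deployment may expose different or additional models.

```python
import difflib

# Model names listed in this guide.
KNOWN_MODELS = [
    "text-embedding-3-large",
    "text-embedding-3-small",
    "text-embedding-ada-002",
]


def check_model_name(name: str):
    """Return (is_valid, suggestion). Matching is exact and case-sensitive."""
    if name in KNOWN_MODELS:
        return True, None
    matches = difflib.get_close_matches(name, KNOWN_MODELS, n=1, cutoff=0.6)
    return False, (matches[0] if matches else None)
```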

3. Text Too Long

Symptoms:

Error: "Input too long"
Specific chunks fail

Solutions:

Reduce partition size
Check for malformed chunks
Verify token limits for model
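
To find which chunks exceed a model's input limit, scan the partition output and report the offenders. The sketch below takes the limit as a parameter rather than hard-coding one. The default token counter is only a whitespace approximation, so pass in the real tokenizer for your embedding model when accuracy matters.

```python
def approx_token_count(text: str) -> int:
    """Rough proxy only: counts whitespace-separated words, not real tokens."""
    return len(text.split())


def find_oversized(chunks: list, max_tokens: int, count_tokens=approx_token_count) -> list:
    """Return (index, token_count) for every chunk above max_tokens."""
    oversized = []
    for index, chunk in enumerate(chunks):
        count = count_tokens(chunk)
        if count > max_tokens:
            oversized.append((index, count))
    return oversized
```

Chunks far above the configured partition size usually indicate malformed input, such as a table or encoded blob that the partitioner could not split.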

Indexing Issues

Symptoms:

Data doesn't appear in index
Partial data indexed
Search doesn't find content

Common Causes:

1. Index Not Found

Symptoms:

Error: "Index does not exist"

Solutions:

1. Verify index name spelling
2. Check index exists in Azure AI Search
3. Confirm API endpoint configuration
4. Verify permissions on index

2. Index Full

Symptoms:

Error: "Insufficient storage"
Partial success

Solutions:

1. Check Azure AI Search quota
2. Review index size
3. Consider index partitioning
4. Clean up old data

3. Schema Mismatch

Symptoms:

Error: "Field not found"
Data transformation fails

Solutions:

1. Review index schema
2. Verify field names match
3. Check data types
4. Update schema if needed
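
Field name and type checks can be scripted against a sample document before indexing. In this sketch the schema is a plain dictionary and Python types stand in for the index's own field types. The field names in the usage example are hypothetical.

```python
def schema_problems(document: dict, schema: dict) -> list:
    """Compare one document against a {field_name: python_type} schema."""
    problems = []
    for name in document:
        if name not in schema:
            problems.append(f"field not in index schema: {name}")
    for name, expected in schema.items():
        if name in document and not isinstance(document[name], expected):
            actual = type(document[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {actual}")
    return problems
```

For example, with a schema of `{"id": str, "content": str, "embedding": list}`, a document containing `"Embedding"` (wrong case) and a numeric `id` is reported with one unknown-field problem and one type problem.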

Data Quality Issues

Missing Data

Symptoms:

Expected files not processed
Gaps in indexed content
Fewer items than source

Diagnosis Checklist:

[ ] Check data source filter/path
[ ] Verify file permissions
[ ] Review error logs for skipped files
[ ] Confirm trigger parameter values
[ ] Check for duplicate detection

Investigation:

Compare Source vs. Processed:

Source: 1000 files
Processed: 950 files
Missing: 50 files → Check logs for reasons

Review Pipeline Logs:
- Look for "skipped" messages
- Check for unsupported formats
- Identify permission errors
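
The source-versus-processed comparison is a set difference. The sketch below assumes you can export both file lists as plain path strings (how you export them depends on your data source and run details) and groups the missing files by extension to make format-related skips visible.

```python
from collections import Counter
from pathlib import PurePosixPath


def missing_files(source: list, processed: list) -> list:
    """Return source paths that do not appear among the processed paths."""
    return sorted(set(source) - set(processed))


def count_by_extension(paths: list) -> Counter:
    """Count paths per lowercase extension to reveal format-related skips."""
    return Counter(PurePosixPath(p).suffix.lower() for p in paths)
```

If most missing files share an unsupported extension such as `.doc` or `.xls`, the gap is a format issue rather than a permission or filter issue.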

Incorrect Data

Symptoms:

Text doesn't match source
Garbled characters
Wrong encoding

Common Causes:

1. Encoding Issues

Symptoms:

Special characters display wrong
Non-English text garbled

Solutions:

Verify source file encoding
Check UTF-8 support
Test with sample files
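
A simple test for encoding problems is to check whether a file's bytes decode cleanly as UTF-8. This sketch uses only built-in Python behavior. It tells you that a file is not valid UTF-8 but does not identify the actual encoding.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 without errors."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

The reverse mistake, decoding UTF-8 bytes as Latin-1, does not raise an error but produces garbled text: the UTF-8 bytes for "é" read as Latin-1 become "Ã©", which is a recognizable sign of this problem.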

2. Extraction Errors

Symptoms:

PDF text out of order
Tables not parsed correctly
Images missing

Solutions:

Review PDF structure
Check extraction plugin version
Consider OCR for images
Validate output samples

Performance Issues

Slow Processing

Symptoms:

Pipeline takes much longer than expected
Processing rate decreases over time
Timeouts occur

Performance Benchmarks:

Stage	Typical Rate	Slow If
Extraction	10-50 docs/min	<5 docs/min
Partitioning	100-500 chunks/min	<50 chunks/min
Embedding	50-200 chunks/min	<25 chunks/min
Indexing	100-1000 chunks/min	<50 chunks/min
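
The "Slow If" column can be applied mechanically to observed rates. This sketch hard-codes the thresholds from the table above. The observed rates are whatever you measure from run details, in the same per-minute units.

```python
# "Slow If" thresholds from the benchmark table, in items per minute.
SLOW_THRESHOLDS = {
    "Extraction": 5,
    "Partitioning": 50,
    "Embedding": 25,
    "Indexing": 50,
}


def slow_stages(observed_rates: dict) -> list:
    """Return names of stages whose observed rate is below the table threshold."""
    return [
        stage
        for stage, rate in observed_rates.items()
        if stage in SLOW_THRESHOLDS and rate < SLOW_THRESHOLDS[stage]
    ]
```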

Optimization Strategies:

Increase Batch Sizes (if memory allows):

Current: 10 items/batch
Recommended: 50-100 items/batch

Optimize Partitioning:

Increase chunk size or reduce overlap, for example 300 → 400 tokens per chunk
Fewer chunks need to be embedded and indexed (smaller chunks increase the total)

Use Faster Embedding Model:

Current: text-embedding-3-large
Faster: text-embedding-3-small
(Trade-off: slightly lower quality)

Parallel Processing:
- Increase worker instances
- Enable parallel stages
- Use multiple pipelines for different data sets

High Resource Usage

Symptoms:

High memory consumption
CPU at 100%
Storage filling quickly

Solutions:

Memory:

- Reduce batch sizes
- Process smaller files
- Enable streaming if available
- Increase worker memory limits

CPU:

- Reduce parallelism
- Optimize chunk sizes
- Schedule during off-peak
- Consider async processing

Storage:

- Clean up old runs
- Archive completed data
- Implement retention policies
- Monitor growth trends

Log Analysis

Finding Logs

Navigate to Data Pipeline Runs
Click on specific run
View Execution Log or Details

Reading Log Messages

Log Levels:

ERROR: Failures requiring attention
WARNING: Issues that may cause problems
INFO: Normal operations
DEBUG: Detailed diagnostic information

Common Error Patterns:

ERROR: Connection timeout
→ Network or firewall issue

ERROR: Unauthorized access
→ Permission or authentication problem

ERROR: Resource not found
→ Configuration error (index, model, etc.)

ERROR: Invalid parameter value
→ Configuration or data format issue

WARNING: Retrying operation
→ Transient error, may resolve

WARNING: Deprecated feature
→ Update configuration
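
When a log is long, the patterns listed above can be matched automatically. This sketch does a case-insensitive substring search using the message fragments and causes from this guide. The exact log line format is an assumption, so adjust the fragments to match what your platform emits.

```python
# Message fragments and likely causes, taken from the patterns listed above.
LIKELY_CAUSES = {
    "Connection timeout": "Network or firewall issue",
    "Unauthorized access": "Permission or authentication problem",
    "Resource not found": "Configuration error (index, model, etc.)",
    "Invalid parameter value": "Configuration or data format issue",
    "Retrying operation": "Transient error, may resolve",
    "Deprecated feature": "Update configuration",
}


def diagnose(log_lines: list) -> list:
    """Return (line, likely_cause) for each line that matches a known pattern."""
    findings = []
    for line in log_lines:
        lowered = line.lower()
        for fragment, cause in LIKELY_CAUSES.items():
            if fragment.lower() in lowered:
                findings.append((line.strip(), cause))
                break  # one cause per line is enough for triage
    return findings
```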

Error Message Reference

Error Code	Meaning	Solution
`DPS-001`	Data source unreachable	Check connectivity
`DPS-002`	Authentication failed	Verify credentials
`DPS-003`	Plugin error	Check plugin configuration
`DPS-004`	Timeout	Increase timeout or reduce load
`DPS-005`	Invalid configuration	Review settings

State Inspection

Checking Pipeline State

Configuration State:
- Review pipeline definition
- Verify all stages configured
- Check trigger parameters

Execution State:
- Check run status
- Review stage completion
- Inspect work item progress

Data State:
- Verify source data availability
- Check index contents
- Validate embedding quality

Recovery Procedures

Restarting Failed Pipeline

Review failure reason in logs
Fix underlying issue
Run pipeline again (full or incremental)
Monitor for successful completion

Cleaning Up Partial Runs

If the pipeline failed mid-execution:
1. Identify what was processed
2. Determine whether the partial data is acceptable
3. Clean up the partial data or complete the processing
4. Document the outcome for the audit trail

Rollback Procedures

For incorrect data indexed:

Stop any active runs
Remove incorrect data from index
Reprocess from source
Verify correct data loaded

Prevention Best Practices

Pre-Flight Checks

Before running pipeline:

[ ] Verify data source accessible
[ ] Check target has capacity
[ ] Review configuration
[ ] Test with small sample
[ ] Monitor first few minutes

Monitoring

Set up alerts for failures
Monitor run durations
Track success rates
Review logs regularly

Documentation

Document pipeline configurations
Record common issues and solutions
Maintain runbook
Keep change log

Getting Help

Information to Gather

When seeking support, provide:

Pipeline name and ID
Run ID and timestamp
Error messages (complete text)
Recent changes made
What you've tried

Escalation Path

Self-Service: Use this guide
Team Lead: Escalate if unresolved
Platform Team: For infrastructure issues
Vendor Support: For third-party components
