# Pipeline Versioning and Snapshots

Learn about pipeline versioning, automatic snapshots, and configuration management.
## Overview

Pipeline versioning helps you:

- Track configuration changes over time
- Compare pipeline versions
- Migrate pipelines between environments
- Maintain audit trails
- Understand pipeline execution history
## Understanding Pipeline Snapshots

### What is a Snapshot?

A snapshot is an automatic, point-in-time copy of a pipeline configuration, created when the pipeline runs. Snapshots include:

- Pipeline definition
- Stage configurations
- Trigger settings
- Parameter defaults
- Plugin references

Important: Snapshots are created automatically by the system when a pipeline execution starts. They are not user-manageable via the Management Portal.
### Why Snapshots Exist

Snapshots serve a critical purpose in data pipeline execution:

- Freeze configuration: when a pipeline run starts, a snapshot freezes the pipeline definition
- Long-running stability: because pipelines can run for hours or days, the snapshot ensures the execution uses a consistent configuration throughout
- Change isolation: if the pipeline definition is modified while a run is in progress, the running instance continues with the original configuration from the snapshot
- Audit trail: snapshots record exactly which configuration was used for each pipeline run
### When Snapshots Are Created

Automatic creation (system-managed):

- At the start of every pipeline execution
- Automatically when "Run" is triggered (manually or on a schedule)
- Before the first work item is processed
- Linked to the specific pipeline run ID
### Snapshot Contents

```json
{
  "snapshot_id": "run-2024-01-15-093045",
  "created_at": "2024-01-15T09:30:45Z",
  "run_id": "pipeline-run-12345",
  "pipeline_definition": {
    "name": "customer-data-pipeline",
    "version": "1.2.0",
    "data_source": { ... },
    "starting_stages": [ ... ],
    "triggers": [ ... ]
  }
}
```
## Snapshot Lifecycle

### Automatic Creation

```text
User triggers pipeline run
        ↓
System creates snapshot
        ↓
Pipeline executes with snapshot
        ↓
Snapshot persists with run history
```

### No User Management Required

Note: Unlike some systems, FoundationaLLM snapshots are not created, deleted, or managed by users. They are purely system-managed artifacts tied to pipeline runs.
## Viewing Pipeline Run Snapshots

### Accessing Snapshot Information

While you cannot manage snapshots directly, you can view them through pipeline run details:

1. Navigate to Data Pipeline Runs
2. Select a specific pipeline run
3. View the run details, which include the snapshot reference
4. Review the frozen configuration that was used for that specific run

| Field | Description |
|---|---|
| Run ID | Unique identifier for the pipeline run |
| Snapshot ID | Associated snapshot identifier |
| Created | Date and time of execution |
| Configuration | The pipeline definition used (from snapshot) |
## Comparing Pipeline Configurations

Since snapshots are tied to runs, you can compare configurations by examining different pipeline runs:

1. Select two different pipeline runs
2. View their respective configurations
3. Manually compare the differences in:
   - Stage configurations
   - Parameter values
   - Plugin versions
   - Trigger settings

Why this matters:

- Helps diagnose why results differ between runs
- Shows configuration evolution over time
- Aids in troubleshooting issues
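The manual comparison step can be assisted with a recursive dictionary diff. The sketch below assumes you have already retrieved the two frozen `pipeline_definition` objects as Python dicts; the field names in the example data are illustrative, not a product schema:

```python
def diff_configs(old: dict, new: dict, path: str = "") -> list[str]:
    """Return human-readable differences between two pipeline definitions."""
    changes = []
    for key in sorted(set(old) | set(new)):
        here = f"{path}.{key}" if path else key
        if key not in old:
            changes.append(f"added   {here} = {new[key]!r}")
        elif key not in new:
            changes.append(f"removed {here} (was {old[key]!r})")
        elif isinstance(old[key], dict) and isinstance(new[key], dict):
            # Recurse into nested sections such as stage configurations
            changes.extend(diff_configs(old[key], new[key], here))
        elif old[key] != new[key]:
            changes.append(f"changed {here}: {old[key]!r} -> {new[key]!r}")
    return changes

# Illustrative frozen configurations from two different runs
run_a = {"version": "1.1.0", "stages": {"chunk_size": 500}}
run_b = {"version": "1.2.0", "stages": {"chunk_size": 1000}}
for line in diff_configs(run_a, run_b):
    print(line)
```

A diff like this makes it obvious which setting changed between the run that worked and the run that did not.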
## Pipeline Versioning Best Practices

### Managing Configuration Changes

Since snapshots are automatic, focus on managing your pipeline definitions:

Version control approach:

- Treat pipeline definitions like code
- Document changes in version control (Git)
- Use meaningful commit messages
- Tag stable releases
- Maintain a changelog

Change management:

- Test changes in a development environment first
- Document what changed and why
- Validate with sample data
- Monitor the first production runs closely
- Keep a configuration backup (export JSON)
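Keeping JSON backups before each change can be as simple as writing a timestamped export to disk. A minimal sketch, where the file layout and config structure are illustrative assumptions:

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def backup_config(config: dict, backup_dir: Path) -> Path:
    """Write a timestamped JSON backup of a pipeline definition."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    name = config.get("name", "pipeline")
    path = backup_dir / f"{name}-{stamp}.json"
    path.write_text(json.dumps(config, indent=2))
    return path

# Illustrative usage: back up before editing the definition
config = {"name": "customer-data-pipeline", "version": "1.2.0"}
saved = backup_config(config, Path(tempfile.mkdtemp()) / "backups")
print(saved.name)  # e.g. customer-data-pipeline-20240115-093045.json
```

The timestamp in the filename keeps every backup distinct, so older versions are never overwritten.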
### Reverting Configuration Changes

If you need to revert a pipeline to a previous configuration:

Manual reversion process:

1. Export the current configuration (for backup)
2. Locate the previous working configuration:
   - From your version control system
   - From exported JSON backups
   - From previous pipeline run details
3. Update the pipeline definition manually
4. Test thoroughly before production use

Why there is no automatic rollback:

- Snapshots are tied to runs, not configuration management
- Manual reversion ensures deliberate, tested changes
- Reduces the risk of accidental configuration loss
## Pipeline Configuration Versioning

### Semantic Versioning

Follow a semantic versioning convention for pipeline definitions:

```text
Format: MAJOR.MINOR.PATCH
Example: 1.2.3

MAJOR: Breaking changes (1.0.0 → 2.0.0)
MINOR: New features (1.1.0 → 1.2.0)
PATCH: Bug fixes (1.2.1 → 1.2.2)
```
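The bump rules above can be expressed as a small helper. This is a generic sketch of semantic version arithmetic, not a FoundationaLLM API:

```python
def bump_version(version: str, part: str) -> str:
    """Increment MAJOR, MINOR, or PATCH in a MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(p) for p in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"        # breaking change resets minor and patch
    if part == "minor":
        return f"{major}.{minor + 1}.0"  # new feature resets patch
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part}")

print(bump_version("1.2.3", "minor"))  # 1.3.0
print(bump_version("1.2.3", "major"))  # 2.0.0
```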
### Version Lifecycle

```text
Development → Testing → Staging → Production
  v1.0.0       v1.0.0    v1.0.0     v1.0.0

After changes:
Development → Testing → Staging → Production
  v1.1.0       v1.0.0    v1.0.0     v1.0.0

After validation:
Development → Testing → Staging → Production
  v1.1.0       v1.1.0    v1.1.0     v1.0.0

After deployment:
Development → Testing → Staging → Production
  v1.2.0       v1.1.0    v1.1.0     v1.1.0
```
### Version Documentation

Document important versions:

- Latest: current production version
- Stable: last known good configuration
- Beta: new features under test in development
- Archived: no longer in use
## Environment Migration

### Export Pipeline

Export a pipeline configuration for migration:

```http
GET /instances/{instanceId}/providers/FoundationaLLM.DataPipeline/dataPipelines/{pipelineName}/export
```

Response:

```json
{
  "pipeline_definition": { ... },
  "dependencies": {
    "plugins": ["TextExtraction", "Embedding"],
    "data_sources": ["azure-data-lake"],
    "configurations": ["AzureAISearch"]
  }
}
```
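Calling the export endpoint can be sketched with the standard library. The endpoint path follows the route shown above; the host, instance ID, and bearer-token auth are illustrative assumptions about your deployment:

```python
import json
import urllib.request

def export_url(base: str, instance_id: str, pipeline_name: str) -> str:
    """Build the export endpoint URL from the documented route."""
    return (f"{base}/instances/{instance_id}/providers/"
            f"FoundationaLLM.DataPipeline/dataPipelines/{pipeline_name}/export")

def export_pipeline(base: str, instance_id: str, pipeline_name: str, token: str) -> dict:
    """Fetch the pipeline definition and its dependencies (auth scheme assumed)."""
    req = urllib.request.Request(
        export_url(base, instance_id, pipeline_name),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example URL (host and instance ID are placeholders):
print(export_url("https://api.example.com", "inst-001", "customer-data-pipeline"))
```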
### Import Pipeline

Import the configuration into the target environment:

```http
POST /instances/{targetInstanceId}/providers/FoundationaLLM.DataPipeline/dataPipelines/import
Content-Type: application/json

{
  "pipeline_definition": { ... },
  "name": "customer-data-pipeline",
  "environment": "production",
  "overwrite": false
}
```
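Before posting an import, it can help to sanity-check the payload. The required fields below follow the request body shown above; the validation itself is an illustrative sketch, not part of the product:

```python
REQUIRED_FIELDS = ("pipeline_definition", "name", "environment")

def validate_import_payload(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks importable."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in payload]
    # Defaulting overwrite to False avoids clobbering an existing pipeline.
    if payload.get("overwrite", False) and payload.get("environment") == "production":
        problems.append("overwrite=true in production: confirm this is intentional")
    return problems

payload = {"pipeline_definition": {}, "name": "customer-data-pipeline",
           "environment": "production", "overwrite": False}
print(validate_import_payload(payload))  # []
```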
### Migration Checklist

Before migrating:

- [ ] Export the pipeline configuration
- [ ] Document dependencies
- [ ] Verify plugins are available in the target environment
- [ ] Check data source access
- [ ] Test with sample data
- [ ] Update instance-specific values
- [ ] Review security settings
- [ ] Test triggers
- [ ] Validate integration points
- [ ] Create a rollback plan
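One checklist item, verifying that the exported dependencies exist in the target environment, can be automated with a simple set difference. The dependency shape follows the export response above; the target inventory is an illustrative assumption:

```python
def missing_dependencies(exported: dict, available: dict) -> dict:
    """Compare exported dependency lists against what the target environment offers."""
    return {
        kind: sorted(set(needed) - set(available.get(kind, [])))
        for kind, needed in exported.get("dependencies", {}).items()
        if set(needed) - set(available.get(kind, []))
    }

exported = {"dependencies": {"plugins": ["TextExtraction", "Embedding"],
                             "data_sources": ["azure-data-lake"]}}
target = {"plugins": ["TextExtraction"], "data_sources": ["azure-data-lake"]}
print(missing_dependencies(exported, target))  # {'plugins': ['Embedding']}
```

An empty result means every exported dependency is satisfied and the migration can proceed.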
### Environment-Specific Changes

Update values for the target environment:

```json
{
  "development": {
    "data_source": "dev-storage",
    "index": "dev-index",
    "schedule": null
  },
  "production": {
    "data_source": "prod-storage",
    "index": "prod-index",
    "schedule": "0 6 * * *"
  }
}
```
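Applying these per-environment values at import time can be a simple overlay merge. The sketch below layers environment overrides onto a base definition; the field names are illustrative:

```python
def apply_environment(base: dict, overrides: dict, env: str) -> dict:
    """Overlay environment-specific values onto a base pipeline definition."""
    merged = dict(base)  # copy so the base definition is never mutated
    merged.update(overrides.get(env, {}))
    return merged

overrides = {
    "development": {"data_source": "dev-storage", "index": "dev-index", "schedule": None},
    "production": {"data_source": "prod-storage", "index": "prod-index", "schedule": "0 6 * * *"},
}
base = {"name": "customer-data-pipeline", "version": "1.2.0"}
prod = apply_environment(base, overrides, "production")
print(prod["data_source"])  # prod-storage
```

Keeping overrides in one file per environment makes the instance-specific values easy to review during migration.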
## Change Management

### Change Request Process

1. Document the change:

   ```text
   Change: Update embedding model
   Reason: Improve search quality
   Impact: 20% slower, better accuracy
   Rollback Plan: Export current config before change
   ```

2. Export the current configuration: save a configuration backup
3. Test in dev: validate in development
4. Review: get approval if needed
5. Deploy: apply to production
6. Validate: confirm success
7. Document: record the outcome and pipeline run ID
### Change History

Track all changes:

| Date | Version | Change | By | Snapshot |
|---|---|---|---|---|
| 2024-01-20 | 1.3.0 | Add safety stage | alice | snap-456 |
| 2024-01-15 | 1.2.0 | Update embedding | bob | snap-123 |
| 2024-01-10 | 1.1.0 | Increase chunks | alice | snap-789 |
## Best Practices

### Configuration Management Strategy

Change frequency:

- Production: test changes thoroughly before deployment
- Staging: use as a regular validation environment
- Development: iterate and test freely

Documentation quality:

- Good: "Updated embedding model for Q4 2024 feature release"
- Bad: "change"

Include:

- What changed
- Why it changed
- Expected impact
- Test results
### Version Control Integration

Treat pipelines like code and link each deployment back to its run:

```text
Git commit:       "Update embedding to 3-large for better quality"
Pipeline version: 1.2.0
Pipeline run:     run-2024-01-15-093045
```

Reference the run in the commit message:

```text
"Deployed to production - see run run-2024-01-15-093045"
```
### Testing Before Deployment

Test plan:

1. Deploy to development
2. Run with a test dataset
3. Validate results
4. Compare metrics
5. Get approval
6. Deploy to staging
7. Monitor for 24 hours
8. Deploy to production
### Change Evaluation Criteria

Monitor after deployment:

- Error rate should stay below 1%
- Performance should match expectations
- Data quality is validated
- No critical failures

If issues are detected:

1. Export the current (problematic) configuration for analysis
2. Manually revert to the previous working configuration
3. Investigate the root cause
4. Test the fix in development before redeploying
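The error-rate criterion can be checked mechanically against run results. The 1% threshold matches the criterion above; the counters and decision function are an illustrative sketch, not a product API:

```python
def should_rollback(items_processed: int, items_failed: int,
                    max_error_rate: float = 0.01) -> bool:
    """Flag a run whose failure rate exceeds the acceptable threshold (default 1%)."""
    if items_processed == 0:
        return True  # a run that processed nothing is treated as a failure
    return items_failed / items_processed > max_error_rate

print(should_rollback(10_000, 50))   # False (0.5% error rate is within tolerance)
print(should_rollback(10_000, 250))  # True  (2.5% exceeds the 1% threshold)
```

Wiring a check like this into post-deployment monitoring turns "revert if issues are detected" into an objective, repeatable decision.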
## Troubleshooting

### Configuration Update Issues

Problem: Pipeline configuration changes are not applied.

Solutions:

- Verify you have the appropriate permissions
- Check for validation errors in the UI
- Review the API response for errors
- Ensure the pipeline is not currently running
### Pipeline Run References Missing

Problem: Cannot find details for a specific pipeline run.

Solutions:

- Check the run retention period
- Verify the run ID is correct
- Check your permissions to view runs
- Confirm the run completed successfully
### Configuration Export/Import Fails

Problem: Cannot export or import a pipeline configuration.

Solutions:

- Verify permissions
- Check JSON format validity
- Review instance-specific references
- Ensure the target environment has the required dependencies
### Version Conflicts

Problem: Configuration version mismatch.

Solutions:

- Increment the version number appropriately
- Use a different naming convention
- Check for concurrent modifications
- Check for duplicates
- Review the version control system