Pipeline Versioning and Snapshots

Learn about pipeline versioning, automatic snapshots, and configuration management.

Overview

Pipeline versioning helps you:

  • Track configuration changes over time
  • Compare pipeline versions
  • Migrate pipelines between environments
  • Maintain audit trails
  • Understand pipeline execution history

Understanding Pipeline Snapshots

What is a Snapshot?

A snapshot is an automatic, point-in-time copy of a pipeline configuration that is created when the pipeline runs. Snapshots include:

  • Pipeline definition
  • Stage configurations
  • Trigger settings
  • Parameter defaults
  • Plugin references

Important: Snapshots are automatically created by the system when a pipeline execution starts. They are not user-manageable via the Management Portal.

Why Snapshots Exist

Snapshots serve a critical purpose in data pipeline execution:

  1. Freeze Configuration: When a pipeline run starts, a snapshot freezes the pipeline definition
  2. Long-Running Stability: Since pipelines can run for extended periods (hours or days), the snapshot ensures the execution uses a consistent configuration
  3. Change Isolation: If the pipeline definition is modified while a run is in progress, the running instance continues with the original configuration from the snapshot
  4. Audit Trail: Snapshots provide a record of exactly what configuration was used for each pipeline run

When Snapshots Are Created

Automatic Creation (system-managed):

  • At the start of every pipeline execution
  • Automatically when "Run" is triggered (manual or scheduled)
  • Before the first work item is processed
  • Linked to the specific pipeline run ID

Snapshot Contents

{
  "snapshot_id": "run-2024-01-15-093045",
  "created_at": "2024-01-15T09:30:45Z",
  "run_id": "pipeline-run-12345",
  "pipeline_definition": {
    "name": "customer-data-pipeline",
    "version": "1.2.0",
    "data_source": { ... },
    "starting_stages": [ ... ],
    "triggers": [ ... ]
  }
}

Snapshot Lifecycle

Automatic Creation

User triggers pipeline run
         ↓
System creates snapshot
         ↓
Pipeline executes with snapshot
         ↓
Snapshot persists with run history

No User Management Required

Note: Unlike some systems, FoundationaLLM snapshots are not created, deleted, or managed by users. They are purely system-managed artifacts tied to pipeline runs.

Viewing Pipeline Run Snapshots

Accessing Snapshot Information

While you cannot manage snapshots directly, you can view them through pipeline run details:

  1. Navigate to Data Pipeline Runs
  2. Select a specific pipeline run
  3. View run details which include the snapshot reference
  4. See the frozen configuration that was used for that specific run

Field          Description
Run ID         Unique identifier for the pipeline run
Snapshot ID    Associated snapshot identifier
Created        Date and time of execution
Configuration  The pipeline definition used (from snapshot)
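
The run details above can also be summarized programmatically. The following is a minimal Python sketch, assuming a run record shaped like the snapshot example earlier on this page; the field names are assumptions for illustration, not a documented API contract.

```python
# Hypothetical sketch: summarizing the snapshot fields of a pipeline run
# record. The record shape mirrors the snapshot JSON example above; the
# exact field names are assumptions.
def run_summary(run: dict) -> str:
    """Return a one-line summary of the frozen configuration used by a run."""
    cfg = run["pipeline_definition"]
    return (f"run {run['run_id']} (snapshot {run['snapshot_id']}) "
            f"ran {cfg['name']} v{cfg['version']} at {run['created_at']}")
```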

Comparing Pipeline Configurations

Since snapshots are tied to runs, you can compare configurations by looking at different pipeline runs:

  1. Select two different pipeline runs
  2. View their respective configurations
  3. Manually compare the differences in:
    • Stage configurations
    • Parameter values
    • Plugin versions
    • Trigger settings

Why This Matters:

  • Helps diagnose why results differ between runs
  • Shows configuration evolution over time
  • Aids in troubleshooting issues
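
The manual comparison described above can be sketched as a small diff helper. This is an illustrative Python sketch that assumes configurations are plain nested dictionaries, as in the exported JSON examples on this page.

```python
# A minimal sketch of comparing the frozen configurations of two runs.
# It flattens nested settings into dotted paths and reports values that
# differ; the configuration shape is an assumption.
def flatten(cfg: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted-path keys for easy comparison."""
    out = {}
    for key, value in cfg.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, path))
        else:
            out[path] = value
    return out

def diff_configs(a: dict, b: dict) -> dict:
    """Map each differing dotted path to its (old, new) value pair."""
    fa, fb = flatten(a), flatten(b)
    return {k: (fa.get(k), fb.get(k))
            for k in sorted(set(fa) | set(fb))
            if fa.get(k) != fb.get(k)}
```

A helper like this makes it obvious at a glance why two runs produced different results, without reading both JSON documents side by side.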

Pipeline Versioning Best Practices

Managing Configuration Changes

Since snapshots are automatic, focus on managing your pipeline definitions:

Version Control Approach:

  • Treat pipeline definitions like code
  • Document changes in version control (Git)
  • Use meaningful commit messages
  • Tag stable releases
  • Maintain a changelog

Change Management:

  1. Test changes in development environment first
  2. Document what changed and why
  3. Validate with sample data
  4. Monitor first production runs closely
  5. Keep configuration backup (export JSON)
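
Step 5, keeping a configuration backup, can be sketched as a small helper that writes the exported JSON to a timestamped file. The directory layout and file naming below are assumptions for illustration.

```python
# Hypothetical sketch: saving an exported pipeline definition as a
# timestamped JSON backup before making changes. The backup directory
# and naming scheme are assumptions.
import json
from datetime import datetime, timezone
from pathlib import Path

def backup_config(definition: dict, backup_dir: str = "pipeline-backups") -> Path:
    """Write the definition to <dir>/<name>-<UTC timestamp>.json, return the path."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    name = definition.get("name", "pipeline")
    path = Path(backup_dir) / f"{name}-{stamp}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(definition, indent=2))
    return path
```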

Reverting Configuration Changes

If you need to revert a pipeline to a previous configuration:

Manual Reversion Process:

  1. Export current configuration (for backup)
  2. Locate previous working configuration:
    • From version control system
    • From exported JSON backups
    • From previous pipeline run details
  3. Update pipeline definition manually
  4. Test thoroughly before production use

Why No Automatic Rollback:

  • Snapshots are tied to runs, not configuration management
  • Manual reversion ensures deliberate, tested changes
  • Reduces risk of accidental configuration loss

Pipeline Configuration Versioning

Semantic Versioning

Follow a semantic versioning convention for pipeline definitions:

Format: MAJOR.MINOR.PATCH
Example: 1.2.3

MAJOR: Breaking changes (1.0.0 → 2.0.0)
MINOR: New features (1.1.0 → 1.2.0)
PATCH: Bug fixes (1.2.1 → 1.2.2)
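
The bump rules above can be expressed as a short helper. A minimal sketch:

```python
# A minimal sketch of semantic version bumping: increment MAJOR, MINOR,
# or PATCH and reset the lower-order components.
def bump(version: str, part: str) -> str:
    """Return the next version string for a 'major', 'minor', or 'patch' change."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part}")
```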

Version Lifecycle

Development → Testing → Staging → Production
    v1.0.0      v1.0.0     v1.0.0      v1.0.0

After changes:
Development → Testing → Staging → Production
    v1.1.0      v1.0.0     v1.0.0      v1.0.0

After validation:
Development → Testing → Staging → Production
    v1.1.0      v1.1.0     v1.1.0      v1.0.0

After deployment:
Development → Testing → Staging → Production
    v1.2.0      v1.1.0     v1.1.0      v1.1.0

Version Documentation

Document important versions:

Latest: Current production version
Stable: Last known good configuration
Beta: Testing new features in development
Archived: No longer in use

Environment Migration

Export Pipeline

Export configuration for migration:

GET /instances/{instanceId}/providers/FoundationaLLM.DataPipeline/dataPipelines/{pipelineName}/export

Response:

{
  "pipeline_definition": { ... },
  "dependencies": {
    "plugins": ["TextExtraction", "Embedding"],
    "data_sources": ["azure-data-lake"],
    "configurations": ["AzureAISearch"]
  }
}
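
Calling the export endpoint from a script could look like the following sketch. The endpoint path follows the example above; the base URL, bearer-token authentication, and response handling are assumptions.

```python
# Hypothetical sketch: exporting a pipeline definition via the Management
# API. The endpoint path mirrors the doc; base URL, auth scheme, and
# response shape are assumptions.
import json
import urllib.request

def export_url(base_url: str, instance_id: str, pipeline_name: str) -> str:
    """Build the export endpoint URL for a pipeline."""
    return (f"{base_url}/instances/{instance_id}/providers/"
            f"FoundationaLLM.DataPipeline/dataPipelines/{pipeline_name}/export")

def export_pipeline(base_url: str, instance_id: str,
                    pipeline_name: str, token: str) -> dict:
    """Fetch the pipeline export and return the parsed JSON body."""
    req = urllib.request.Request(
        export_url(base_url, instance_id, pipeline_name),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```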

Import Pipeline

Import to new environment:

POST /instances/{targetInstanceId}/providers/FoundationaLLM.DataPipeline/dataPipelines/import
Content-Type: application/json

{
  "pipeline_definition": { ... },
  "name": "customer-data-pipeline",
  "environment": "production",
  "overwrite": false
}
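
Building the import body programmatically can be sketched as follows. The field names mirror the example above; the validation and the `overwrite` default are illustrative assumptions.

```python
# Hypothetical sketch: assembling the import request body before POSTing
# it to the import endpoint. Field names mirror the example above; the
# validation rules are assumptions.
import json

def build_import_body(definition: dict, name: str, environment: str,
                      overwrite: bool = False) -> str:
    """Serialize the import payload; refuse to overwrite unless explicitly allowed."""
    if not name:
        raise ValueError("pipeline name is required")
    body = {
        "pipeline_definition": definition,
        "name": name,
        "environment": environment,
        "overwrite": overwrite,
    }
    return json.dumps(body)
```

Defaulting `overwrite` to false means an accidental import into an environment that already has the pipeline fails loudly rather than silently replacing it.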

Migration Checklist

Before migrating:

  • [ ] Export pipeline configuration
  • [ ] Document dependencies
  • [ ] Verify plugins available in target
  • [ ] Check data source access
  • [ ] Test with sample data
  • [ ] Update instance-specific values
  • [ ] Review security settings
  • [ ] Test triggers
  • [ ] Validate integration points
  • [ ] Create rollback plan

Environment-Specific Changes

Update for target environment:

{
  "development": {
    "data_source": "dev-storage",
    "index": "dev-index",
    "schedule": null
  },
  "production": {
    "data_source": "prod-storage",
    "index": "prod-index",
    "schedule": "0 6 * * *"
  }
}
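
Applying these environment-specific values to a base definition can be sketched as a simple merge. The key names follow the JSON above; the shallow-update merge strategy is an assumption.

```python
# A minimal sketch of applying per-environment overrides (like the JSON
# above) to a base pipeline definition. Shallow merge is an assumption.
def apply_environment(base: dict, env_settings: dict, environment: str) -> dict:
    """Return a copy of the base definition with the target environment's values."""
    if environment not in env_settings:
        raise KeyError(f"no settings for environment: {environment}")
    merged = dict(base)
    merged.update(env_settings[environment])
    return merged
```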

Change Management

Change Request Process

  1. Document Change:

    Change: Update embedding model
    Reason: Improve search quality
    Impact: 20% slower, better accuracy
    Rollback Plan: Export current config before change
    
  2. Export Current Configuration: Save configuration backup

  3. Test in Dev: Validate in development

  4. Review: Get approval if needed

  5. Deploy: Apply to production

  6. Validate: Confirm success

  7. Document: Record outcome and pipeline run ID

Change History

Track all changes:

Date        Version  Change            By     Snapshot
2024-01-20  1.3.0    Add safety stage  alice  snap-456
2024-01-15  1.2.0    Update embedding  bob    snap-123
2024-01-10  1.1.0    Increase chunks   alice  snap-789

Best Practices

Configuration Management Strategy

Change Cadence by Environment:

  • Production: Test changes thoroughly before each deployment
  • Staging: Validate changes regularly before promotion
  • Development: Iterate and test freely

Documentation Quality:

Good: "Updated embedding model for Q4 2024 feature release"
Bad: "change"

Include:
- What changed
- Why it changed
- Expected impact
- Test results

Version Control Integration

Treat pipelines like code:

Git commit: "Update embedding to 3-large for better quality"
Pipeline version: 1.2.0
Pipeline run: run-2024-01-15-093045

Link in commit message:
"Deployed to production - see run run-2024-01-15-093045"

Testing Before Deployment

Test Plan:

  1. Deploy to development
  2. Run with test dataset
  3. Validate results
  4. Compare metrics
  5. Get approval
  6. Deploy to staging
  7. Monitor for 24 hours
  8. Deploy to production

Change Evaluation Criteria

Monitor after deployment:

  • Error rate should stay <1%
  • Performance should match expectations
  • Data quality validated
  • No critical failures
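
The first criterion above can be sketched as a simple threshold check. The 1% default mirrors the list above; the inputs (failed and total work item counts) are assumptions.

```python
# A minimal sketch of the error-rate criterion: flag a deployment when the
# failure rate over recent work items exceeds 1% (the document's threshold).
def error_rate_ok(failed: int, total: int, threshold: float = 0.01) -> bool:
    """True when failed/total is at or below the acceptable threshold."""
    if total <= 0:
        raise ValueError("total must be positive")
    return failed / total <= threshold
```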

If issues detected:

  1. Export current (problematic) configuration for analysis
  2. Manually revert to previous working configuration
  3. Investigate root cause
  4. Test fix in development before redeploying

Troubleshooting

Configuration Update Issues

Problem: Pipeline configuration changes not applied

Solutions:

  • Verify you have appropriate permissions
  • Check for validation errors in UI
  • Review API response for errors
  • Ensure pipeline not currently running

Pipeline Run References Missing

Problem: Cannot find specific pipeline run details

Solutions:

  • Check run retention period
  • Verify run ID is correct
  • Check permissions to view runs
  • Confirm run completed successfully

Configuration Export/Import Fails

Problem: Cannot export or import pipeline configuration

Solutions:

  • Verify permissions
  • Check JSON format validity
  • Review instance-specific references
  • Ensure target environment has required dependencies

Version Conflicts

Problem: Configuration version mismatch

Solutions:

  • Increment the version number appropriately
  • Use a consistent naming convention
  • Check for concurrent modifications by other users
  • Review the version control history for duplicate or conflicting entries