Pipeline Versioning and Snapshots

Learn about pipeline versioning, automatic snapshots, and configuration management.

Overview

Pipeline versioning helps you:

  • Track configuration changes over time
  • Compare pipeline versions
  • Migrate pipelines between environments
  • Maintain audit trails
  • Understand pipeline execution history

Understanding Pipeline Snapshots

What is a Snapshot?

A snapshot is an automatic, point-in-time copy of a pipeline configuration that is created when the pipeline runs. Snapshots include:

  • Pipeline definition
  • Stage configurations
  • Trigger settings
  • Parameter defaults
  • Plugin references

Important: Snapshots are automatically created by the system when a pipeline execution starts. They are not user-manageable via the Management Portal.

Why Snapshots Exist

Snapshots serve a critical purpose in data pipeline execution:

  1. Freeze Configuration: When a pipeline run starts, a snapshot freezes the pipeline definition
  2. Long-Running Stability: Since pipelines can run for extended periods (hours or days), the snapshot ensures the execution uses a consistent configuration
  3. Change Isolation: If the pipeline definition is modified while a run is in progress, the running instance continues with the original configuration from the snapshot
  4. Audit Trail: Snapshots provide a record of exactly what configuration was used for each pipeline run

When Snapshots Are Created

Automatic Creation (system-managed):

  • At the start of every pipeline execution
  • Automatically when "Run" is triggered (manual or scheduled)
  • Before the first work item is processed
  • Linked to the specific pipeline run ID

Snapshot Contents

{
  "snapshot_id": "run-2024-01-15-093045",
  "created_at": "2024-01-15T09:30:45Z",
  "run_id": "pipeline-run-12345",
  "pipeline_definition": {
    "name": "customer-data-pipeline",
    "version": "1.2.0",
    "data_source": { ... },
    "starting_stages": [ ... ],
    "triggers": [ ... ]
  }
}

Snapshot Lifecycle

Automatic Creation

User triggers pipeline run
         ↓
System creates snapshot
         ↓
Pipeline executes with snapshot
         ↓
Snapshot persists with run history

No User Management Required

Note: Unlike some systems, FoundationaLLM snapshots are not created, deleted, or managed by users. They are purely system-managed artifacts tied to pipeline runs.

Viewing Pipeline Run Snapshots

Accessing Snapshot Information

While you cannot manage snapshots directly, you can view them through pipeline run details:

  1. Navigate to Data Pipeline Runs
  2. Select a specific pipeline run
  3. View run details which include the snapshot reference
  4. See the frozen configuration that was used for that specific run

Field          Description
Run ID         Unique identifier for the pipeline run
Snapshot ID    Associated snapshot identifier
Created        Date and time of execution
Configuration  The pipeline definition used (from snapshot)
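
The run details above can also be summarized programmatically. The following is a minimal Python sketch, assuming a run record shaped like the snapshot example earlier on this page; the field names are assumptions for illustration, not a documented API contract.

```python
# Hypothetical sketch: summarizing the snapshot fields of a pipeline run
# record. The record shape mirrors the snapshot JSON example above; the
# exact field names are assumptions.
def run_summary(run: dict) -> str:
    """Return a one-line summary of the frozen configuration used by a run."""
    cfg = run["pipeline_definition"]
    return (f"run {run['run_id']} (snapshot {run['snapshot_id']}) "
            f"ran {cfg['name']} v{cfg['version']} at {run['created_at']}")
```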

Comparing Pipeline Configurations

Since snapshots are tied to runs, you can compare configurations by looking at different pipeline runs:

  1. Select two different pipeline runs
  2. View their respective configurations
  3. Manually compare the differences in:
    • Stage configurations
    • Parameter values
    • Plugin versions
    • Trigger settings

Why This Matters:

  • Helps diagnose why results differ between runs
  • Shows configuration evolution over time
  • Aids in troubleshooting issues
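
The manual comparison described above can be sketched as a small diff helper. This is an illustrative Python sketch that assumes configurations are plain nested dictionaries, as in the exported JSON examples on this page.

```python
# A minimal sketch of comparing the frozen configurations of two runs.
# It flattens nested settings into dotted paths and reports values that
# differ; the configuration shape is an assumption.
def flatten(cfg: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted-path keys for easy comparison."""
    out = {}
    for key, value in cfg.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, path))
        else:
            out[path] = value
    return out

def diff_configs(a: dict, b: dict) -> dict:
    """Map each differing dotted path to its (old, new) value pair."""
    fa, fb = flatten(a), flatten(b)
    return {k: (fa.get(k), fb.get(k))
            for k in sorted(set(fa) | set(fb))
            if fa.get(k) != fb.get(k)}
```

A helper like this makes it obvious at a glance why two runs produced different results, without reading both JSON documents side by side.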

Pipeline Versioning Best Practices

Managing Configuration Changes

Since snapshots are automatic, focus on managing your pipeline definitions:

Version Control Approach:

  • Treat pipeline definitions like code
  • Document changes in version control (Git)
  • Use meaningful commit messages
  • Tag stable releases
  • Maintain a changelog

Change Management:

  1. Test changes in development environment first
  2. Document what changed and why
  3. Validate with sample data
  4. Monitor first production runs closely
  5. Keep configuration backup (export JSON)
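
Step 5, keeping a configuration backup, can be sketched as a small helper that writes the exported JSON to a timestamped file. The directory layout and file naming below are assumptions for illustration.

```python
# Hypothetical sketch: saving an exported pipeline definition as a
# timestamped JSON backup before making changes. The backup directory
# and naming scheme are assumptions.
import json
from datetime import datetime, timezone
from pathlib import Path

def backup_config(definition: dict, backup_dir: str = "pipeline-backups") -> Path:
    """Write the definition to <dir>/<name>-<UTC timestamp>.json, return the path."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    name = definition.get("name", "pipeline")
    path = Path(backup_dir) / f"{name}-{stamp}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(definition, indent=2))
    return path
```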

Reverting Configuration Changes

If you need to revert a pipeline to a previous configuration:

Manual Reversion Process:

  1. Export current configuration (for backup)
  2. Locate previous working configuration:
    • From version control system
    • From exported JSON backups
    • From previous pipeline run details
  3. Update pipeline definition manually
  4. Test thoroughly before production use

Why No Automatic Rollback:

  • Snapshots are tied to runs, not configuration management
  • Manual reversion ensures deliberate, tested changes
  • Reduces risk of accidental configuration loss

Pipeline Configuration Versioning

Semantic Versioning

Follow a semantic versioning convention for pipeline definitions:

Format: MAJOR.MINOR.PATCH
Example: 1.2.3

MAJOR: Breaking changes (1.0.0 → 2.0.0)
MINOR: New features (1.1.0 → 1.2.0)
PATCH: Bug fixes (1.2.1 → 1.2.2)
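
The bump rules above can be expressed as a short helper. A minimal sketch:

```python
# A minimal sketch of semantic version bumping: increment MAJOR, MINOR,
# or PATCH and reset the lower-order components.
def bump(version: str, part: str) -> str:
    """Return the next version string for a 'major', 'minor', or 'patch' change."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part}")
```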

Version Lifecycle

Development → Testing → Staging → Production
    v1.0.0      v1.0.0     v1.0.0      v1.0.0

After changes:
Development → Testing → Staging → Production
    v1.1.0      v1.0.0     v1.0.0      v1.0.0

After validation:
Development → Testing → Staging → Production
    v1.1.0      v1.1.0     v1.1.0      v1.0.0

After deployment:
Development → Testing → Staging → Production
    v1.2.0      v1.1.0     v1.1.0      v1.1.0

Version Documentation

Document important versions:

Latest: Current production version
Stable: Last known good configuration
Beta: Testing new features in development
Archived: No longer in use

Environment Migration

Export Pipeline

Export configuration for migration:

GET /instances/{instanceId}/providers/FoundationaLLM.DataPipeline/dataPipelines/{pipelineName}/export

Response:

{
  "pipeline_definition": { ... },
  "dependencies": {
    "plugins": ["TextExtraction", "Embedding"],
    "data_sources": ["azure-data-lake"],
    "configurations": ["AzureAISearch"]
  }
}
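
Calling the export endpoint from a script could look like the following sketch. The endpoint path follows the example above; the base URL, bearer-token authentication, and response handling are assumptions.

```python
# Hypothetical sketch: exporting a pipeline definition via the Management
# API. The endpoint path mirrors the doc; base URL, auth scheme, and
# response shape are assumptions.
import json
import urllib.request

def export_url(base_url: str, instance_id: str, pipeline_name: str) -> str:
    """Build the export endpoint URL for a pipeline."""
    return (f"{base_url}/instances/{instance_id}/providers/"
            f"FoundationaLLM.DataPipeline/dataPipelines/{pipeline_name}/export")

def export_pipeline(base_url: str, instance_id: str,
                    pipeline_name: str, token: str) -> dict:
    """Fetch the pipeline export and return the parsed JSON body."""
    req = urllib.request.Request(
        export_url(base_url, instance_id, pipeline_name),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```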

Import Pipeline

Import to new environment:

POST /instances/{targetInstanceId}/providers/FoundationaLLM.DataPipeline/dataPipelines/import
Content-Type: application/json

{
  "pipeline_definition": { ... },
  "name": "customer-data-pipeline",
  "environment": "production",
  "overwrite": false
}
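
Building the import body programmatically can be sketched as follows. The field names mirror the example above; the validation and the `overwrite` default are illustrative assumptions.

```python
# Hypothetical sketch: assembling the import request body before POSTing
# it to the import endpoint. Field names mirror the example above; the
# validation rules are assumptions.
import json

def build_import_body(definition: dict, name: str, environment: str,
                      overwrite: bool = False) -> str:
    """Serialize the import payload; refuse to overwrite unless explicitly allowed."""
    if not name:
        raise ValueError("pipeline name is required")
    body = {
        "pipeline_definition": definition,
        "name": name,
        "environment": environment,
        "overwrite": overwrite,
    }
    return json.dumps(body)
```

Defaulting `overwrite` to false means an accidental import into an environment that already has the pipeline fails loudly rather than silently replacing it.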

Migration Checklist

Before migrating:

  • [ ] Export pipeline configuration
  • [ ] Document dependencies
  • [ ] Verify plugins available in target
  • [ ] Check data source access
  • [ ] Test with sample data
  • [ ] Update instance-specific values
  • [ ] Review security settings
  • [ ] Test triggers
  • [ ] Validate integration points
  • [ ] Create rollback plan

Environment-Specific Changes

Update for target environment:

{
  "development": {
    "data_source": "dev-storage",
    "index": "dev-index",
    "schedule": null
  },
  "production": {
    "data_source": "prod-storage",
    "index": "prod-index",
    "schedule": "0 6 * * *"
  }
}
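
Applying these environment-specific values to a base definition can be sketched as a simple merge. The key names follow the JSON above; the shallow-update merge strategy is an assumption.

```python
# A minimal sketch of applying per-environment overrides (like the JSON
# above) to a base pipeline definition. Shallow merge is an assumption.
def apply_environment(base: dict, env_settings: dict, environment: str) -> dict:
    """Return a copy of the base definition with the target environment's values."""
    if environment not in env_settings:
        raise KeyError(f"no settings for environment: {environment}")
    merged = dict(base)
    merged.update(env_settings[environment])
    return merged
```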

Change Management

Change Request Process

  1. Document Change:

    Change: Update embedding model
    Reason: Improve search quality
    Impact: 20% slower, better accuracy
    Rollback Plan: Export current config before change
    
  2. Export Current Configuration: Save configuration backup

  3. Test in Dev: Validate in development

  4. Review: Get approval if needed

  5. Deploy: Apply to production

  6. Validate: Confirm success

  7. Document: Record outcome and pipeline run ID

Change History

Track all changes:

Date        Version  Change            By     Snapshot
2024-01-20  1.3.0    Add safety stage  alice  snap-456
2024-01-15  1.2.0    Update embedding  bob    snap-123
2024-01-10  1.1.0    Increase chunks   alice  snap-789

Best Practices

Configuration Management Strategy

Change Cadence by Environment:

  • Production: Test changes thoroughly before each deployment
  • Staging: Validate changes regularly before promotion
  • Development: Iterate and test freely

Documentation Quality:

Good: "Updated embedding model for Q4 2024 feature release"
Bad: "change"

Include:
- What changed
- Why it changed
- Expected impact
- Test results

Version Control Integration

Treat pipelines like code:

Git commit: "Update embedding to 3-large for better quality"
Pipeline version: 1.2.0
Pipeline run: run-2024-01-15-093045

Link in commit message:
"Deployed to production - see run run-2024-01-15-093045"

Testing Before Deployment

Test Plan:

  1. Deploy to development
  2. Run with test dataset
  3. Validate results
  4. Compare metrics
  5. Get approval
  6. Deploy to staging
  7. Monitor for 24 hours
  8. Deploy to production

Change Evaluation Criteria

Monitor after deployment:

  • Error rate should stay <1%
  • Performance should match expectations
  • Data quality validated
  • No critical failures
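
The first criterion above can be sketched as a simple threshold check. The 1% default mirrors the list above; the inputs (failed and total work item counts) are assumptions.

```python
# A minimal sketch of the error-rate criterion: flag a deployment when the
# failure rate over recent work items exceeds 1% (the document's threshold).
def error_rate_ok(failed: int, total: int, threshold: float = 0.01) -> bool:
    """True when failed/total is at or below the acceptable threshold."""
    if total <= 0:
        raise ValueError("total must be positive")
    return failed / total <= threshold
```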

If issues detected:

  1. Export current (problematic) configuration for analysis
  2. Manually revert to previous working configuration
  3. Investigate root cause
  4. Test fix in development before redeploying

Troubleshooting

Configuration Update Issues

Problem: Pipeline configuration changes not applied

Solutions:

  • Verify you have appropriate permissions
  • Check for validation errors in UI
  • Review API response for errors
  • Ensure pipeline not currently running

Pipeline Run References Missing

Problem: Cannot find specific pipeline run details

Solutions:

  • Check run retention period
  • Verify run ID is correct
  • Check permissions to view runs
  • Confirm run completed successfully

Configuration Export/Import Fails

Problem: Cannot export or import pipeline configuration

Solutions:

  • Verify permissions
  • Check JSON format validity
  • Review instance-specific references
  • Ensure target environment has required dependencies

Version Conflicts

Problem: Configuration version mismatch

Solutions:

  • Increment the version number appropriately
  • Use a consistent naming convention
  • Check for concurrent modifications by other users
  • Review the version control history for duplicate or conflicting entries