Setting Up Triggers

Learn how to configure triggers that run data pipelines automatically or on demand.

Overview

Triggers determine when and how data pipelines execute. Triggers are part of the data pipeline definition. FoundationaLLM currently supports two trigger types, with a third planned:

  • Schedule: Run pipelines on a recurring schedule (cron-based) - ✅ Available
  • Manual: Run pipelines on-demand through the UI or API - ✅ Available
  • Event: Run pipelines in response to data source events - ⚠️ Not Currently Available (planned for future release)

Note: Triggers are defined as part of the data pipeline configuration, not as separate resources.

Currently Available Trigger Types

Schedule Triggers

Execute pipelines automatically on a recurring schedule.

Use Cases:

  • Daily data refreshes
  • Nightly batch processing
  • Weekly full reindexing
  • Regular data synchronization

Configuration:

Field                   Type     Description           Example
name                    string   Trigger identifier    "Daily Refresh"
trigger_type            string   Must be "Schedule"    "Schedule"
trigger_cron_schedule   string   Cron expression       "0 6 * * *"
parameter_values        object   Parameter overrides   See below

Cron Expression Format:

┌───────────── minute (0 - 59)
│ ┌───────────── hour (0 - 23)
│ │ ┌───────────── day of month (1 - 31)
│ │ │ ┌───────────── month (1 - 12)
│ │ │ │ ┌───────────── day of week (0 - 6, Sunday = 0)
│ │ │ │ │
* * * * *

Common Cron Patterns:

Pattern        Description                     Example
0 6 * * *      Daily at 6:00 AM                Daily refresh
0 */6 * * *    Every 6 hours                   Frequent updates
0 0 * * 0      Weekly on Sunday at midnight    Weekly full sync
0 2 1 * *      Monthly on the 1st at 2:00 AM   Monthly refresh
0 22 * * 1-5   Weekdays at 10:00 PM            Business day processing
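
To sanity-check an expression before saving a trigger, you can compute its next few fire times locally. A minimal sketch using the third-party croniter package (pip install croniter); this validates only the five-field format shown above and runs entirely outside FoundationaLLM:

from datetime import datetime, timezone

from croniter import croniter  # pip install croniter

def next_fire_times(expression: str, count: int = 3) -> list:
    """Return the next `count` fire times for a five-field cron expression."""
    if not croniter.is_valid(expression):
        raise ValueError(f"Invalid cron expression: {expression!r}")
    it = croniter(expression, datetime.now(timezone.utc))
    return [it.get_next(datetime) for _ in range(count)]

# "Daily at 6:00 AM" from the patterns table above
for fire_time in next_fire_times("0 6 * * *"):
    print(fire_time.isoformat())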

Example Schedule Trigger:

{
  "name": "Daily Morning Refresh",
  "trigger_type": "Schedule",
  "trigger_cron_schedule": "0 6 * * *",
  "parameter_values": {
    "DataSource.MyDataLake.Folders": ["data/documents"],
    "Stage.Partition.PartitioningStrategy": "Token",
    "Stage.Embed.EmbeddingModel": "text-embedding-3-large"
  }
}

Manual Triggers

Execute pipelines on-demand through user action or API call.

Use Cases:

  • Ad-hoc data processing
  • Testing configurations
  • Initial data loads
  • On-demand updates

Configuration:

Field              Type     Description
name               string   Trigger identifier
trigger_type       string   Must be "Manual"
parameter_values   object   Default parameters (can be overridden)

Invocation Methods:

  1. Management Portal "Run" button
  2. REST API endpoint
  3. SDK method call
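
For illustration only, a manual invocation over REST might look like the sketch below. The route is a hypothetical placeholder (the documented route for creating pipelines appears under Via API later on this page); consult your deployment's Management API reference for the actual trigger endpoint and authentication scheme:

import requests

# Hypothetical route for illustration only; confirm the real trigger
# endpoint in the FoundationaLLM Management API reference.
instance_id = "00000000-0000-0000-0000-000000000000"
url = (
    f"https://management.example.com/instances/{instance_id}"
    "/providers/FoundationaLLM.DataPipeline/dataPipelines/my-pipeline/trigger"
)

payload = {
    "trigger_name": "Manual Full Refresh",
    "parameter_values": {
        "DataSource.DataLake.Folders": ["all-documents"],
        "Stage.Partition.PartitionSizeTokens": 500,
    },
}

response = requests.post(url, json=payload, timeout=30)
response.raise_for_status()
print(response.status_code)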

Example Manual Trigger:

{
  "name": "Manual Full Refresh",
  "trigger_type": "Manual",
  "parameter_values": {
    "DataSource.DataLake.Folders": ["all-documents"],
    "Stage.Partition.PartitionSizeTokens": 500
  }
}

Future Feature: Event Triggers

⚠️ Not Currently Available: Event-based triggers are planned for a future release but are not yet implemented in the current version of FoundationaLLM.

Planned Use Cases:

  • Process files as soon as they're added
  • Respond to data updates
  • Real-time data ingestion
  • Incremental processing

Planned Configuration (for future reference):

Field              Type     Description
name               string   Trigger identifier
trigger_type       string   Will be "Event"
parameter_values   object   Parameter overrides

Future Event Types (planned):

  • File Added: New files in storage
  • File Modified: Existing files updated
  • File Deleted: Files removed (for cleanup)

Note: When event triggers become available, they will require data source plugin support. Not all data sources will necessarily support event-based triggering.

Parameter Values

Triggers must provide parameter values for all required pipeline parameters. These parameters are defined by the data pipeline plugins (see PluginPackageManager.cs for complete parameter definitions).

Parameter Naming Convention

Parameters use a hierarchical naming structure:

Data Source Parameters:

DataSource.{DataSourceName}.{ParameterName}

Stage Parameters:

Stage.{StageName}.{ParameterName}

Stage Dependency Parameters:

Stage.{StageName}.Dependency.{DependencyPluginName}.{ParameterName}
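
Because parameter names must match exactly, it can help to validate the keys in parameter_values against these three patterns before saving a trigger. A minimal sketch; the patterns are derived from the convention above, and the assumption that name segments contain only word characters is mine:

import re

# Patterns derived from the naming convention above; segments are
# assumed to contain only word characters (letters, digits, underscore).
PATTERNS = [
    re.compile(r"^DataSource\.\w+\.\w+$"),
    re.compile(r"^Stage\.\w+\.\w+$"),
    re.compile(r"^Stage\.\w+\.Dependency\.\w+\.\w+$"),
]

def invalid_parameter_names(parameter_values: dict) -> list:
    """Return keys that match none of the naming patterns."""
    return [
        key
        for key in parameter_values
        if not any(pattern.match(key) for pattern in PATTERNS)
    ]

params = {
    "DataSource.VGDataLake.Folders": ["vectorization-input/Documents"],
    "Stage.Partition.PartitioningStrategy": "Token",
    "Stage-Partition-PartitioningStrategy": "Token",  # malformed on purpose
}
print(invalid_parameter_names(params))  # ['Stage-Partition-PartitioningStrategy']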

Example Parameter Values

{
  "parameter_values": {
    // Data source parameters
    "DataSource.VGDataLake.Folders": [
      "vectorization-input/Documents"
    ],
    
    // Stage parameters
    "Stage.Partition.PartitioningStrategy": "Token",
    "Stage.Embed.EmbeddingModel": "text-embedding-3-large",
    "Stage.Embed.EmbeddingDimensions": 2048,
    "Stage.Index.IndexName": "documents-index",
    "Stage.Index.IndexPartitionName": "Documents",
    
    // Dependency parameters
    "Stage.Partition.Dependency.TokenContentTextPartitioning.PartitionSizeTokens": 400,
    "Stage.Partition.Dependency.TokenContentTextPartitioning.PartitionOverlapTokens": 100
  }
}

Required vs. Optional Parameters

Schedule (and future Event) Triggers:

  • Must provide ALL required parameters
  • No user interaction possible during execution
  • Missing parameters cause immediate failure

Manual Triggers:

  • Can provide default values
  • User may be prompted for missing values
  • More flexibility during invocation
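
Since Schedule (and future Event) triggers fail immediately on missing parameters, it is worth checking coverage before saving. A minimal sketch; the required-parameter set below is a hypothetical stand-in for whatever the pipeline's plugins actually declare:

# Hypothetical required parameters; real definitions come from the
# data pipeline plugins (see PluginPackageManager.cs).
REQUIRED = {
    "DataSource.Storage.Folders",
    "Stage.Partition.PartitioningStrategy",
    "Stage.Embed.EmbeddingModel",
}

trigger = {
    "name": "Daily Morning Refresh",
    "trigger_type": "Schedule",
    "trigger_cron_schedule": "0 6 * * *",
    "parameter_values": {
        "DataSource.Storage.Folders": ["data/documents"],
        "Stage.Partition.PartitioningStrategy": "Token",
    },
}

missing = REQUIRED - trigger["parameter_values"].keys()
if trigger["trigger_type"] in ("Schedule", "Event") and missing:
    raise ValueError(f"Missing required parameters: {sorted(missing)}")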

Creating Triggers

In the Management Portal

  1. Navigate to Data Pipelines
  2. Edit the pipeline
  3. Scroll to the Triggers section
  4. Click Add Trigger
  5. Select trigger type
  6. Configure parameters
  7. Save the pipeline

Via API

POST /instances/{instanceId}/providers/FoundationaLLM.DataPipeline/dataPipelines
Content-Type: application/json

{
  "name": "my-pipeline",
  "triggers": [
    {
      "name": "Scheduled Trigger",
      "trigger_type": "Schedule",
      "trigger_cron_schedule": "0 6 * * *",
      "parameter_values": { ... }
    }
  ]
}

Managing Multiple Triggers

A single pipeline can have multiple triggers.

Common Pattern: Combine scheduled and manual triggers

{
  "triggers": [
    {
      "name": "Daily Scheduled Run",
      "trigger_type": "Schedule",
      "trigger_cron_schedule": "0 6 * * *",
      "parameter_values": { ... }
    },
    {
      "name": "On-Demand Full Refresh",
      "trigger_type": "Manual",
      "parameter_values": { ... }
    }
  ]
}

Benefits:

  • Regular automated processing
  • Flexibility for immediate updates
  • Different parameter sets for different scenarios

Best Practices

Schedule Triggers

Timing:

  • Schedule during low-usage periods
  • Avoid peak business hours
  • Consider time zones

Frequency:

  • Balance freshness vs. resource usage
  • More frequent = higher costs
  • Less frequent = stale data

Overlap Prevention:

  • Ensure previous run completes before next starts
  • Use appropriate schedule intervals
  • Monitor run durations
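
As a rough overlap check, compare the pipeline's typical run duration with the interval between consecutive fires of the schedule. A minimal sketch, again using croniter and an assumed worst-case duration:

from datetime import datetime, timedelta, timezone

from croniter import croniter  # pip install croniter

schedule = "0 */6 * * *"               # fires every 6 hours
worst_case_run = timedelta(hours=7)    # assumed worst-case duration

it = croniter(schedule, datetime.now(timezone.utc))
first = it.get_next(datetime)
second = it.get_next(datetime)
interval = second - first

if worst_case_run >= interval:
    print(f"Overlap risk: runs can take {worst_case_run}, fires every {interval}")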

Manual Triggers

Default Values:

  • Provide sensible defaults
  • Document expected parameters
  • Test common invocation patterns

Permissions:

  • Control who can trigger manually
  • Use RBAC appropriately
  • Audit manual executions

Monitoring Triggers

Schedule Execution

Check that scheduled runs execute on time:

  1. Navigate to Data Pipeline Runs
  2. Filter by pipeline name
  3. Review execution times
  4. Verify against schedule
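
Step 4 can be scripted by comparing observed run start times against the fire times the cron expression predicts, allowing a small tolerance for queueing delay. A minimal sketch over placeholder timestamps; in practice, pull start times from the Data Pipeline Runs view or API:

from datetime import datetime, timedelta, timezone

from croniter import croniter  # pip install croniter

schedule = "0 6 * * *"
tolerance = timedelta(minutes=5)

# Placeholder run start times; replace with real data.
observed = [
    datetime(2025, 1, 6, 6, 0, 12, tzinfo=timezone.utc),
    datetime(2025, 1, 7, 6, 42, 0, tzinfo=timezone.utc),  # late
]

it = croniter(schedule, observed[0] - timedelta(days=1))
expected = [it.get_next(datetime) for _ in observed]

for exp, obs in zip(expected, observed):
    status = "on time" if abs(obs - exp) <= tolerance else "LATE"
    print(f"expected {exp}  observed {obs}  -> {status}")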

Trigger History

Track trigger performance:

  • Success rate per trigger
  • Average execution time
  • Failure patterns
  • Resource utilization
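
These metrics are straightforward to compute from exported run records. A minimal sketch over an illustrative record shape; the field names are placeholders, not the actual run schema:

from statistics import mean

# Illustrative run records; the real schema comes from the
# Data Pipeline Runs view or API.
runs = [
    {"trigger": "Daily Scheduled Run", "succeeded": True, "seconds": 340},
    {"trigger": "Daily Scheduled Run", "succeeded": False, "seconds": 95},
    {"trigger": "On-Demand Full Refresh", "succeeded": True, "seconds": 610},
]

by_trigger = {}
for run in runs:
    by_trigger.setdefault(run["trigger"], []).append(run)

for name, items in by_trigger.items():
    rate = sum(r["succeeded"] for r in items) / len(items)
    avg = mean(r["seconds"] for r in items)
    print(f"{name}: success {rate:.0%}, avg {avg:.0f}s")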

Troubleshooting

Schedule Trigger Not Firing

Possible Causes:

  • Pipeline not active
  • Invalid cron expression
  • Time zone mismatch
  • System maintenance

Solutions:

  • Verify pipeline is active
  • Test cron expression
  • Check system status
  • Review trigger configuration

Manual Trigger Fails

Possible Causes:

  • Missing required parameters
  • Invalid parameter values
  • Insufficient permissions
  • Pipeline already running

Solutions:

  • Review parameter requirements
  • Validate parameter values
  • Check user permissions
  • Verify no active runs

Parameter Errors

Problem: "Missing required parameter" error

Solution:

  1. Review pipeline configuration
  2. Identify required parameters
  3. Add to trigger's parameter_values
  4. Verify parameter names match exactly

Problem: "Invalid parameter value" error

Solution:

  1. Check parameter type (string, int, etc.)
  2. Verify value format
  3. Confirm resource references exist
  4. Test with known-good values
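
Steps 1 and 2 can be scripted by checking each value against its expected type. A minimal sketch with a hypothetical type map; real type expectations come from the plugin parameter definitions:

# Hypothetical expected types; actual definitions live in the
# data pipeline plugin metadata (see PluginPackageManager.cs).
EXPECTED_TYPES = {
    "DataSource.Storage.Folders": list,
    "Stage.Partition.PartitionSizeTokens": int,
    "Stage.Embed.EmbeddingModel": str,
}

values = {
    "DataSource.Storage.Folders": ["/test-data"],
    "Stage.Partition.PartitionSizeTokens": "400",  # wrong: string, not int
}

for key, value in values.items():
    expected = EXPECTED_TYPES.get(key)
    if expected and not isinstance(value, expected):
        print(f"{key}: expected {expected.__name__}, got {type(value).__name__}")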

Common Scenarios

Scenario 1: Daily Full Refresh

{
  "name": "Daily Full Refresh",
  "trigger_type": "Schedule",
  "trigger_cron_schedule": "0 2 * * *",
  "parameter_values": {
    "DataSource.Storage.Folders": ["/all-data"],
    "Stage.Partition.PartitionSizeTokens": 400
  }
}

Scenario 2: Hourly Incremental Update

{
  "name": "Hourly Incremental",
  "trigger_type": "Schedule",
  "trigger_cron_schedule": "0 * * * *",
  "parameter_values": {
    "DataSource.Storage.Folders": ["/recent"],
    "Stage.Index.IndexPartitionName": "incremental"
  }
}

Scenario 3: Business Hours Processing

{
  "name": "Business Hours Processing",
  "trigger_type": "Schedule",
  "trigger_cron_schedule": "0 9-17 * * 1-5",
  "parameter_values": {
    "DataSource.SharePoint.DocumentLibraries": ["Active Documents"]
  }
}

Scenario 4: On-Demand Testing

{
  "name": "Manual Test Run",
  "trigger_type": "Manual",
  "parameter_values": {
    "DataSource.Storage.Folders": ["/test-data"],
    "Stage.Embed.EmbeddingModel": "text-embedding-3-small"
  }
}

Note: Event-based triggers are not currently available. For near-real-time processing, consider a frequent scheduled trigger instead (e.g., every 15 minutes).
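
A polling-style stand-in for an event trigger might look like the following; the folder path is illustrative:

{
  "name": "Near-Real-Time Polling",
  "trigger_type": "Schedule",
  "trigger_cron_schedule": "*/15 * * * *",
  "parameter_values": {
    "DataSource.Storage.Folders": ["/incoming"]
  }
}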