Monitoring and troubleshooting vectorization
The typical steps you have to perform when monitoring and troubleshooting vectorization in FoundationaLLM (FLLM) are:
- Check the configuration of the Vectorization API and Vectorization Worker. For more details, see Configuring vectorization.
- Check the working condition of the Management API, Vectorization API and Vectorization Worker(s). Ensure the services have started and initialized successfully.
- Check the status endpoints of the Core API, Vectorization API, and Management API. Submit an HTTP GET request to the /status endpoint of each API and verify that you get an HTTP 200 OK response with a body like <api_name> - ready.
- Check the logs of the Management API, Vectorization API, and Vectorization Worker(s) for errors. By default, the logs are written to the Azure App Insights Log Analytics Workspace deployed by FLLM.
- Check the definitions of the vectorization profiles used in the vectorization requests. For more details, see Managing vectorization profiles. Ensure all the required app configuration elements are present and have the correct values.
- Check the state of the vectorization requests. By default, the vectorization requests are stored in the vectorization-state container of the Azure Storage account deployed by FLLM.
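The status check from the list above can be scripted. The following is a minimal sketch that uses only the Python standard library. The function names are hypothetical, and it assumes the /status endpoint can be reached without authentication and returns the readiness text in the response body; adjust it if your deployment differs.

```python
from urllib.error import URLError
from urllib.request import urlopen


def is_ready(status_code: int, body: str) -> bool:
    """Return True for an HTTP 200 response whose body contains '- ready'."""
    return status_code == 200 and "- ready" in body


def check_status(base_url: str, timeout: float = 10.0) -> bool:
    """Send GET <base_url>/status and report whether the API looks ready."""
    url = base_url.rstrip("/") + "/status"
    try:
        with urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return is_ready(response.status, body)
    except (URLError, OSError):
        # Connection failures and HTTP error responses (4xx/5xx) both land here.
        return False
```

Call check_status once for each of the Core API, Vectorization API, and Management API base URLs in your deployment. A False result means the service either could not be reached or did not report that it is ready.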
State and logging of vectorization requests
All state and logging information for vectorization requests is stored in the vectorization-state container of the Azure Storage account deployed by FLLM.
Vectorization request resource files
Each vectorization request resource is stored in the vectorization-state/requests folder. The request resources are created and managed through the Management API. The naming convention is: vectorization-state/requests/<request_id>-<yyyMMdd>.json.
The resource file is updated as the vectorization request progresses through processing. It contains the following fields that can assist in troubleshooting:
Field | Description |
---|---|
id | The unique identifier of the vectorization request. When looking up the subsequent execution state, this is the identifier that is used in the file name. |
content-identifier.canonical_id | The canonical id of the vectorization request. This is the path within the execution-state folder where additional logs and associated vectorization artifacts are stored. |
processing-state | The current state of the vectorization request. Possible values are New, InProgress, Completed, and Failed. |
error_messages | A high-level list of error messages encountered during processing. |
current_step | The step currently being executed, or the step in which a failure occurred. |
pipeline_object_id | When the request is created through a vectorization pipeline, this field contains the object id of the pipeline. |
pipeline_execution_id | When processing is initiated through a vectorization pipeline, this field contains the unique identifier of the pipeline execution. |
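As an illustration, the troubleshooting fields can be pulled out of a downloaded request resource file with a few lines of Python. This is a sketch, not part of FLLM: the helper names are hypothetical, and because the exact JSON key spelling (hyphen or underscore) is an assumption based on the table above, the lookup tries both.

```python
import json


def _lookup(data: dict, name: str):
    """Look up a key, accepting either hyphen or underscore spelling (assumption)."""
    for key in (name, name.replace("-", "_"), name.replace("_", "-")):
        if key in data:
            return data[key]
    return None


def summarize_request(resource_text: str) -> dict:
    """Extract the troubleshooting fields from a request resource JSON document."""
    resource = json.loads(resource_text)
    content_identifier = _lookup(resource, "content-identifier") or {}
    return {
        "id": _lookup(resource, "id"),
        "canonical_id": _lookup(content_identifier, "canonical_id"),
        "processing_state": _lookup(resource, "processing-state"),
        "error_messages": _lookup(resource, "error_messages") or [],
        "current_step": _lookup(resource, "current_step"),
        "pipeline_object_id": _lookup(resource, "pipeline_object_id"),
        "pipeline_execution_id": _lookup(resource, "pipeline_execution_id"),
    }
```

For a failed request, the processing_state, current_step, and error_messages values usually indicate where to look next, and canonical_id points to the matching folder under execution-state.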
Vectorization execution state files
The execution state of a vectorization request is stored in the vectorization-state/execution-state folder. The naming convention is: vectorization-state/execution-state/<canonical_id>/<file_name>_state_<request_id>.json. The execution state file provides verbose details about the request and is updated as the vectorization request is processed. This file records generated assets and logs. Error messages can be found in the log field.
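The naming convention for execution state files can be turned into a small helper for locating the right file. The sketch below is illustrative only: the function names are hypothetical, and because the structure of the log field is not described here, its value is returned unchanged.

```python
import json

# Folder prefix taken from the naming convention for execution state files.
EXECUTION_STATE_ROOT = "vectorization-state/execution-state"


def execution_state_path(canonical_id: str, file_name: str, request_id: str) -> str:
    """Build <root>/<canonical_id>/<file_name>_state_<request_id>.json."""
    folder = canonical_id.strip("/")
    return f"{EXECUTION_STATE_ROOT}/{folder}/{file_name}_state_{request_id}.json"


def extract_log(state_text: str):
    """Return the raw 'log' field of an execution state document, or None if absent."""
    state = json.loads(state_text)
    return state.get("log")
```

The canonical_id and request_id arguments correspond to the content-identifier.canonical_id and id values from the request resource file.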
Vectorization pipeline state files
The state of the vectorization pipeline is stored in the vectorization-state/pipeline-state folder. The naming convention is: vectorization-state/pipeline-state/<pipeline_name>/<pipeline_name>-<pipeline_execution_id>.json. The pipeline state file records, in the vectorization_requests field, the vectorization requests that are processed together in a single pipeline. The overall pipeline state is derived from the states of these vectorization requests by evaluating the conditions in the following table in order:
Condition | Pipeline state |
---|---|
At least one request is InProgress | InProgress |
All requests are Completed | Completed |
At least one request is Failed | Failed |
All requests are New or there are no requests being tracked. | New |
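The ordered evaluation in the table can be expressed as a short function. This is a sketch of the rule as described, not the FLLM implementation. It makes two assumptions: an empty collection of requests is treated as New rather than Completed, and any combination the table does not cover (for example, a mix of New and Completed) also falls back to New.

```python
from typing import Iterable


def pipeline_state(request_states: Iterable[str]) -> str:
    """Derive the pipeline state by checking the table's conditions in order."""
    states = list(request_states)
    if "InProgress" in states:
        return "InProgress"
    # Require at least one request so an empty pipeline is not reported as Completed.
    if states and all(state == "Completed" for state in states):
        return "Completed"
    if "Failed" in states:
        return "Failed"
    # All requests are New, no requests are tracked, or (assumption) an uncovered mix.
    return "New"
```

Because the conditions are checked in order, a pipeline with one Failed request and one InProgress request is reported as InProgress until the remaining request finishes.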
You can use the Management API with the object id of the request to retrieve the vectorization request resource, which contains a high-level overview of any errors that have occurred. If you need more detailed information, review the execution state file.