Knowledge management agent

The FoundationaLLM (FLLM) Knowledge Management agent type supports the following scenarios:

  • With an Inline Context: Knowledge Management agents with an Inline Context pass the user's prompt directly to the Large Language Model (LLM).
  • Without an Inline Context: Knowledge Management agents without an Inline Context implement the Retrieval Augmented Generation (RAG) design pattern. RAG augments the user prompt with additional context to generate a more accurate response. The RAG flow uses a retrieval model to retrieve relevant documents from a knowledge base, such as a vector store, and then uses the retrieved documents to augment the user prompt before sending it to the LLM.
    • The creation of a Knowledge Management agent without an Inline Context requires an existing knowledge base, such as a vector store. Use the Vectorization API to create a vector store prior to the creation of the agent.
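
The RAG flow described above can be illustrated with a small, self-contained sketch. This is not FLLM's implementation: the toy word-count `embed` function, the in-memory `KNOWLEDGE_BASE`, and the prompt template are all assumptions made for illustration. A real agent embeds the prompt with the configured text embedding profile and searches an index such as Azure AI Search.

```python
import math
from collections import Counter

# Hypothetical in-memory knowledge base standing in for a vector store.
KNOWLEDGE_BASE = [
    "FLLM agents are configured through the Management API.",
    "A vector store holds embeddings of document chunks.",
    "The Chat UI lists agents in a dropdown menu.",
]


def embed(text):
    """Toy embedding: lowercase word counts (a real system calls an embedding model)."""
    cleaned = text.lower().replace(".", "").replace("?", "")
    return Counter(cleaned.split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, top_k=1):
    """Return the top_k documents most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(query_vector, embed(d)), reverse=True)
    return ranked[:top_k]


def build_augmented_prompt(user_prompt, documents, top_k=1):
    """Prepend the retrieved context to the user prompt before it is sent to the LLM."""
    context = "\n".join(retrieve(user_prompt, documents, top_k))
    return f"Context:\n{context}\n\nQuestion: {user_prompt}"
```

Calling `build_augmented_prompt("What does a vector store hold?", KNOWLEDGE_BASE)` returns a prompt whose context section contains the vector store sentence, which is the text that would then be passed to the LLM.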

Knowledge Management Agent Configuration

The Knowledge Management agent configuration may reference the following resources:

  • Vectorization text embedding profile: The text embedding profile contains the configuration of the text embedding model used to embed the user prompt and perform a vector search in the knowledge base. This must match the text embedding profile used to populate the knowledge base.

  • Vectorization indexing profile: The indexing profile contains the configuration of the service hosting the index.

  • Prompt: The system prompt of the agent, describing the persona of the agent.

Note: The Knowledge Management agent implementation currently supports the AzureAISearchIndexer indexing profile.

The structure of a Knowledge Management agent is the following:

  {
    "type": "knowledge-management",
    "name": "<name>",
    "object_id": "/instances/<instance_id>/providers/FoundationaLLM.Agent/agents/<name>",
    "description": "<description>",
    "display_name": "<display_name>",
    "inline_context": true,
    "vectorization": {
      "dedicated_pipeline": "",
      "data_source_object_id": "<data_source_object_id>",
      "indexing_profile_object_id": "<indexing_profile_object_id>",
      "text_embedding_profile_object_id": "<text_embedding_profile_object_id>",
      "text_partitioning_profile_object_id": "<text_partitioning_profile_object_id>",
      "vectorization_data_pipeline_object_id": "",
      "trigger_type": "",
      "trigger_cron_schedule": ""
    },
    "prompt_object_id": "<prompt_resource_objectid>",
    "language_model": {
      "type": "openai",
      "provider": "microsoft",
      "temperature": 0.0,
      "use_chat": true,
      "api_endpoint": "FoundationaLLM:AzureOpenAI:API:Endpoint",
      "api_key": "FoundationaLLM:AzureOpenAI:API:Key",
      "api_version": "FoundationaLLM:AzureOpenAI:API:Version",
      "version": "FoundationaLLM:AzureOpenAI:API:Completions:ModelVersion",
      "deployment": "FoundationaLLM:AzureOpenAI:API:Completions:DeploymentName"
    },
    "sessions_enabled": true,
    "conversation_history": {
      "enabled": true,
      "max_history": 5
    },
    "gatekeeper": {
      "use_system_setting": false,
      "options": [
      ]
    },
    "orchestration_settings": {
      "orchestrator": "LangChain",
      "endpoint_configuration": {
        "endpoint": "",
        "api_version": "",
        "api_key": "",
        "auth_type": "",
        "provider": "",
        "operation_type": "chat"
      },
      "model_parameters": {
        "deployment_name": ""
      }
    }
  }


  • <name> is the name of the agent.
  • <instance_id> is the instance ID of the deployment.
  • <description> is the description of the agent. Ensure that this description details the purpose of the agent.
  • <display_name> controls the title of the agent in the Chat UI dropdown menu.
  • <data_source_object_id> is the object ID of the Data Source resource.
  • <indexing_profile_object_id> is the object ID of the indexing profile resource.
  • <text_embedding_profile_object_id> is the object ID of the text embedding profile resource.
  • <text_partitioning_profile_object_id> is the object ID of the text partitioning profile resource.
  • <prompt_resource_objectid> is the object ID of the prompt resource.

| Parameter | Description |
| --- | --- |
| type | The type of the agent; always knowledge-management. The type key must be the first key in the request body. |
| name | The name of the agent. |
| object_id | The object ID of the agent. Remove this element when creating an agent, as it is generated by the Management API. |
| description | The description of the agent. Ensure that this description details the purpose of the agent. |
| display_name | The title of the agent in the Chat UI dropdown menu. This field is optional. |
| inline_context | Whether or not the agent has an Inline Context. |
| vectorization | The vectorization object is only required for Knowledge Management agents without an Inline Context (inline_context is false). If the vectorization object is included, the indexing_profile_object_id and text_embedding_profile_object_id keys are required. |
| vectorization.dedicated_pipeline | A boolean indicating whether or not the agent has a dedicated Vectorization pipeline (implemented in an upcoming release). |
| vectorization.data_source_object_id | The object ID of the Data Source resource. |
| vectorization.indexing_profile_object_id | The object ID of the indexing profile resource. |
| vectorization.text_embedding_profile_object_id | The object ID of the text embedding profile resource. |
| vectorization.text_partitioning_profile_object_id | The object ID of the text partitioning profile resource. |
| vectorization.vectorization_data_pipeline_object_id | The resource ID of the agent's Vectorization pipeline (implemented in an upcoming release). |
| vectorization.trigger_type | The trigger type of the agent's Vectorization pipeline (implemented in an upcoming release). Permissible values are Manual, Schedule, and Event. |
| vectorization.trigger_cron_schedule | The schedule of the trigger in Cron format (implemented in an upcoming release). This property is valid only when trigger_type is Schedule. |
| prompt_object_id | The object ID of the prompt resource. |
| language_model | The language model configuration. The language_model object has been deprecated as of release 0.6.0. |
| language_model.type | The type of the language model. Currently, OpenAI-based language models are supported. |
| language_model.provider | The provider of the language model. Currently, microsoft and openai are supported. |
| language_model.temperature | The temperature value for the language model, between 0 and 1. Values closer to 0 return more factual information, whereas values closer to 1 yield more creative responses. |
| language_model.use_chat | Determines the type of language model to use. For example, with Microsoft's Azure OpenAI, setting use_chat to true uses the AzureChatOpenAI model rather than the AzureOpenAI model in LangChain. |
| language_model.api_endpoint | The configuration setting key that houses the API endpoint of the language model. The example above uses default FLLM values. Ensure this value is populated in application configuration. |
| language_model.api_key | The configuration setting key that houses a reference to a key vault value containing the API key for the language model service. Ensure these values are populated in key vault and app configuration. |
| language_model.api_version | The configuration setting key that houses the API version of the language model. The example above uses default FLLM values. Ensure this value is populated in application configuration. |
| language_model.version | The configuration setting key that houses the version of the language model deployment. The example above uses default FLLM values. Ensure this value is populated in application configuration. |
| language_model.deployment | The configuration setting key that houses the name given to the deployed language model. The example above uses default FLLM values. Ensure this value is populated in application configuration. |
| sessions_enabled | A boolean value that indicates whether the agent is session-less (false) or supports sessions (true). |
| conversation_history | The conversation history configuration. |
| conversation_history.enabled | Indicates whether conversation history is retained for subsequent agent interactions (true). |
| conversation_history.max_history | Indicates the number of messages to be retained. |
| gatekeeper | The gatekeeper configuration. |
| gatekeeper.use_system_setting | Indicates whether the system settings are used for the gatekeeper. |
| gatekeeper.options | Contains the list of gatekeeper options. The sample provided overrides the system setting for gatekeeper and enables Azure Content Safety and MS Presidio in the messaging pipeline. |
| orchestration_settings | The settings for the agent orchestrator. |
| orchestration_settings.orchestrator | FoundationaLLM currently supports LangChain and SemanticKernel for both types of Knowledge Management agents; however, Knowledge Management agents with an Inline Context can also use the AzureOpenAIDirect and AzureAIDirect orchestrators. |
| orchestration_settings.endpoint_configuration | The endpoint configuration of the hosted LLM. FoundationaLLM currently supports Azure OpenAI and OpenAI. |
| orchestration_settings.endpoint_configuration.endpoint | The endpoint URL of the hosted LLM. The URL should be provided directly for the LangChain or SemanticKernel orchestrators; it should be provided as an Azure App Configuration key reference for the AzureOpenAIDirect or AzureAIDirect orchestrators. |
| orchestration_settings.endpoint_configuration.api_version | The API version of the hosted LLM. For Azure OpenAI, this value should be set to the latest GA version. The API version should be provided directly for the LangChain or SemanticKernel orchestrators; it should be provided as an Azure App Configuration key reference for the AzureOpenAIDirect or AzureAIDirect orchestrators. |
| orchestration_settings.endpoint_configuration.auth_type | The authentication method of the hosted LLM. This value can be either token or key. For Azure OpenAI deployments, this value should be token, which configures the orchestrator to use Managed Identities for authentication. Key-based authentication uses API keys. |
| orchestration_settings.endpoint_configuration.api_key | The name of the Azure App Configuration key storing the LLM endpoint API key. This parameter is required if auth_type is set to key. |
| orchestration_settings.endpoint_configuration.provider | The provider of the hosted LLM. FoundationaLLM currently supports microsoft (Azure OpenAI) or openai. |
| orchestration_settings.endpoint_configuration.operation_type | This field is set to chat by default and can be omitted. |
| orchestration_settings.model_parameters | Endpoint-specific model parameters. This field must be non-null if the provider is microsoft. |
| orchestration_settings.model_parameters.deployment_name | This field should be set to the name of the Azure OpenAI model deployment if the provider is microsoft. |
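
Several of the rules stated in the parameter descriptions can be checked on the client before a configuration is submitted. The following is a minimal sketch under stated assumptions: `validate_agent_config` is a hypothetical helper, not part of FLLM, and treating a missing inline_context as false is an assumption. The Management API remains the authority on what it accepts.

```python
REQUIRED_VECTORIZATION_KEYS = (
    "indexing_profile_object_id",
    "text_embedding_profile_object_id",
)


def validate_agent_config(config, creating=True):
    """Return a list of problems found in an agent configuration dict."""
    problems = []

    # Python dicts preserve insertion order, so the first key can be inspected.
    if next(iter(config), None) != "type":
        problems.append("type must be the first key in the request body")
    if config.get("type") != "knowledge-management":
        problems.append("type must be knowledge-management")
    if creating and "object_id" in config:
        problems.append("remove object_id when creating an agent")

    vectorization = config.get("vectorization")
    # Assumption: a missing inline_context is treated as false.
    if not config.get("inline_context", False) and vectorization is None:
        problems.append("vectorization is required when inline_context is false")
    if vectorization is not None:
        for key in REQUIRED_VECTORIZATION_KEYS:
            if not vectorization.get(key):
                problems.append(f"vectorization.{key} is required")

    endpoint = config.get("orchestration_settings", {}).get("endpoint_configuration", {})
    if endpoint.get("auth_type") == "key" and not endpoint.get("api_key"):
        problems.append("endpoint_configuration.api_key is required when auth_type is key")

    return problems
```

An empty list means none of these particular checks failed; it does not guarantee that the configuration is otherwise valid.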

AzureOpenAIDirect Orchestrator

The AzureOpenAIDirect orchestrator passes the user's prompt to an LLM deployed in an instance of Azure OpenAI Service, bypassing LangChain or Semantic Kernel.

Example Configuration:

  "orchestration_settings": {
    "orchestrator": "AzureOpenAIDirect",
    "endpoint_configuration": {
      "endpoint": "FoundationaLLM:AzureOpenAI:API:Endpoint",
      "api_version": "FoundationaLLM:AzureOpenAI:API:Version",
      "auth_type": "key",
      "api_key": "FoundationaLLM:AzureOpenAI:API:Key",
      "operation_type": "chat"
    },
    "model_parameters": {
      "deployment_name": "completions"
    }
  }

Note: AzureOpenAIDirect is only compatible with Knowledge Management agents with an Inline Context.

AzureAIDirect Orchestrator

The AzureAIDirect orchestrator passes the user's prompt to an LLM deployed as an Azure AI Studio real-time endpoint. This orchestrator allows customers to use a wider range of LLMs with FLLM agents.

Example Configuration:

  "orchestration_settings": {
    "orchestrator": "AzureAIDirect",
    "endpoint_configuration": {
      "endpoint": "<AZURE APP CONFIGURATION KEY>",
      "api_key": "<AZURE APP CONFIGURATION KEY>"
    },
    "model_parameters": {
      "temperature": 0.8,
      "max_new_tokens": 1000,
      "deployment_name": "<AZURE AI STUDIO DEPLOYMENT NAME>"
    }
  }

Note: AzureAIDirect is only compatible with Knowledge Management agents with an Inline Context.
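
The compatibility rules for the four orchestrators can be summarized in a short sketch. The function name `orchestrator_is_compatible` is hypothetical and only restates the rules given in this document; it does not call any FLLM API.

```python
# Orchestrators usable by Knowledge Management agents with or without an Inline Context.
GENERAL_ORCHESTRATORS = {"LangChain", "SemanticKernel"}
# Orchestrators that only work for agents with an Inline Context.
INLINE_ONLY_ORCHESTRATORS = {"AzureOpenAIDirect", "AzureAIDirect"}


def orchestrator_is_compatible(orchestrator, inline_context):
    """Return True if the orchestrator can be used with the given inline_context value."""
    if orchestrator in GENERAL_ORCHESTRATORS:
        return True
    if orchestrator in INLINE_ONLY_ORCHESTRATORS:
        return bool(inline_context)
    raise ValueError(f"unknown orchestrator: {orchestrator}")
```

For example, `orchestrator_is_compatible("AzureAIDirect", inline_context=False)` evaluates to False, because a RAG agent cannot use the direct orchestrators.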

Managing Knowledge Management Agents

This section describes how to manage knowledge management agents using the Management API. {{baseUrl}} is the base URL of the Management API. {{instanceId}} is the unique identifier of the FLLM instance.


Retrieve

HTTP GET {{baseUrl}}/instances/{{instanceId}}/providers/FoundationaLLM.Agent/agents

Create or update

HTTP POST {{baseUrl}}/instances/{{instanceId}}/providers/FoundationaLLM.Agent/agents/<name>
Content-Type: application/json

<agent_configuration>

where <agent_configuration> is the JSON agent configuration structure described above.
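
As an illustration, the create or update request could be assembled with Python's standard library as follows. This is a sketch under assumptions: `build_upsert_request` is a hypothetical helper, the bearer token header is an assumed authentication scheme that this document does not specify, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_upsert_request(base_url, instance_id, agent_config, token=None, creating=True):
    """Build (but do not send) the POST request that creates or updates an agent."""
    name = agent_config["name"]
    url = (
        f"{base_url.rstrip('/')}/instances/{instance_id}"
        f"/providers/FoundationaLLM.Agent/agents/{name}"
    )
    # object_id is generated by the Management API, so drop it when creating.
    # The comprehension keeps key order, so "type" stays first if it was first.
    body = {
        key: value
        for key, value in agent_config.items()
        if not (creating and key == "object_id")
    }
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: bearer token authentication; adjust to your deployment.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request to the Management API.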


Delete

HTTP DELETE {{baseUrl}}/instances/{{instanceId}}/providers/FoundationaLLM.Agent/agents/<name>

FLLM currently implements logical deletes for Knowledge Management agents. This means that users cannot create a Knowledge Management agent with the same name as a deleted Knowledge Management agent. Support for purging Knowledge Management agents will be added in a future release.

Validating a Knowledge Management Agent

Once configured, the knowledge management agent can be validated using an API call to the Core API or via the User Portal.


It can take up to 5 minutes for a new Knowledge Management agent to appear in the User Portal or be accessible for requests from the Core API.
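
Because of this delay, an automated validation step may need to poll before treating a missing agent as a failure. The sketch below is one way to do that: `wait_for_agent` and its `is_available` callback are hypothetical names, and the 15-second interval is an arbitrary choice. The callback would wrap whatever Core API or Management API check you use.

```python
import time


def wait_for_agent(is_available, timeout_seconds=300, interval_seconds=15,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll is_available() until it returns True or the timeout elapses.

    Returns True if the agent became available, False on timeout.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_seconds
    while True:
        if is_available():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_seconds)
```

The default timeout of 300 seconds matches the 5 minute upper bound stated above.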

Overriding agent parameters

The agent parameters can be overridden at the time of the API call. Refer to the Core API documentation for more information.