Class ModelDeploymentContext

Namespace
FoundationaLLM.Gateway.Services
Assembly
FoundationaLLM.Gateway.dll

Provides the context associated with a model deployment used in text operations (embeddings or completions).

public class ModelDeploymentContext
Inheritance
ModelDeploymentContext

Constructors

ModelDeploymentContext(AzureOpenAIAccountDeployment, double, ITextOperationService, ILoggerFactory, GatewayMetrics)

Initializes a new instance of ModelDeploymentContext for the specified model deployment.

public ModelDeploymentContext(AzureOpenAIAccountDeployment deployment, double tokenRateLimitMultiplier, ITextOperationService textOperationService, ILoggerFactory loggerFactory, GatewayMetrics metrics)

Parameters

deployment AzureOpenAIAccountDeployment

The AzureOpenAIAccountDeployment object with the details of the model deployment.

tokenRateLimitMultiplier double

The token rate limit multiplier used to account for the tokenization differences between the Gateway API and the deployed model.

textOperationService ITextOperationService

The service providing the implementation of the text operation.

loggerFactory ILoggerFactory

The ILoggerFactory used to create loggers.

metrics GatewayMetrics

The FoundationaLLM Gateway telemetry metrics.

Fields

_deployment

protected readonly AzureOpenAIAccountDeployment _deployment

Field Value

AzureOpenAIAccountDeployment

_effectiveRequestRateLimit

protected readonly int _effectiveRequestRateLimit

Field Value

int

_effectiveTokenRateLimit

protected readonly int _effectiveTokenRateLimit

Field Value

int

_embeddingDimensionsIndexMapping

Embedding operations are grouped by the number of dimensions they require. For each distinct number of dimensions, a single request is sent to the model. This dictionary maps the number of dimensions to the index of the corresponding request in the _textOperationRequests list.

protected readonly Dictionary<int, int> _embeddingDimensionsIndexMapping

Field Value

Dictionary<int, int>
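
The grouping described for _embeddingDimensionsIndexMapping can be illustrated with a minimal sketch. This is not the Gateway implementation: EmbeddingRequestSketch and DimensionGroupingSketch are hypothetical stand-ins for InternalTextOperationRequest and for the relevant part of ModelDeploymentContext, whose internals are not shown in this reference.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for InternalTextOperationRequest (real type not shown here).
public sealed class EmbeddingRequestSketch
{
    public int Dimensions { get; set; }
    public List<string> Inputs { get; } = new List<string>();
}

// Hypothetical sketch of grouping embedding inputs by dimension count.
public sealed class DimensionGroupingSketch
{
    // Maps a dimension count to the index of its request in Requests.
    private readonly Dictionary<int, int> _dimensionsIndexMapping = new Dictionary<int, int>();

    public List<EmbeddingRequestSketch> Requests { get; } = new List<EmbeddingRequestSketch>();

    // Adds text to the request for the given dimension count, creating the
    // request on first use. Returns the index of the request that received it.
    public int Add(string text, int dimensions)
    {
        if (!_dimensionsIndexMapping.TryGetValue(dimensions, out var index))
        {
            index = Requests.Count;
            Requests.Add(new EmbeddingRequestSketch { Dimensions = dimensions });
            _dimensionsIndexMapping[dimensions] = index;
        }

        Requests[index].Inputs.Add(text);
        return index;
    }
}
```

In this sketch, inputs that share a dimension count end up in the same request, so the number of requests equals the number of distinct dimension counts seen.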

_jsonSerializerOptions

protected readonly JsonSerializerOptions _jsonSerializerOptions

Field Value

JsonSerializerOptions

_logger

protected readonly ILogger<ModelDeploymentContext> _logger

Field Value

ILogger<ModelDeploymentContext>

_loggerFactory

protected readonly ILoggerFactory _loggerFactory

Field Value

ILoggerFactory

_metrics

protected readonly GatewayMetrics _metrics

Field Value

GatewayMetrics

_requestRateWindowActualRequestCount

The actual cumulative number of requests for the current request rate window.

protected int _requestRateWindowActualRequestCount

Field Value

int

_requestRateWindowProjectedRequestCount

The projected cumulative number of requests for the current request rate window.

protected int _requestRateWindowProjectedRequestCount

Field Value

int

_requestRateWindowStart

The start timestamp of the current request rate window.

protected DateTime _requestRateWindowStart

Field Value

DateTime

_textOperationRequests

protected readonly List<InternalTextOperationRequest> _textOperationRequests

Field Value

List<InternalTextOperationRequest>

_textOperationService

protected readonly ITextOperationService _textOperationService

Field Value

ITextOperationService

_tokenRateLimitMultiplier

protected readonly double _tokenRateLimitMultiplier

Field Value

double

_tokenRateWindowActualTokenCount

The actual cumulative number of tokens for the current token rate window.

protected int _tokenRateWindowActualTokenCount

Field Value

int

_tokenRateWindowProjectedTokenCount

The projected cumulative number of tokens for the current token rate window.

protected int _tokenRateWindowProjectedTokenCount

Field Value

int

_tokenRateWindowStart

The start timestamp of the current token rate window.

protected DateTime _tokenRateWindowStart

Field Value

DateTime
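
The rate window fields above (window start, projected count, actual count) suggest fixed-window accounting. The following is a minimal sketch of that idea for tokens, under stated assumptions: the class name, the window length parameter, and the reset behavior are hypothetical, and the actual Gateway logic may differ.

```csharp
using System;

// Hypothetical sketch of fixed-window token accounting; not the Gateway implementation.
public sealed class TokenRateWindowSketch
{
    private readonly int _effectiveTokenRateLimit;
    private readonly TimeSpan _windowLength;   // assumed parameter, not in the reference
    private DateTime _windowStart;
    private int _projectedTokenCount;
    private int _actualTokenCount;

    public TokenRateWindowSketch(int effectiveTokenRateLimit, TimeSpan windowLength, DateTime now)
    {
        _effectiveTokenRateLimit = effectiveTokenRateLimit;
        _windowLength = windowLength;
        _windowStart = now;
    }

    public int ProjectedTokenCount => _projectedTokenCount;
    public int ActualTokenCount => _actualTokenCount;

    // Reserves the estimated tokens if they fit within the current window's limit.
    public bool TryReserve(int estimatedTokens, DateTime now)
    {
        ResetIfExpired(now);
        if (_projectedTokenCount + estimatedTokens > _effectiveTokenRateLimit)
        {
            return false;
        }

        _projectedTokenCount += estimatedTokens;
        return true;
    }

    // Records the token usage reported once a request has completed.
    public void RecordActual(int usedTokens, DateTime now)
    {
        ResetIfExpired(now);
        _actualTokenCount += usedTokens;
    }

    // Starts a new window and clears both counters once the current window has elapsed.
    private void ResetIfExpired(DateTime now)
    {
        if (now - _windowStart >= _windowLength)
        {
            _windowStart = now;
            _projectedTokenCount = 0;
            _actualTokenCount = 0;
        }
    }
}
```

Passing the current time as an argument keeps the sketch deterministic; the same pattern would apply to the request rate window with request counts in place of token counts.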

Properties

HasInput

public bool HasInput { get; }

Property Value

bool

ModelCanDoCompletions

Indicates whether the model in the deployment can perform completions.

public bool ModelCanDoCompletions { get; }

Property Value

bool

ModelCanDoEmbeddings

Indicates whether the model in the deployment can perform embeddings.

public bool ModelCanDoEmbeddings { get; }

Property Value

bool

Methods

ProcessTextOperationRequests()

public Task<List<InternalTextOperationResult>> ProcessTextOperationRequests()

Returns

Task<List<InternalTextOperationResult>>

TryAddInputTextChunk(TextChunk, Dictionary<string, object>)

Attempts to add a new text chunk to the input for the text operation request.

public bool TryAddInputTextChunk(TextChunk textChunk, Dictionary<string, object> modelParameters)

Parameters

textChunk TextChunk

The text chunk to be added.

modelParameters Dictionary<string, object>

The model parameters for the text operation.

Returns

bool

true if the text chunk can be added without breaching the token and request rate limits; otherwise, false.

Remarks

For embedding operations, modelParameters must always contain a single property named TextOperationContextPropertyNames.EmbeddingDimensions which specifies the number of dimensions required for embedding.

For completion operations, modelParameters can contain the following parameters:

  1. TextOperationContextPropertyNames.Temperature - the completion model temperature.
  2. TextOperationContextPropertyNames.TopP - the completion model top-p value.
  3. TextOperationContextPropertyNames.MaxOutputTokenCount - the completion model max output token count.
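
The parameter rules in the remarks above can be sketched as helpers that build the modelParameters dictionary. Everything here is illustrative: PropertyNamesSketch is a hypothetical stand-in for TextOperationContextPropertyNames, and its string values are placeholders because the real constant values are not shown in this reference. The validity check reads "a single property" as "exactly one entry", which is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for TextOperationContextPropertyNames.
// The string values below are placeholders, not the real constant values.
public static class PropertyNamesSketch
{
    public const string EmbeddingDimensions = "embedding_dimensions";
    public const string Temperature = "temperature";
    public const string TopP = "top_p";
    public const string MaxOutputTokenCount = "max_output_token_count";
}

public static class ModelParametersSketch
{
    // Embedding operations: a single entry carrying the number of dimensions.
    public static Dictionary<string, object> ForEmbedding(int dimensions)
    {
        return new Dictionary<string, object>
        {
            [PropertyNamesSketch.EmbeddingDimensions] = dimensions
        };
    }

    // Completion operations: temperature, top-p, and max output token count.
    public static Dictionary<string, object> ForCompletion(double temperature, double topP, int maxOutputTokenCount)
    {
        return new Dictionary<string, object>
        {
            [PropertyNamesSketch.Temperature] = temperature,
            [PropertyNamesSketch.TopP] = topP,
            [PropertyNamesSketch.MaxOutputTokenCount] = maxOutputTokenCount
        };
    }

    // Checks the embedding rule as interpreted here: exactly one entry, the dimension count.
    public static bool IsValidForEmbedding(Dictionary<string, object> modelParameters)
    {
        return modelParameters.Count == 1
            && modelParameters.ContainsKey(PropertyNamesSketch.EmbeddingDimensions);
    }
}
```

A caller would pass the resulting dictionary as the modelParameters argument of TryAddInputTextChunk, using the real TextOperationContextPropertyNames constants as keys instead of the placeholders.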