
Interface ITokenizerService

Namespace: FoundationaLLM.Common.Interfaces
Assembly: FoundationaLLM.Common.dll

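Represents a text tokenizer that encodes text into integer token ids, decodes token ids back into text, and counts the tokens in a text.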

public interface ITokenizerService

Methods

CountTokens(string, string?)

Count the number of tokens in a given text.

long CountTokens(string text, string? encoderName = null)

Parameters

text string

The text to evaluate.

encoderName string?

The name of the encoder used for tokenization. Optional; defaults to null.

Returns

long

The number of tokens in the text.
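A typical use of CountTokens is checking that a prompt fits within a model's token budget before sending it. The sketch below assumes a hypothetical stand-in that treats each whitespace-separated word as one token; the local `CountTokens` and `FitsInBudget` functions are illustrative only and are not part of FoundationaLLM. A real encoder splits text into subword tokens, so its counts will differ.

```csharp
using System;

// Hypothetical stand-in for ITokenizerService.CountTokens: counts
// whitespace-separated words. A real encoder (selected by encoderName)
// produces subword tokens and would return different counts.
long CountTokens(string text, string? encoderName = null) =>
    text.Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries).LongLength;

// Hypothetical caller-side helper: does the prompt fit within the budget?
bool FitsInBudget(string prompt, long maxTokens, string? encoderName = null) =>
    CountTokens(prompt, encoderName) <= maxTokens;

Console.WriteLine(CountTokens("The quick brown fox"));     // 4
Console.WriteLine(FitsInBudget("The quick brown fox", 3)); // False
```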

Decode(int[], string?)

Decode an array of integer token ids into text.

string Decode(int[] tokens, string? encoderName = null)

Parameters

tokens int[]

An array of integer token ids.

encoderName string?

The name of the encoder used for tokenization. Optional; defaults to null.

Returns

string

Decoded text.
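Decoding looks up the text fragment for each token id and concatenates the fragments in order. The following sketch illustrates this with a hypothetical four-entry vocabulary; the vocabulary, its ids, and the local `Decode` function only mirror the shape of the interface method and are not the FoundationaLLM implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical vocabulary: token id -> text fragment. As in many real
// tokenizers, a leading space belongs to the fragment (" world").
var vocabulary = new Dictionary<int, string>
{
    [0] = "Hello",
    [1] = ",",
    [2] = " world",
    [3] = "!",
};

// Mirrors the shape of ITokenizerService.Decode. encoderName is accepted
// but ignored because this sketch has only one vocabulary.
string Decode(int[] tokens, string? encoderName = null) =>
    string.Concat(tokens.Select(id => vocabulary[id]));

Console.WriteLine(Decode(new[] { 0, 1, 2, 3 })); // Hello, world!
```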

Encode(string, string?)

Encode a string into a list of integer token ids. Special tokens in the encoder's allowed set are kept whole rather than broken apart.

List<int> Encode(string text, string? encoderName = null)

Parameters

text string

String to be encoded.

encoderName string?

The name of the encoder used for tokenization. Optional; defaults to null.

Returns

List<int>

List of token ids.
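Encode returns a List<int> while Decode accepts an int[], so a round trip needs a conversion such as ToArray(). The sketch below shows that round trip using a hypothetical vocabulary and a greedy longest-match encoder; both are illustrative assumptions, not the FoundationaLLM implementation, and real tokenizers typically use byte-pair encoding (BPE) rather than greedy matching.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical vocabulary: text fragment -> token id. "<|endoftext|>" plays
// the role of a special token and is stored as a single entry.
var vocabulary = new Dictionary<string, int>
{
    ["Hello"] = 0,
    [","] = 1,
    [" world"] = 2,
    ["!"] = 3,
    ["<|endoftext|>"] = 4,
};
var fragments = vocabulary.ToDictionary(kv => kv.Value, kv => kv.Key);

// Greedy longest-match encoding: at each position, emit the id of the longest
// vocabulary entry that the remaining text starts with. Because the special
// token is one entry, it becomes one id.
List<int> Encode(string text, string? encoderName = null)
{
    var result = new List<int>();
    var pos = 0;
    while (pos < text.Length)
    {
        var rest = text.Substring(pos);
        var match = vocabulary.Keys
            .Where(piece => rest.StartsWith(piece, StringComparison.Ordinal))
            .OrderByDescending(piece => piece.Length)
            .FirstOrDefault()
            ?? throw new ArgumentException($"No vocabulary entry matches at position {pos}.");
        result.Add(vocabulary[match]);
        pos += match.Length;
    }
    return result;
}

// Same shape as ITokenizerService.Decode: takes an int[], not a List<int>.
string Decode(int[] tokens, string? encoderName = null) =>
    string.Concat(tokens.Select(id => fragments[id]));

List<int> encoded = Encode("Hello, world!<|endoftext|>");
Console.WriteLine(string.Join(" ", encoded));  // 0 1 2 3 4
Console.WriteLine(Decode(encoded.ToArray()));  // Hello, world!<|endoftext|>
```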