
Interface ITokenizerService

Namespace
FoundationaLLM.Common.Interfaces
Assembly
FoundationaLLM.Common.dll

Represents a text tokenizer.

public interface ITokenizerService

Methods

Decode(int[], string)

Decodes an array of integer token ids back into text.

string Decode(int[] tokens, string encoderName)

Parameters

tokens int[]

An array of integer token ids.

encoderName string

The name of the encoder used for tokenization.

Returns

string

Decoded text.

Encode(string, string)

Encodes a string into a list of integer token ids. Allowed special tokens are not broken apart.

List<int> Encode(string text, string encoderName)

Parameters

text string

String to be encoded.

encoderName string

The name of the encoder used for tokenization.

Returns

List<int>

List of token ids.
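
Example

The following is a minimal sketch of an Encode/Decode round trip, not FoundationaLLM's own implementation. It redeclares the two documented signatures locally so that it compiles without FoundationaLLM.Common.dll. The class Utf8ByteTokenizerService and the encoder name "utf8-bytes" are hypothetical: the toy tokenizer emits one token id per UTF-8 byte and ignores encoderName. The encoder names accepted by a real implementation are not listed on this page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Local copy of the two documented signatures, so the sketch is self-contained.
public interface ITokenizerService
{
    List<int> Encode(string text, string encoderName);
    string Decode(int[] tokens, string encoderName);
}

// Hypothetical toy implementation: one token id per UTF-8 byte.
// It ignores encoderName and exists only to exercise the interface.
public sealed class Utf8ByteTokenizerService : ITokenizerService
{
    public List<int> Encode(string text, string encoderName)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        return Encoding.UTF8.GetBytes(text).Select(b => (int)b).ToList();
    }

    public string Decode(int[] tokens, string encoderName)
    {
        if (tokens == null) throw new ArgumentNullException(nameof(tokens));
        // checked() throws OverflowException for ids outside 0..255.
        byte[] bytes = tokens.Select(t => checked((byte)t)).ToArray();
        return Encoding.UTF8.GetString(bytes);
    }
}

public static class Program
{
    public static void Main()
    {
        ITokenizerService tokenizer = new Utf8ByteTokenizerService();
        const string encoderName = "utf8-bytes"; // hypothetical encoder name

        List<int> ids = tokenizer.Encode("Hello", encoderName);

        // Encode returns List<int> but Decode takes int[], so convert first.
        string text = tokenizer.Decode(ids.ToArray(), encoderName);

        Console.WriteLine(string.Join(",", ids)); // 72,101,108,108,111
        Console.WriteLine(text);                  // Hello
    }
}
```

Note the asymmetry in the documented signatures: Encode returns List<int>, while Decode expects int[]. Callers doing a round trip need a ToArray() call (or equivalent) in between.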