Class TryAGITokenizer

Namespace
FoundationaLLM.Common.Services.Tokenizers
Assembly
FoundationaLLM.Common.dll

A tokenizer service that uses the Tiktoken library to encode text into tokens and decode tokens back into text.

public class TryAGITokenizer : ITokenizerService
Inheritance
TryAGITokenizer
Implements
ITokenizerService

Methods

CountTokens(string, string?)

Count the number of tokens in a given text.

public long CountTokens(string text, string? encoderName = null)

Parameters

text string

The text to evaluate.

encoderName string?

The name of the encoder used for tokenization. Optional; defaults to null.

Returns

long

The number of tokens in the text.
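
A common use of a token count is checking whether a piece of text fits within a token budget. The following is a minimal sketch, not part of FoundationaLLM: `TokenBudget` and `Fits` are hypothetical names, and the counter is passed as a delegate with the same shape as `CountTokens(string, string?)`, so the sketch does not assume how a `TryAGITokenizer` instance is constructed.

```csharp
#nullable enable
using System;

// Hypothetical helper (not part of FoundationaLLM).
public static class TokenBudget
{
    // countTokens has the same shape as CountTokens(string, string?) documented above.
    // Returns true when the text uses at most maxTokens tokens.
    public static bool Fits(
        Func<string, string?, long> countTokens,
        string text,
        long maxTokens,
        string? encoderName = null)
    {
        if (countTokens is null) throw new ArgumentNullException(nameof(countTokens));
        if (text is null) throw new ArgumentNullException(nameof(text));

        return countTokens(text, encoderName) <= maxTokens;
    }
}
```

Given an existing `TryAGITokenizer` instance named `tokenizer` (an assumption; construction is not covered on this page), a call could look like `TokenBudget.Fits(tokenizer.CountTokens, prompt, 4096)`.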

Decode(int[], string?)

Decode an array of integer token ids.

public string Decode(int[] tokens, string? encoderName = null)

Parameters

tokens int[]

An array of integer token ids.

encoderName string?

The name of the encoder used for tokenization. Optional; defaults to null.

Returns

string

Decoded text.

Encode(string, string?)

Encode a string into a list of token ids. Allowed special tokens are kept intact rather than broken apart.

public List<int> Encode(string text, string? encoderName = null)

Parameters

text string

String to be encoded.

encoderName string?

The name of the encoder used for tokenization. Optional; defaults to null.

Returns

List<int>

List of token ids.
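
Note that `Encode` returns a `List<int>` while `Decode` takes an `int[]`, so chaining the two needs a conversion. The sketch below truncates text to a maximum number of tokens by encoding, keeping the first ids, and decoding them. `TokenTruncation` and `Truncate` are hypothetical names, not part of FoundationaLLM, and the encoder and decoder are passed as delegates with the same shapes as the documented methods.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of FoundationaLLM).
public static class TokenTruncation
{
    // encode and decode have the same shapes as Encode(string, string?)
    // and Decode(int[], string?) documented above.
    public static string Truncate(
        Func<string, string?, List<int>> encode,
        Func<int[], string?, string> decode,
        string text,
        int maxTokens,
        string? encoderName = null)
    {
        if (maxTokens < 0) throw new ArgumentOutOfRangeException(nameof(maxTokens));

        List<int> ids = encode(text, encoderName);

        // Text that already fits is returned unchanged.
        if (ids.Count <= maxTokens) return text;

        // Decode expects an array, so convert the kept prefix of the list.
        int[] kept = ids.Take(maxTokens).ToArray();
        return decode(kept, encoderName);
    }
}
```

With an existing `TryAGITokenizer` instance named `tokenizer` (an assumption), this could be called as `TokenTruncation.Truncate(tokenizer.Encode, tokenizer.Decode, text, 100)`. Cutting a token sequence at an arbitrary point may split a multi-byte character, so the tail of the decoded text may need checking.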