
Class MicrosoftMLTokenizer

Namespace
FoundationaLLM.Common.Services.Tokenizers
Assembly
FoundationaLLM.Common.dll

Represents a tokenizer service that uses the Microsoft.ML.Tokenizers library.

public class MicrosoftMLTokenizer : ITokenizerService
Inheritance
object
MicrosoftMLTokenizer
Implements
ITokenizerService
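The three methods documented below form the service's public contract. This page does not list the members of `ITokenizerService`, so the interface below is a hypothetical restatement that mirrors the documented signatures. `ITokenizerServiceSketch` and `CharTokenizerSketch` are illustrative names and are not part of FoundationaLLM. The toy implementation assigns one token per UTF-16 code unit and ignores `encoderName`. By contrast, `MicrosoftMLTokenizer` presumably delegates to the `Tokenizer` instance exposed by its `Tokenizer` property, which produces subword tokens.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical restatement of the members documented on this page; the real
// ITokenizerService interface may declare more or different members.
public interface ITokenizerServiceSketch
{
    long CountTokens(string text, string? encoderName = null);
    string Decode(int[] tokens, string? encoderName);
    List<int> Encode(string text, string? encoderName);
}

// Toy implementation for illustration only: one token per UTF-16 code unit.
// encoderName is accepted but ignored.
public sealed class CharTokenizerSketch : ITokenizerServiceSketch
{
    // The count is defined as the length of the encoded sequence.
    public long CountTokens(string text, string? encoderName = null) =>
        Encode(text, encoderName).Count;

    // Each token ID is turned back into the character it came from.
    public string Decode(int[] tokens, string? encoderName)
    {
        var chars = new char[tokens.Length];
        for (int i = 0; i < tokens.Length; i++)
            chars[i] = (char)tokens[i];
        return new string(chars);
    }

    // Each character becomes its UTF-16 code unit value.
    public List<int> Encode(string text, string? encoderName)
    {
        var ids = new List<int>(text.Length);
        foreach (char c in text)
            ids.Add(c);
        return ids;
    }
}
```

With any implementation of this shape, `Decode(Encode(text, name).ToArray(), name)` is expected to reproduce `text`. Likewise, `CountTokens(text, name)` is expected to equal the length of `Encode(text, name)`.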

Properties

Tokenizer

Gets the underlying tokenizer instance.

public Tokenizer Tokenizer { get; }

Property Value

Tokenizer

Methods

CountTokens(string, string?)

Counts the number of tokens in the specified text.

public long CountTokens(string text, string? encoderName = null)

Parameters

text string

The text to evaluate.

encoderName string

The name of the encoder used for tokenization. Optional; defaults to null.

Returns

long

The number of tokens in the text.
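A token count is commonly used to check a prompt against a model's context window before sending it. The following is a minimal sketch and is not part of FoundationaLLM. The `TokenBudget` helper and the `contextWindow` value are hypothetical. The `countTokens` delegate stands in for a call such as `CountTokens(text, encoderName)` with a fixed encoder.

```csharp
using System;

// Hypothetical helper: derives a remaining-token budget from a token count.
// countTokens stands in for CountTokens(text, encoderName) with a fixed encoder.
public static class TokenBudget
{
    public static long RemainingTokens(Func<string, long> countTokens, string prompt, long contextWindow)
    {
        if (countTokens is null) throw new ArgumentNullException(nameof(countTokens));
        if (contextWindow < 0) throw new ArgumentOutOfRangeException(nameof(contextWindow));

        long used = countTokens(prompt);
        // Clamp at zero: a prompt larger than the window leaves no room.
        return Math.Max(0, contextWindow - used);
    }

    // True when the prompt alone fits within the window.
    public static bool Fits(Func<string, long> countTokens, string prompt, long contextWindow) =>
        countTokens(prompt) <= contextWindow;
}
```

Passing a lambda such as `s => tokenizer.CountTokens(s)` keeps the helper independent of how the tokenizer instance is obtained. Here `tokenizer` is a hypothetical variable holding a `MicrosoftMLTokenizer`.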

Decode(int[], string?)

Decodes an array of integer token IDs into text.

public string Decode(int[] tokens, string? encoderName)

Parameters

tokens int[]

An array of integer token IDs.

encoderName string

The name of the encoder used for tokenization.

Returns

string

The decoded text.
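Pairing `Encode` with `Decode` allows text to be trimmed to a token limit: encode the text, keep the first N IDs, and decode them. The sketch below assumes this pattern and is not FoundationaLLM code. `TokenTruncation` is a hypothetical name, and the delegates stand in for `Encode(text, encoderName)` and `Decode(tokens, encoderName)` with a fixed encoder. With subword tokenizers, cutting at an arbitrary token boundary may split a character that spans several tokens, so the decoded tail may not be clean.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: trims text to at most maxTokens tokens by encoding,
// slicing the token IDs, and decoding the slice. The delegates stand in for
// Encode(text, encoderName) and Decode(tokens, encoderName).
public static class TokenTruncation
{
    public static string TruncateToTokens(
        Func<string, List<int>> encode,
        Func<int[], string> decode,
        string text,
        int maxTokens)
    {
        if (encode is null) throw new ArgumentNullException(nameof(encode));
        if (decode is null) throw new ArgumentNullException(nameof(decode));
        if (maxTokens < 0) throw new ArgumentOutOfRangeException(nameof(maxTokens));

        List<int> ids = encode(text);
        if (ids.Count <= maxTokens)
            return text; // already within the limit; skip the decode round trip

        // Keep the first maxTokens IDs and turn them back into text.
        int[] kept = ids.Take(maxTokens).ToArray();
        return decode(kept);
    }
}
```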

Encode(string, string?)

Encodes a string into a list of token IDs, keeping each allowed special token intact as a single token rather than breaking it apart.

public List<int> Encode(string text, string? encoderName)

Parameters

text string

The string to encode.

encoderName string

The name of the encoder used for tokenization.

Returns

List<int>

The list of token IDs.
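The phrase "allowed special tokens that are not broken apart" means that a marker such as an end-of-text token maps to one ID instead of being split into ordinary pieces. The signature has no parameter for the allowed set, and this page does not say where that set comes from. The toy sketch below therefore hardcodes one. `SpecialTokenEncoder`, the token strings, and the IDs are all illustrative assumptions. Ordinary text is encoded one UTF-16 code unit per token rather than with a real subword vocabulary.

```csharp
using System;
using System.Collections.Generic;

// Toy illustration only: a character-level encoder that keeps allowed
// special tokens intact. The token strings and IDs are hypothetical.
public static class SpecialTokenEncoder
{
    // IDs sit above the UTF-16 range (0..65535), so they cannot collide
    // with the character IDs produced below.
    private static readonly (string Text, int Id)[] AllowedSpecialTokens =
    {
        ("<|endoftext|>", 100_000),
        ("<|sep|>", 100_001),
    };

    public static List<int> Encode(string text)
    {
        if (text is null) throw new ArgumentNullException(nameof(text));

        var ids = new List<int>();
        int i = 0;
        while (i < text.Length)
        {
            bool matched = false;
            foreach (var (token, id) in AllowedSpecialTokens)
            {
                // Whole special token at this position -> a single ID.
                if (text.AsSpan(i).StartsWith(token.AsSpan(), StringComparison.Ordinal))
                {
                    ids.Add(id);
                    i += token.Length;
                    matched = true;
                    break;
                }
            }
            if (!matched)
            {
                ids.Add(text[i]); // ordinary character -> its UTF-16 code unit
                i++;
            }
        }
        return ids;
    }
}
```

For example, `Encode("hi<|endoftext|>")` yields three IDs: one each for `h` and `i`, and a single ID for the special token. A partial marker such as `<|x` is encoded character by character.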