Class MicrosoftBPETokenizerService

Namespace
FoundationaLLM.Common.Services.Tokenizers
Assembly
FoundationaLLM.Common.dll

Implements ITokenizerService using the Microsoft BPE tokenizer (https://github.com/microsoft/Tokenizer). When used with dependency injection, this class should always be registered as a singleton.
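
The singleton guidance above can be sketched with the standard Microsoft.Extensions.DependencyInjection container. This is a minimal illustration, not the project's own startup code: the namespace FoundationaLLM.Common.Interfaces for ITokenizerService is an assumption (this page does not state it), and AddLogging() is used only to supply the ILogger<MicrosoftBPETokenizerService> that the constructor requires.

```csharp
using System;
using FoundationaLLM.Common.Interfaces;            // ASSUMPTION: namespace that declares ITokenizerService
using FoundationaLLM.Common.Services.Tokenizers;   // namespace documented on this page
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();

// Registers logging so the container can resolve ILogger<MicrosoftBPETokenizerService>.
services.AddLogging();

// Singleton lifetime: one tokenizer service instance is shared by all consumers.
services.AddSingleton<ITokenizerService, MicrosoftBPETokenizerService>();

using var provider = services.BuildServiceProvider();

// Resolving twice should hand back the same instance under a singleton registration.
var first = provider.GetRequiredService<ITokenizerService>();
var second = provider.GetRequiredService<ITokenizerService>();
Console.WriteLine(ReferenceEquals(first, second));
```

With a transient or scoped registration, each resolution (or each scope) would construct a new service instance, which is what the singleton recommendation avoids.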

public class MicrosoftBPETokenizerService : ITokenizerService
Inheritance
object
MicrosoftBPETokenizerService
Implements
ITokenizerService

Constructors

MicrosoftBPETokenizerService(ILogger<MicrosoftBPETokenizerService>)

Initializes a new instance of the MicrosoftBPETokenizerService class.

public MicrosoftBPETokenizerService(ILogger<MicrosoftBPETokenizerService> logger)

Parameters

logger ILogger<MicrosoftBPETokenizerService>

The logger used to write log messages from this service.

Methods

Decode(int[], string)

Decodes an array of integer token ids into text.

public string Decode(int[] tokens, string encoderName)

Parameters

tokens int[]

An array of integer token ids.

encoderName string

The name of the encoder used for tokenization.

Returns

string

Decoded text.

Encode(string, string)

Encodes a string into token ids, keeping allowed special tokens intact rather than breaking them apart.

public List<int> Encode(string text, string encoderName)

Parameters

text string

String to be encoded.

encoderName string

The name of the encoder used for tokenization.

Returns

List<int>

List of token ids.
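
A short round trip through Encode and Decode may help show how the two signatures fit together. This is a sketch under stated assumptions: the encoder name "cl100k_base" is an assumed value (this page does not list the supported encoder names), and NullLogger is used only to satisfy the constructor outside a dependency injection container.

```csharp
using System;
using System.Collections.Generic;
using FoundationaLLM.Common.Services.Tokenizers;
using Microsoft.Extensions.Logging.Abstractions;

// NullLogger discards all log output; it stands in for a real logger in this sketch.
var tokenizer = new MicrosoftBPETokenizerService(
    NullLogger<MicrosoftBPETokenizerService>.Instance);

// ASSUMPTION: "cl100k_base" is a supported encoder name; substitute the name your deployment uses.
const string encoderName = "cl100k_base";

// Encode returns List<int>, so the token count is available through Count.
List<int> tokenIds = tokenizer.Encode("Hello, tokenizer!", encoderName);
Console.WriteLine($"Token count: {tokenIds.Count}");

// Decode expects int[], so the list is converted with ToArray() first.
string decoded = tokenizer.Decode(tokenIds.ToArray(), encoderName);
Console.WriteLine(decoded);
```

Note the asymmetry between the two methods: Encode returns a List<int> while Decode takes an int[], so a conversion is needed when passing the result of one to the other. The same encoderName should be used for both calls.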