Class MicrosoftBPETokenizerService
Namespace: FoundationaLLM.Common.Services.Tokenizers
Assembly: FoundationaLLM.Common.dll
Implements ITokenizerService using the Microsoft BPE tokenizer (https://github.com/microsoft/Tokenizer). When this class is used with dependency injection, always register it as a singleton.
public class MicrosoftBPETokenizerService : ITokenizerService
Inheritance: object → MicrosoftBPETokenizerService
Implements: ITokenizerService
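The singleton requirement can be met with the standard Microsoft.Extensions.DependencyInjection container. The following is a minimal sketch. It assumes that ITokenizerService lives in the FoundationaLLM.Common.Interfaces namespace, which this page does not state, and that the project references the Microsoft.Extensions.DependencyInjection and Microsoft.Extensions.Logging packages.

```csharp
using FoundationaLLM.Common.Interfaces;              // assumption: namespace of ITokenizerService
using FoundationaLLM.Common.Services.Tokenizers;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();

// Registers the logging services so the container can supply
// ILogger<MicrosoftBPETokenizerService> to the constructor.
services.AddLogging();

// One shared instance for the lifetime of the container, as the class documentation requires.
services.AddSingleton<ITokenizerService, MicrosoftBPETokenizerService>();

using var provider = services.BuildServiceProvider();

// Every resolution of ITokenizerService returns the same MicrosoftBPETokenizerService instance.
var tokenizer = provider.GetRequiredService<ITokenizerService>();
```

Registering against the interface rather than the concrete class lets consumers depend only on ITokenizerService. AddScoped or AddTransient would create additional instances, which the class documentation advises against.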
Constructors
MicrosoftBPETokenizerService(ILogger<MicrosoftBPETokenizerService>)
Initializes a new instance of the MicrosoftBPETokenizerService class.
public MicrosoftBPETokenizerService(ILogger<MicrosoftBPETokenizerService> logger)
Parameters
logger
ILogger<MicrosoftBPETokenizerService>
The logger used for logging.
Methods
Decode(int[], string)
Decodes an array of integer token IDs back into text.
public string Decode(int[] tokens, string encoderName)
Parameters
tokens
int[]
An array of integer token IDs.
encoderName
string
The name of the encoder used for tokenization.
Returns
string
The decoded text.
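Decode is normally paired with Encode, which is documented on this page. The following is a minimal round-trip sketch that constructs the service directly, outside dependency injection. A no-op logger from Microsoft.Extensions.Logging.Abstractions satisfies the constructor. The encoder name "cl100k_base" is an assumption, because this page does not list the names the service accepts.

```csharp
using System;
using System.Collections.Generic;
using FoundationaLLM.Common.Services.Tokenizers;
using Microsoft.Extensions.Logging.Abstractions;

// Outside DI (for example, in a console tool or a test), a no-op logger satisfies the constructor.
var tokenizer = new MicrosoftBPETokenizerService(
    NullLogger<MicrosoftBPETokenizerService>.Instance);

// Assumption: "cl100k_base" is one of the encoder names this service accepts.
const string encoderName = "cl100k_base";
const string input = "Hello, world!";

List<int> tokens = tokenizer.Encode(input, encoderName);

// Encode returns List<int>, but Decode expects int[], so convert the list before decoding.
// Pass the same encoder name to both calls.
string decoded = tokenizer.Decode(tokens.ToArray(), encoderName);

Console.WriteLine($"{tokens.Count} tokens -> \"{decoded}\"");
```

Using the same encoderName for Encode and Decode matters: token IDs are only meaningful relative to the vocabulary of the encoder that produced them.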
Encode(string, string)
Encodes a string into a list of integer token IDs. Special tokens in the allowed set are kept intact and are not broken apart.
public List<int> Encode(string text, string encoderName)