Class MicrosoftMLTokenizer
- Namespace
- FoundationaLLM.Common.Services.Tokenizers
- Assembly
- FoundationaLLM.Common.dll
Represents a tokenizer service that uses the Microsoft ML Tokenizer.
public class MicrosoftMLTokenizer : ITokenizerService
- Inheritance
- object
MicrosoftMLTokenizer
- Implements
- ITokenizerService
Properties
Tokenizer
Gets the underlying tokenizer instance.
public Tokenizer Tokenizer { get; }
Property Value
- Tokenizer
Methods
CountTokens(string, string?)
Counts the number of tokens in a given text.
public long CountTokens(string text, string? encoderName = null)
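Parameters
text
- string
The text whose tokens are counted.
encoderName
- string?
The name of the encoder used for tokenization. Defaults to null.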
Returns
- long
The number of tokens in the text.
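A common use of CountTokens is checking that a piece of text fits a token budget before it is sent to a model. The sketch below is illustrative only: the ITokenizerService declared here is a local stand-in reduced to the one member it needs, and WhitespaceTokenizer and PromptBudget are hypothetical names that are not part of FoundationaLLM. Counts from the real MicrosoftMLTokenizer will differ from the whitespace counts used here.

```csharp
#nullable enable
using System;

// Local stand-in for the interface implemented by MicrosoftMLTokenizer,
// reduced to the single member used in this sketch (assumption).
public interface ITokenizerService
{
    long CountTokens(string text, string? encoderName = null);
}

// Hypothetical counter that treats each whitespace-separated word as one token.
// It exists only so the sketch runs without the FoundationaLLM assembly.
public sealed class WhitespaceTokenizer : ITokenizerService
{
    public long CountTokens(string text, string? encoderName = null) =>
        text.Split((char[]?)null, StringSplitOptions.RemoveEmptyEntries).LongLength;
}

public static class PromptBudget
{
    // Returns true when the tokenizer reports at most maxTokens tokens for the text.
    public static bool Fits(ITokenizerService tokenizer, string text, long maxTokens) =>
        tokenizer.CountTokens(text) <= maxTokens;
}
```

Because PromptBudget.Fits depends only on the interface, any ITokenizerService implementation can be passed in place of the stand-in.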
Decode(int[], string?)
Decodes an array of integer token ids back into text.
public string Decode(int[] tokens, string? encoderName)
Parameters
tokens
- int[]
An array of integer token ids.
encoderName
- string?
The name of the encoder used for tokenization.
Returns
- string
The decoded text.
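Encode returns a List<int> while Decode accepts an int[], so a round trip needs a ToArray() call between the two. The sketch below shows that shape with a hypothetical ToyVocabularyTokenizer whose methods copy the documented signatures. It is not the FoundationaLLM implementation, it ignores encoderName, and its word-level ids bear no relation to real token ids.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in with the same method shapes as Encode and Decode.
public sealed class ToyVocabularyTokenizer
{
    private readonly List<string> _idToWord = new();
    private readonly Dictionary<string, int> _wordToId = new();

    // Assigns the next free id to each word the first time it is seen.
    public List<int> Encode(string text, string? encoderName)
    {
        var ids = new List<int>();
        foreach (var word in text.Split(' ', StringSplitOptions.RemoveEmptyEntries))
        {
            if (!_wordToId.TryGetValue(word, out var id))
            {
                id = _idToWord.Count;
                _wordToId[word] = id;
                _idToWord.Add(word);
            }
            ids.Add(id);
        }
        return ids;
    }

    // Maps each id back to its word and joins the words with single spaces.
    public string Decode(int[] tokens, string? encoderName) =>
        string.Join(" ", tokens.Select(t => _idToWord[t]));
}

public static class RoundTrip
{
    // Encodes the text, converts the List<int> to int[], and decodes it again.
    public static string Run(ToyVocabularyTokenizer tokenizer, string text) =>
        tokenizer.Decode(tokenizer.Encode(text, null).ToArray(), null);
}
```

With this toy vocabulary, text made of single-space-separated words survives the round trip unchanged. Whether the real tokenizer reproduces its input exactly depends on the encoder in use.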
Encode(string, string?)
Encodes a string into token ids, keeping the allowed special tokens intact rather than breaking them apart.
public List<int> Encode(string text, string? encoderName)