Class MicrosoftMLTokenizer
- Namespace
 - FoundationaLLM.Common.Services.Tokenizers
 
- Assembly
 - FoundationaLLM.Common.dll
 
Represents a tokenizer service that uses the Microsoft ML Tokenizer.
public class MicrosoftMLTokenizer : ITokenizerService
- Inheritance
 - object
 - MicrosoftMLTokenizer
 
- Implements
 - ITokenizerService
 
 
Properties
Tokenizer
Gets the underlying tokenizer instance.
public Tokenizer Tokenizer { get; }
Property Value
- Tokenizer
Methods
CountTokens(string, string?)
Count the number of tokens in a given text.
public long CountTokens(string text, string? encoderName = null)
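Parameters
- text string
 The text whose tokens are counted.
- encoderName string
 The name of the encoder used for tokenization. Defaults to null.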
Returns
- long
 The number of tokens in the text.
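The signature above gives encoderName a default of null, so callers can count tokens by passing only the text. The sketch below illustrates one common use, checking a prompt against a token budget. It does not reference the FoundationaLLM package: ITokenizerServiceSketch, WhitespaceTokenizerSketch, and TokenBudget are hypothetical stand-ins that copy only the CountTokens signature shown on this page, and the word-splitting logic is a toy that will not match the counts produced by the Microsoft ML Tokenizer.

```csharp
#nullable enable
using System;

// Hypothetical stand-in that mirrors only the CountTokens signature on this page.
// The real ITokenizerService in FoundationaLLM.Common may declare more members.
public interface ITokenizerServiceSketch
{
    long CountTokens(string text, string? encoderName = null);
}

// Toy implementation (assumption): one token per whitespace-separated word.
// encoderName is accepted but ignored here.
public sealed class WhitespaceTokenizerSketch : ITokenizerServiceSketch
{
    private static readonly char[] Separators = { ' ', '\t', '\r', '\n' };

    public long CountTokens(string text, string? encoderName = null) =>
        text.Split(Separators, StringSplitOptions.RemoveEmptyEntries).Length;
}

public static class TokenBudget
{
    // Returns true when the text fits within maxTokens.
    // Relies on the default encoderName (null) by passing only the text.
    public static bool FitsWithin(ITokenizerServiceSketch tokenizer, string text, long maxTokens) =>
        tokenizer.CountTokens(text) <= maxTokens;
}

public static class Program
{
    public static void Main()
    {
        ITokenizerServiceSketch tokenizer = new WhitespaceTokenizerSketch();
        const string prompt = "Summarize the quarterly report in three bullet points";

        Console.WriteLine("Tokens: " + tokenizer.CountTokens(prompt));
        Console.WriteLine("Fits in 4 tokens: " + TokenBudget.FitsWithin(tokenizer, prompt, 4));
        Console.WriteLine("Fits in 16 tokens: " + TokenBudget.FitsWithin(tokenizer, prompt, 16));
    }
}
```

The FitsWithin helper depends only on the CountTokens signature, so the same pattern would apply to any service exposing it. How a MicrosoftMLTokenizer instance is constructed or registered is outside what this page documents.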
Decode(int[], string?)
Decode an array of integer token ids.
public string Decode(int[] tokens, string? encoderName)
Parameters
- tokens int[]
 An array of integer token ids.
- encoderName string
 The name of the encoder used for tokenization.
Returns
- string
 Decoded text.
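Decode takes an int[] while Encode, documented next, returns a List<int>, so a round trip needs a ToArray() call. Neither signature declares a default for encoderName, so an argument (possibly null) must be passed explicitly. The sketch below shows that calling pattern. IEncodeDecodeSketch and WordIdTokenizerSketch are hypothetical stand-ins that copy only the two signatures from this page; the word-to-id scheme is a toy chosen so the example runs without the FoundationaLLM package, and it says nothing about the ids the real tokenizer produces.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in mirroring the Encode and Decode signatures on this page.
public interface IEncodeDecodeSketch
{
    List<int> Encode(string text, string? encoderName);
    string Decode(int[] tokens, string? encoderName);
}

// Toy implementation (assumption): each distinct word gets the next free integer id.
// encoderName is accepted but ignored here.
public sealed class WordIdTokenizerSketch : IEncodeDecodeSketch
{
    private readonly Dictionary<string, int> _idsByWord = new Dictionary<string, int>();
    private readonly List<string> _wordsById = new List<string>();

    public List<int> Encode(string text, string? encoderName)
    {
        var ids = new List<int>();
        foreach (var word in text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            if (!_idsByWord.TryGetValue(word, out var id))
            {
                // First time this word is seen: assign a new id.
                id = _wordsById.Count;
                _idsByWord[word] = id;
                _wordsById.Add(word);
            }
            ids.Add(id);
        }
        return ids;
    }

    // Maps each id back to its word and joins the words with single spaces.
    public string Decode(int[] tokens, string? encoderName) =>
        string.Join(" ", tokens.Select(t => _wordsById[t]));
}

public static class Program
{
    public static void Main()
    {
        IEncodeDecodeSketch tokenizer = new WordIdTokenizerSketch();

        // encoderName has no default on these two methods, so pass it explicitly.
        List<int> ids = tokenizer.Encode("to be or not to be", null);

        // Encode returns List<int>; Decode expects int[].
        string text = tokenizer.Decode(ids.ToArray(), null);

        Console.WriteLine(string.Join(",", ids));
        Console.WriteLine(text);
    }
}
```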
Encode(string, string?)
Encode a string with a set of allowed special tokens that are not broken apart.
public List<int> Encode(string text, string? encoderName)