Class TikTokenizerConfig

Namespace
FoundationaLLM.Common.Services.Tokenizers
Assembly
FoundationaLLM.Common.dll

Provides the configuration values required to create a new TikTokenizer instance.

public record TikTokenizerConfig : IEquatable<TikTokenizerConfig>
Inheritance
object
TikTokenizerConfig
Implements
IEquatable<TikTokenizerConfig>

Constructors

TikTokenizerConfig(string, string, Dictionary<string, int>)

Initializes a new TikTokenizerConfig with the values required to create a TikTokenizer instance.

public TikTokenizerConfig(string RegexPattern, string MergeableRanksFileUrl, Dictionary<string, int> SpecialTokens)

Parameters

RegexPattern string

The regular expression pattern used to split a long input string into smaller pieces for tokenization.

MergeableRanksFileUrl string

The URL used to download the BPE rank file.

SpecialTokens Dictionary<string, int>

A mapping from each special token string to its integer token ID.
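The constructor call can be sketched as follows. This is a minimal illustration, not the library's own sample: the record declared here is a local stand-in that mirrors the documented signature so the snippet compiles without FoundationaLLM.Common.dll, and the regex pattern, URL, and token ID are placeholder values chosen for the example.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Local stand-in mirroring the documented signature so the sketch compiles
// on its own. In a real project, reference FoundationaLLM.Common instead.
public record TikTokenizerConfig(
    string RegexPattern,
    string MergeableRanksFileUrl,
    Dictionary<string, int> SpecialTokens)
{
    public byte[]? MergeableRanksFileContent { get; set; }
}

public static class ConfigExample
{
    // Builds a config from illustrative values; all three arguments are assumptions.
    public static TikTokenizerConfig Create() =>
        new TikTokenizerConfig(
            RegexPattern: @"\p{L}+|\p{N}+|[^\s\p{L}\p{N}]+|\s+",
            MergeableRanksFileUrl: "https://example.com/encodings/ranks.tiktoken",
            SpecialTokens: new Dictionary<string, int> { ["<|endoftext|>"] = 100257 });

    public static void Main()
    {
        var config = Create();
        Console.WriteLine(config.MergeableRanksFileUrl);
        Console.WriteLine(config.SpecialTokens["<|endoftext|>"]);
        // The rank file content is a separate settable property and starts out null.
        Console.WriteLine(config.MergeableRanksFileContent is null);
    }
}
```

Named arguments are optional here; they are used only to make the mapping between values and parameters explicit.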

Properties

MergeableRanksFileContent

The raw content of the BPE rank file, or null if the content has not been set.

public byte[]? MergeableRanksFileContent { get; set; }

Property Value

byte[]
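Because MergeableRanksFileContent is nullable and settable while MergeableRanksFileUrl is fixed at construction, one plausible usage is to fetch the file once and cache the bytes on the config. The sketch below assumes that pattern; it is not taken from the library. RankFileLoader, GetOrLoad, and the fetch delegate are hypothetical names, the record is a local stand-in, and the delegate returns fixed placeholder bytes instead of performing a real download.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text;

// Local stand-in mirroring the documented signature (not the real assembly).
public record TikTokenizerConfig(
    string RegexPattern,
    string MergeableRanksFileUrl,
    Dictionary<string, int> SpecialTokens)
{
    public byte[]? MergeableRanksFileContent { get; set; }
}

// Hypothetical helper, not part of FoundationaLLM.
public static class RankFileLoader
{
    // Returns the cached bytes, calling fetch only when nothing is cached yet.
    public static byte[] GetOrLoad(TikTokenizerConfig config, Func<string, byte[]> fetch)
    {
        byte[] content = config.MergeableRanksFileContent
                         ?? fetch(config.MergeableRanksFileUrl);
        config.MergeableRanksFileContent = content;
        return content;
    }

    public static void Main()
    {
        var config = new TikTokenizerConfig(
            @"\S+|\s+",
            "https://example.com/encodings/ranks.tiktoken", // placeholder URL
            new Dictionary<string, int>());

        int fetchCount = 0;
        // Stand-in for a real download (for example via HttpClient).
        Func<string, byte[]> fakeFetch = url =>
        {
            fetchCount++;
            return Encoding.UTF8.GetBytes("placeholder");
        };

        GetOrLoad(config, fakeFetch);
        GetOrLoad(config, fakeFetch); // served from the cached property
        Console.WriteLine(fetchCount);
    }
}
```

Passing the download step in as a delegate keeps the sketch free of network access and makes the caching behavior easy to check in isolation.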

MergeableRanksFileUrl

The URL used to download the BPE rank file.

public string MergeableRanksFileUrl { get; init; }

Property Value

string

RegexPattern

The regular expression pattern used to split a long input string into smaller pieces for tokenization.

public string RegexPattern { get; init; }

Property Value

string
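To illustrate what breaking a long string with a regex pattern looks like, the sketch below splits text with Regex.Matches. The pattern is a simplified one written for this example (runs of letters, runs of digits, runs of other non-space characters, runs of whitespace) and is not the pattern of any particular encoding. PreTokenizer and Split are hypothetical names, and how TikTokenizer applies the pattern internally is not shown on this page.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of FoundationaLLM.
public static class PreTokenizer
{
    // Simplified illustrative pattern: letters | digits | other symbols | whitespace.
    public const string SimplifiedPattern = @"\p{L}+|\p{N}+|[^\s\p{L}\p{N}]+|\s+";

    // Returns every match of the pattern, in order of appearance.
    public static List<string> Split(string text, string pattern)
    {
        var pieces = new List<string>();
        foreach (Match match in Regex.Matches(text, pattern))
        {
            pieces.Add(match.Value);
        }
        return pieces;
    }

    public static void Main()
    {
        // Brackets make the whitespace-only pieces visible in the output.
        foreach (var piece in Split("Hello, world 42", SimplifiedPattern))
        {
            Console.WriteLine($"[{piece}]");
        }
    }
}
```

With this pattern every character falls into exactly one alternative, so concatenating the pieces reproduces the input.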

SpecialTokens

A mapping from each special token string to its integer token ID.

public Dictionary<string, int> SpecialTokens { get; init; }

Property Value

Dictionary<string, int>
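One consequence of SpecialTokens being a Dictionary<string, int> on a record that implements IEquatable<TikTokenizerConfig> is worth a sketch. Assuming the record relies on compiler-generated equality (this page shows no custom override), the dictionary is compared by reference, so two configs with identical token mappings in separate dictionary instances are not equal. The record below is a local stand-in, and EqualityExample and SameTokens are hypothetical names.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Local stand-in mirroring the documented signature (not the real assembly).
public record TikTokenizerConfig(
    string RegexPattern,
    string MergeableRanksFileUrl,
    Dictionary<string, int> SpecialTokens)
{
    public byte[]? MergeableRanksFileContent { get; set; }
}

// Hypothetical helper, not part of FoundationaLLM.
public static class EqualityExample
{
    // Compares two token mappings by content rather than by reference.
    public static bool SameTokens(Dictionary<string, int> x, Dictionary<string, int> y)
    {
        if (x.Count != y.Count) return false;
        foreach (var pair in x)
        {
            if (!y.TryGetValue(pair.Key, out int id) || id != pair.Value) return false;
        }
        return true;
    }

    // Returns (equal when sharing a dictionary, equal when using a copy).
    public static (bool Shared, bool Copied) Compare()
    {
        var tokens = new Dictionary<string, int> { ["<|endoftext|>"] = 100257 };
        const string pattern = @"\S+|\s+";
        const string url = "https://example.com/encodings/ranks.tiktoken"; // placeholder

        var a = new TikTokenizerConfig(pattern, url, tokens);
        var b = new TikTokenizerConfig(pattern, url, tokens);
        var c = new TikTokenizerConfig(pattern, url, new Dictionary<string, int>(tokens));

        return (a == b, a == c);
    }

    public static void Main()
    {
        var (shared, copied) = Compare();
        Console.WriteLine(shared); // same dictionary instance
        Console.WriteLine(copied); // equal content, different instance
    }
}
```

Code that needs to compare configs by content can check the string properties directly and use a helper such as SameTokens for the mapping.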