Tiktoken.Encodings.Tokenizer 2.2.0

There is a newer version of this package available.
See the version list below for details.
.NET CLI:
dotnet add package Tiktoken.Encodings.Tokenizer --version 2.2.0

Package Manager Console (Visual Studio; this uses the NuGet module's version of Install-Package):
NuGet\Install-Package Tiktoken.Encodings.Tokenizer -Version 2.2.0

PackageReference (for projects that support PackageReference, copy this XML node into the project file):
<PackageReference Include="Tiktoken.Encodings.Tokenizer" Version="2.2.0" />

Central Package Management (CPM): copy this XML node into the solution's Directory.Packages.props file to version the package:
<PackageVersion Include="Tiktoken.Encodings.Tokenizer" Version="2.2.0" />
and reference the package from the project file:
<PackageReference Include="Tiktoken.Encodings.Tokenizer" />

Paket:
paket add Tiktoken.Encodings.Tokenizer --version 2.2.0

F# Interactive and Polyglot Notebooks (copy this #r directive into the interactive tool or the script source):
#r "nuget: Tiktoken.Encodings.Tokenizer, 2.2.0"

C# file-based apps (.NET 10 preview 4 and later; copy this #:package directive into a .cs file before any lines of code):
#:package Tiktoken.Encodings.Tokenizer@2.2.0

Cake Addin:
#addin nuget:?package=Tiktoken.Encodings.Tokenizer&version=2.2.0

Cake Tool:
#tool nuget:?package=Tiktoken.Encodings.Tokenizer&version=2.2.0

Tiktoken

License: MIT

This implementation aims for maximum performance, especially for token counting.
The project also includes a benchmark console app for tracking performance over time.
Pull requests are welcome.

Implemented encodings

  • o200k_base
  • cl100k_base
  • r50k_base
  • p50k_base
  • p50k_edit
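All of the encodings above are byte pair encodings: text is first split into pieces by a regex, and each piece is then merged into tokens according to a rank table. The sketch below shows only the merge step, assuming a tiny hypothetical rank table (`ranks`) and working on characters instead of UTF-8 bytes for readability. `BytePairMerge` is an illustrative name; this is not the library's optimized implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy rank table (lower rank = merged earlier). In tiktoken-style
// encodings the rank also serves as the token ID. Not taken from any real encoding.
var ranks = new Dictionary<string, int>
{
    ["h"] = 0, ["e"] = 1, ["l"] = 2, ["o"] = 3,
    ["he"] = 4, ["ll"] = 5, ["hell"] = 6, ["hello"] = 7,
};

// Repeatedly merges the adjacent pair with the lowest rank until no pair is in the table.
// Real encoders operate on UTF-8 bytes; this sketch uses characters for readability.
List<string> BytePairMerge(string piece, Dictionary<string, int> rankTable)
{
    var parts = piece.Select(c => c.ToString()).ToList();
    while (parts.Count > 1)
    {
        int bestIndex = -1;
        int bestRank = int.MaxValue;
        for (int i = 0; i < parts.Count - 1; i++)
        {
            if (rankTable.TryGetValue(parts[i] + parts[i + 1], out int rank) && rank < bestRank)
            {
                bestRank = rank;
                bestIndex = i;
            }
        }
        if (bestIndex < 0) break; // no known pair left to merge
        parts[bestIndex] += parts[bestIndex + 1];
        parts.RemoveAt(bestIndex + 1);
    }
    return parts;
}

var pieces = BytePairMerge("hello", ranks);
var ids = pieces.Select(p => ranks[p]).ToList();
Console.WriteLine(string.Join(" | ", pieces));                        // hello
Console.WriteLine(string.Join(", ", ids));                            // 7
Console.WriteLine(string.Join(" | ", BytePairMerge("olleh", ranks))); // o | ll | e | h
```

The loop rescans every pair after each merge, so it is quadratic in the piece length; it is meant only to show the order in which merges happen.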

Usage

using Tiktoken;

var encoder = ModelToEncoder.For("gpt-4o"); // or explicitly using new Encoder(new O200KBase())
var tokens = encoder.Encode("hello world"); // IDs depend on the encoding: [15339, 1917] with cl100k_base; o200k_base (gpt-4o) yields different IDs
var text = encoder.Decode(tokens); // hello world
var numberOfTokens = encoder.CountTokens(text); // 2
var stringTokens = encoder.Explore(text); // ["hello", " world"]

Load from HuggingFace tokenizer.json

The Tiktoken.Encodings.Tokenizer package loads HuggingFace-format tokenizer.json files for BPE-based models such as GPT-2, Llama 3, Qwen2, and DeepSeek.

using System.IO;
using System.Net.Http;
using Tiktoken;
using Tiktoken.Encodings;

// From a local file
var fileEncoding = TokenizerJsonLoader.FromFile("path/to/tokenizer.json");
var encoder = new Encoder(fileEncoding);

// From a stream (HTTP responses, embedded resources)
using var stream = File.OpenRead("tokenizer.json");
var streamEncoding = TokenizerJsonLoader.FromStream(stream);

// From a URL (e.g., HuggingFace Hub)
using var httpClient = new HttpClient();
var urlEncoding = await TokenizerJsonLoader.FromUrlAsync(
    "https://huggingface.co/openai-community/gpt2/raw/main/tokenizer.json",
    httpClient,
    name: "gpt2");

// Custom regex patterns (optional; auto-detected by default).
// myPatterns is a placeholder for your own patterns.
var customEncoding = TokenizerJsonLoader.FromFile("tokenizer.json", patterns: myPatterns);
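Because regex patterns are auto-detected by default, it can help to check which pre-tokenizer a tokenizer.json declares before loading it. The sketch below reads the standard `pre_tokenizer` section with System.Text.Json and reports whether it is ByteLevel, Split, or Sequence[Split, ByteLevel]. `ClassifyPreTokenizer` is a hypothetical helper, not part of Tiktoken, and it does not reproduce the loader's own detection logic.

```csharp
using System;
using System.Linq;
using System.Text.Json;

// Hypothetical helper (not part of Tiktoken): reports which pre-tokenizer shape a
// HuggingFace tokenizer.json declares in its "pre_tokenizer" section.
string ClassifyPreTokenizer(string tokenizerJson)
{
    using var doc = JsonDocument.Parse(tokenizerJson);
    if (!doc.RootElement.TryGetProperty("pre_tokenizer", out var pre) ||
        pre.ValueKind != JsonValueKind.Object)
    {
        return "missing";
    }

    var type = pre.GetProperty("type").GetString();
    if (type is "ByteLevel" or "Split")
    {
        return type;
    }

    if (type == "Sequence" && pre.TryGetProperty("pretokenizers", out var steps))
    {
        // Collect the "type" of each step in order, e.g. ["Split", "ByteLevel"].
        var stepTypes = steps.EnumerateArray()
            .Select(step => step.GetProperty("type").GetString())
            .ToArray();
        if (stepTypes.SequenceEqual(new[] { "Split", "ByteLevel" }))
        {
            return "Sequence[Split, ByteLevel]";
        }
    }

    return $"other ({type})";
}

// Abbreviated Llama 3-style section; the real Split regex is elided here as "...".
var llama3Style = """
    { "pre_tokenizer": { "type": "Sequence", "pretokenizers": [
        { "type": "Split", "pattern": { "Regex": "..." }, "behavior": "Isolated" },
        { "type": "ByteLevel", "add_prefix_space": false, "use_regex": false } ] } }
    """;
Console.WriteLine(ClassifyPreTokenizer(llama3Style)); // Sequence[Split, ByteLevel]
```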

Supported pre-tokenizer types:

  • ByteLevel — GPT-2 and similar models
  • Split with regex pattern — direct regex-based splitting
  • Sequence[Split, ByteLevel] — Llama 3, Qwen2, DeepSeek, and other modern models
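With the ByteLevel pre-tokenizer, GPT-2-style tokenizer.json files store vocabulary and merge entries in a remapped alphabet in which every byte value has a visible character; this is why a GPT-2 vocabulary contains entries such as "Ġworld" for " world". The sketch below builds that mapping the same way as the bytes_to_unicode helper in OpenAI's GPT-2 reference code. `BuildByteToUnicode` and `ToByteLevel` are hypothetical names; this is not taken from Tiktoken's source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Sketch of the GPT-2 byte-to-Unicode table: "printable" bytes keep their own
// code point; every other byte gets the next free code point from U+0100 upward.
Dictionary<byte, char> BuildByteToUnicode()
{
    static bool IsPrintable(int b) =>
        (b >= '!' && b <= '~') || (b >= 0xA1 && b <= 0xAC) || (b >= 0xAE && b <= 0xFF);

    var map = new Dictionary<byte, char>();
    int next = 0;
    for (int b = 0; b < 256; b++)
    {
        map[(byte)b] = IsPrintable(b) ? (char)b : (char)(256 + next++);
    }
    return map;
}

// Hypothetical helper: rewrites text into the byte-level alphabet via its UTF-8 bytes.
string ToByteLevel(string text, Dictionary<byte, char> map) =>
    new string(Encoding.UTF8.GetBytes(text).Select(b => map[b]).ToArray());

var table = BuildByteToUnicode();
var world = ToByteLevel(" world", table); // "Ġworld": space (0x20) maps to U+0120
var lines = ToByteLevel("a\nb", table);   // "aĊb": newline (0x0A) maps to U+010A
Console.WriteLine($"{world} {lines}");
```

A loader for such files has to apply the inverse of this table to turn vocabulary strings back into raw bytes.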

Benchmarks

Benchmark reports are available for each version. The results below were measured on this setup:


BenchmarkDotNet v0.15.8, macOS Tahoe 26.3.1 (25D2128) [Darwin 25.3.0]
Apple M4 Max, 1 CPU, 16 logical and 16 physical cores
.NET SDK 10.0.201
  [Host]     : .NET 10.0.5 (10.0.5, 10.0.526.15411), Arm64 RyuJIT armv8.0-a
  DefaultJob : .NET 10.0.5 (10.0.5, 10.0.526.15411), Arm64 RyuJIT armv8.0-a


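Column guide (standard BenchmarkDotNet columns): Data is the input text, shortened, with its length in brackets; Mean is the average time per operation; Ratio is the time relative to the baseline (the first method, Ratio 1.00) in each category; Gen0 and Gen1 are garbage collections per 1,000 operations; Allocated is managed memory allocated per operation; Alloc Ratio is Allocated relative to the baseline (NA when the baseline allocates nothing). A "-" means zero.

Method Categories Data Mean Ratio Gen0 Gen1 Allocated Alloc Ratio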
SharpTokenV2_0_3_ CountTokens 1. (...)57. [19866] 371,053.29 ns 1.00 1.9531 - 20112 B 1.00
TiktokenSharpV1_1_5_ CountTokens 1. (...)57. [19866] 251,531.04 ns 0.68 7.8125 0.4883 65968 B 3.28
MicrosoftMLTokenizerV1_0_0_ CountTokens 1. (...)57. [19866] 259,868.29 ns 0.70 - - 304 B 0.02
TokenizerLibV1_3_3_ CountTokens 1. (...)57. [19866] 502,554.08 ns 1.35 184.5703 75.1953 1547672 B 76.95
Tiktoken_ CountTokens 1. (...)57. [19866] 164,658.42 ns 0.44 - - - 0.00
SharpTokenV2_0_3_ CountTokens Hello, World! 239.14 ns 1.00 0.0305 - 256 B 1.00
TiktokenSharpV1_1_5_ CountTokens Hello, World! 170.46 ns 0.71 0.0238 - 200 B 0.78
MicrosoftMLTokenizerV1_0_0_ CountTokens Hello, World! 208.69 ns 0.87 0.0124 - 104 B 0.41
TokenizerLibV1_3_3_ CountTokens Hello, World! 316.45 ns 1.32 0.1769 0.0005 1480 B 5.78
Tiktoken_ CountTokens Hello, World! 103.31 ns 0.43 - - - 0.00
SharpTokenV2_0_3_ CountTokens King(...)edy. [275] 4,055.29 ns 1.00 0.0610 - 520 B 1.00
TiktokenSharpV1_1_5_ CountTokens King(...)edy. [275] 2,552.89 ns 0.63 0.0916 - 776 B 1.49
MicrosoftMLTokenizerV1_0_0_ CountTokens King(...)edy. [275] 2,270.94 ns 0.56 0.0114 - 104 B 0.20
TokenizerLibV1_3_3_ CountTokens King(...)edy. [275] 4,846.66 ns 1.20 2.3117 0.0992 19344 B 37.20
Tiktoken_ CountTokens King(...)edy. [275] 1,418.07 ns 0.35 0.0038 - 32 B 0.06
Tiktoken_CountTokens_String CountTokensSpan 1. (...)57. [19866] 159,648.81 ns 1.00 - - - NA
Tiktoken_CountTokens_Span CountTokensSpan 1. (...)57. [19866] 161,476.23 ns 1.01 - - - NA
Tiktoken_CountTokens_String CountTokensSpan Hello, World! 104.46 ns 1.00 - - - NA
Tiktoken_CountTokens_Span CountTokensSpan Hello, World! 104.17 ns 1.00 - - - NA
Tiktoken_CountTokens_String CountTokensSpan King(...)edy. [275] 1,383.41 ns 1.00 0.0038 - 32 B 1.00
Tiktoken_CountTokens_Span CountTokensSpan King(...)edy. [275] 1,390.51 ns 1.01 0.0038 - 32 B 1.00
Tiktoken_CountTokens_FromString CountTokensUtf8 1. (...)57. [19866] 152,452.67 ns 1.00 - - - NA
Tiktoken_CountTokens_FromUtf8 CountTokensUtf8 1. (...)57. [19866] 157,322.51 ns 1.03 - - - NA
Tiktoken_CountTokens_FromString CountTokensUtf8 Hello, World! 97.91 ns 1.00 - - - NA
Tiktoken_CountTokens_FromUtf8 CountTokensUtf8 Hello, World! 105.07 ns 1.07 - - - NA
Tiktoken_CountTokens_FromString CountTokensUtf8 King(...)edy. [275] 1,364.91 ns 1.00 0.0038 - 32 B 1.00
Tiktoken_CountTokens_FromUtf8 CountTokensUtf8 King(...)edy. [275] 1,411.43 ns 1.03 0.0038 - 32 B 1.00
Tiktoken_cl100k_CountTokens CountTokens_o200k 1. (...)57. [19866] 151,196.85 ns 1.00 - - - NA
Tiktoken_o200k_CountTokens CountTokens_o200k 1. (...)57. [19866] 176,174.81 ns 1.17 - - - NA
Tiktoken_cl100k_CountTokens CountTokens_o200k Hello, World! 98.78 ns 1.00 - - - NA
Tiktoken_o200k_CountTokens CountTokens_o200k Hello, World! 110.40 ns 1.12 - - - NA
Tiktoken_cl100k_CountTokens CountTokens_o200k King(...)edy. [275] 1,368.74 ns 1.00 0.0038 - 32 B 1.00
Tiktoken_o200k_CountTokens CountTokens_o200k King(...)edy. [275] 1,491.49 ns 1.09 0.0038 - 32 B 1.00
SharpTokenV2_0_3_Decode Decode 1. (...)57. [19866] 46,445.49 ns 1.00 14.8926 - 125232 B 1.00
TiktokenSharpV1_1_5_Decode Decode 1. (...)57. [19866] 35,023.46 ns 0.75 15.8691 2.6245 133400 B 1.07
MicrosoftMLTokenizerV1_0_0_Decode Decode 1. (...)57. [19866] 68,659.89 ns 1.48 4.6387 - 39800 B 0.32
TokenizerLibV1_3_3_Decode Decode 1. (...)57. [19866] 46,817.85 ns 1.01 28.0151 2.9297 234680 B 1.87
Tiktoken_Decode Decode 1. (...)57. [19866] 27,567.26 ns 0.59 4.7302 - 39760 B 0.32
SharpTokenV2_0_3_Decode Decode Hello, World! 60.40 ns 1.00 0.0564 - 472 B 1.00
TiktokenSharpV1_1_5_Decode Decode Hello, World! 42.18 ns 0.70 0.0105 - 88 B 0.19
MicrosoftMLTokenizerV1_0_0_Decode Decode Hello, World! 46.69 ns 0.77 0.0105 - 88 B 0.19
TokenizerLibV1_3_3_Decode Decode Hello, World! 46.06 ns 0.76 0.0344 - 288 B 0.61
Tiktoken_Decode Decode Hello, World! 22.35 ns 0.37 0.0057 - 48 B 0.10
SharpTokenV2_0_3_Decode Decode King(...)edy. [275] 552.07 ns 1.00 0.2146 - 1800 B 1.00
TiktokenSharpV1_1_5_Decode Decode King(...)edy. [275] 455.47 ns 0.83 0.0734 - 616 B 0.34
MicrosoftMLTokenizerV1_0_0_Decode Decode King(...)edy. [275] 561.06 ns 1.02 0.0734 - 616 B 0.34
TokenizerLibV1_3_3_Decode Decode King(...)edy. [275] 442.90 ns 0.80 0.3901 0.0005 3264 B 1.81
Tiktoken_Decode Decode King(...)edy. [275] 238.58 ns 0.43 0.0687 - 576 B 0.32
Tiktoken_Decode_Baseline DecodeToUtf8 1. (...)57. [19866] 27,383.51 ns 1.00 4.7302 - 39760 B 1.00
Tiktoken_DecodeToUtf8 DecodeToUtf8 1. (...)57. [19866] 27,897.29 ns 1.02 - - - 0.00
Tiktoken_Decode_Baseline DecodeToUtf8 Hello, World! 22.86 ns 1.00 0.0057 - 48 B 1.00
Tiktoken_DecodeToUtf8 DecodeToUtf8 Hello, World! 14.60 ns 0.64 - - - 0.00
Tiktoken_Decode_Baseline DecodeToUtf8 King(...)edy. [275] 238.69 ns 1.00 0.0687 - 576 B 1.00
Tiktoken_DecodeToUtf8 DecodeToUtf8 King(...)edy. [275] 241.59 ns 1.01 - - - 0.00
Tiktoken_cl100k_Decode Decode_o200k 1. (...)57. [19866] 27,031.98 ns 1.00 4.7302 - 39760 B 1.00
Tiktoken_o200k_Decode Decode_o200k 1. (...)57. [19866] 27,957.89 ns 1.03 4.7302 - 39760 B 1.00
Tiktoken_cl100k_Decode Decode_o200k Hello, World! 21.63 ns 1.00 0.0057 - 48 B 1.00
Tiktoken_o200k_Decode Decode_o200k Hello, World! 22.86 ns 1.06 0.0057 - 48 B 1.00
Tiktoken_cl100k_Decode Decode_o200k King(...)edy. [275] 235.36 ns 1.00 0.0687 - 576 B 1.00
Tiktoken_o200k_Decode Decode_o200k King(...)edy. [275] 237.36 ns 1.01 0.0687 - 576 B 1.00
SharpTokenV2_0_3_Encode Encode 1. (...)57. [19866] 345,110.21 ns 1.00 1.9531 - 20112 B 1.00
TiktokenSharpV1_1_5_Encode Encode 1. (...)57. [19866] 240,838.63 ns 0.70 7.8125 0.7324 65968 B 3.28
MicrosoftMLTokenizerV1_0_0_Encode Encode 1. (...)57. [19866] 247,064.98 ns 0.72 7.8125 0.4883 66144 B 3.29
TokenizerLibV1_3_3_Encode Encode 1. (...)57. [19866] 467,095.12 ns 1.35 184.5703 75.1953 1547672 B 76.95
Tiktoken_Encode Encode 1. (...)57. [19866] 164,373.43 ns 0.48 7.8125 0.7324 65840 B 3.27
SharpTokenV2_0_3_Encode Encode Hello, World! 230.38 ns 1.00 0.0305 - 256 B 1.00
TiktokenSharpV1_1_5_Encode Encode Hello, World! 160.05 ns 0.70 0.0238 - 200 B 0.78
MicrosoftMLTokenizerV1_0_0_Encode Encode Hello, World! 202.58 ns 0.88 0.0210 - 176 B 0.69
TokenizerLibV1_3_3_Encode Encode Hello, World! 289.62 ns 1.26 0.1769 0.0005 1480 B 5.78
Tiktoken_Encode Encode Hello, World! 124.12 ns 0.54 0.0086 - 72 B 0.28
SharpTokenV2_0_3_Encode Encode King(...)edy. [275] 3,846.77 ns 1.00 0.0610 - 520 B 1.00
TiktokenSharpV1_1_5_Encode Encode King(...)edy. [275] 2,484.64 ns 0.65 0.0916 - 776 B 1.49
MicrosoftMLTokenizerV1_0_0_Encode Encode King(...)edy. [275] 2,169.79 ns 0.56 0.0877 - 752 B 1.45
TokenizerLibV1_3_3_Encode Encode King(...)edy. [275] 4,689.55 ns 1.22 2.3117 0.0992 19344 B 37.20
Tiktoken_Encode Encode King(...)edy. [275] 1,506.18 ns 0.39 0.0801 - 680 B 1.31
Tiktoken_Encode_String EncodeSpan 1. (...)57. [19866] 170,807.87 ns 1.00 7.8125 0.7324 65840 B 1.00
Tiktoken_Encode_Span EncodeSpan 1. (...)57. [19866] 166,968.48 ns 0.98 7.8125 0.7324 65840 B 1.00
Tiktoken_Encode_String EncodeSpan Hello, World! 125.93 ns 1.00 0.0086 - 72 B 1.00
Tiktoken_Encode_Span EncodeSpan Hello, World! 126.25 ns 1.00 0.0086 - 72 B 1.00
Tiktoken_Encode_String EncodeSpan King(...)edy. [275] 1,515.16 ns 1.00 0.0801 - 680 B 1.00
Tiktoken_Encode_Span EncodeSpan King(...)edy. [275] 1,493.72 ns 0.99 0.0801 - 680 B 1.00
Tiktoken_Encode_Baseline EncodeUtf8 1. (...)57. [19866] 169,092.49 ns 1.00 7.8125 0.7324 65840 B 1.00
Tiktoken_EncodeUtf8 EncodeUtf8 1. (...)57. [19866] 364,039.90 ns 2.15 7.8125 0.4883 65880 B 1.00
Tiktoken_Encode_Baseline EncodeUtf8 Hello, World! 125.21 ns 1.00 0.0086 - 72 B 1.00
Tiktoken_EncodeUtf8 EncodeUtf8 Hello, World! 268.86 ns 2.15 0.0134 - 112 B 1.56
Tiktoken_Encode_Baseline EncodeUtf8 King(...)edy. [275] 1,520.22 ns 1.00 0.0801 - 680 B 1.00
Tiktoken_EncodeUtf8 EncodeUtf8 King(...)edy. [275] 3,234.26 ns 2.13 0.0877 - 752 B 1.11
Tiktoken_cl100k_Encode Encode_o200k 1. (...)57. [19866] 172,138.88 ns 1.00 7.8125 0.7324 65840 B 1.00
Tiktoken_o200k_Encode Encode_o200k 1. (...)57. [19866] 197,942.14 ns 1.15 7.8125 0.7324 65840 B 1.00
Tiktoken_cl100k_Encode Encode_o200k Hello, World! 127.83 ns 1.00 0.0086 - 72 B 1.00
Tiktoken_o200k_Encode Encode_o200k Hello, World! 138.49 ns 1.08 0.0086 - 72 B 1.00
Tiktoken_cl100k_Encode Encode_o200k King(...)edy. [275] 1,511.29 ns 1.00 0.0801 - 680 B 1.00
Tiktoken_o200k_Encode Encode_o200k King(...)edy. [275] 1,632.33 ns 1.08 0.0801 - 680 B 1.00
Tiktoken_Explore Explore 1. (...)57. [19866] 317,515.51 ns 1.00 63.9648 20.9961 538312 B 1.00
Tiktoken_ExploreUtfSafe Explore 1. (...)57. [19866] 352,407.43 ns 1.11 83.0078 30.2734 696384 B 1.29
Tiktoken_Explore Explore Hello, World! 201.20 ns 1.00 0.0772 - 648 B 1.00
Tiktoken_ExploreUtfSafe Explore Hello, World! 222.43 ns 1.11 0.0966 0.0002 808 B 1.25
Tiktoken_Explore Explore King(...)edy. [275] 3,055.34 ns 1.00 0.8278 0.0076 6952 B 1.00
Tiktoken_ExploreUtfSafe Explore King(...)edy. [275] 3,391.64 ns 1.11 1.0910 0.0191 9128 B 1.31

Support

Preferred place for bug reports: https://github.com/tryAGI/LangChain/issues
Preferred place for ideas and general questions: https://github.com/tryAGI/LangChain/discussions
Discord: https://discord.gg/Ca2xhfBf3v

Compatible and additional computed target framework versions

Compatible: net8.0, net9.0, net10.0, netstandard2.0, netstandard2.1, net462.

Computed:
.NET: net5.0, net5.0-windows, net6.0, net6.0-android, net6.0-ios, net6.0-maccatalyst, net6.0-macos, net6.0-tvos, net6.0-windows, net7.0, net7.0-android, net7.0-ios, net7.0-maccatalyst, net7.0-macos, net7.0-tvos, net7.0-windows, net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows, net9.0-android, net9.0-browser, net9.0-ios, net9.0-maccatalyst, net9.0-macos, net9.0-tvos, net9.0-windows, net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows.
.NET Core: netcoreapp2.0, netcoreapp2.1, netcoreapp2.2, netcoreapp3.0, netcoreapp3.1.
.NET Framework: net461, net463, net47, net471, net472, net48, net481.
MonoAndroid: monoandroid. MonoMac: monomac. MonoTouch: monotouch.
Tizen: tizen40, tizen60.
Xamarin.iOS: xamarinios. Xamarin.Mac: xamarinmac. Xamarin.TVOS: xamarintvos. Xamarin.WatchOS: xamarinwatchos.

NuGet packages (1)

Showing the top 1 NuGet package that depends on Tiktoken.Encodings.Tokenizer:

Tiktoken: The fastest tokenizer for GPT-3.5 and GPT-4 inspired by Tiktoken.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version Downloads Last Updated
3.1.4 2,228 3/24/2026
3.1.3 352 3/24/2026
3.1.2 123 3/24/2026
3.1.2-alpha.0.1 41 3/24/2026
3.1.1 117 3/24/2026
3.1.1-alpha.0.1 39 3/24/2026
3.1.0 128 3/23/2026
3.1.0-rc.1.4 44 3/23/2026
3.1.0-rc.1.3 39 3/23/2026
3.1.0-rc.1.2 38 3/23/2026
3.1.0-rc.1.1 41 3/23/2026
3.1.0-rc.1 46 3/23/2026
3.0.1-alpha.0.4 40 3/23/2026
3.0.0 327 3/22/2026
2.2.0 83 3/18/2026