Tiktoken.Encodings.o200k
2.0.0
dotnet add package Tiktoken.Encodings.o200k --version 2.0.0
NuGet\Install-Package Tiktoken.Encodings.o200k -Version 2.0.0
<PackageReference Include="Tiktoken.Encodings.o200k" Version="2.0.0" />
paket add Tiktoken.Encodings.o200k --version 2.0.0
#r "nuget: Tiktoken.Encodings.o200k, 2.0.0"
// Install Tiktoken.Encodings.o200k as a Cake Addin
#addin nuget:?package=Tiktoken.Encodings.o200k&version=2.0.0
// Install Tiktoken.Encodings.o200k as a Cake Tool
#tool nuget:?package=Tiktoken.Encodings.o200k&version=2.0.0
Tiktoken
This implementation aims for maximum performance, especially in the token count operation.
The repository also includes a benchmark console app that makes it easy to track performance across versions.
Pull requests are welcome.
Implemented encodings
o200k_base
cl100k_base
r50k_base
p50k_base
p50k_edit
Usage
using Tiktoken.Encodings;
using Tiktoken;
var encoding = new O200KBase();
var encoder = new Encoder(encoding);
var tokens = encoder.Encode("hello world"); // token IDs depend on the encoding; [15339, 1917] is the cl100k_base result
var text = encoder.Decode(tokens); // hello world
var numberOfTokens = encoder.CountTokens(text); // 2
var stringTokens = encoder.Explore(text); // ["hello", " world"]
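A common use of CountTokens is checking whether a prompt fits a token budget before it is sent to a model. The following is a minimal sketch that uses only the Encoder, O200KBase, and CountTokens calls shown above. The budget value and the sample prompts are made up for illustration, and the comparison assumes that CountTokens returns an int, as the usage comment suggests.

```csharp
using System;
using Tiktoken;
using Tiktoken.Encodings;

// Hypothetical budget for illustration only; substitute your model's real limit.
const int maxPromptTokens = 8;

var encoder = new Encoder(new O200KBase());

string[] prompts =
{
    "hello world",
    "Summarize the plot of King Lear in three short sentences, please.",
};

foreach (var prompt in prompts)
{
    // Assumption: CountTokens returns an int token count.
    var count = encoder.CountTokens(prompt);
    var fits = count <= maxPromptTokens;
    Console.WriteLine($"{count} tokens, fits budget: {fits}");
}
```

When only the count is needed, calling CountTokens rather than Encode avoids materializing the token list; the benchmark tables report both operations separately.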
Benchmarks
Benchmark reports are published for each version. The results below were measured in the following environment:
BenchmarkDotNet v0.13.12, macOS Sonoma 14.4.1 (23E224) [Darwin 23.4.0]
Apple M1 Pro, 1 CPU, 10 logical and 10 physical cores
.NET SDK 8.0.204
[Host] : .NET 8.0.4 (8.0.424.16909), Arm64 RyuJIT AdvSIMD
DefaultJob : .NET 8.0.4 (8.0.424.16909), Arm64 RyuJIT AdvSIMD
Method | Categories | Data | Mean | Median | Ratio | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
---|---|---|---|---|---|---|---|---|---|---|
SharpTokenV2_0_1_ | CountTokens | 1. (...)57. [19866] | 632,817.1 ns | 632,257.2 ns | 1.00 | 2.9297 | - | - | 20115 B | 1.00 |
TiktokenSharpV1_0_9_ | CountTokens | 1. (...)57. [19866] | 463,840.3 ns | 458,851.3 ns | 0.74 | 64.4531 | 3.4180 | - | 404649 B | 20.12 |
TokenizerLibV1_3_3_ | CountTokens | 1. (...)57. [19866] | 801,796.0 ns | 806,271.8 ns | 1.27 | 247.0703 | 98.6328 | 0.9766 | 1547675 B | 76.94 |
Tiktoken_ | CountTokens | 1. (...)57. [19866] | 319,697.2 ns | 319,475.1 ns | 0.50 | 49.3164 | - | - | 309449 B | 15.38 |
SharpTokenV2_0_1_ | CountTokens | Hello, World! | 478.1 ns | 478.1 ns | 1.00 | 0.0401 | - | - | 256 B | 1.00 |
TiktokenSharpV1_0_9_ | CountTokens | Hello, World! | 275.2 ns | 275.1 ns | 0.58 | 0.0505 | - | - | 320 B | 1.25 |
TokenizerLibV1_3_3_ | CountTokens | Hello, World! | 498.1 ns | 497.4 ns | 1.04 | 0.2356 | - | - | 1480 B | 5.78 |
Tiktoken_ | CountTokens | Hello, World! | 212.9 ns | 212.8 ns | 0.45 | 0.0420 | - | - | 264 B | 1.03 |
SharpTokenV2_0_1_ | CountTokens | King(...)edy. [275] | 6,652.5 ns | 6,651.9 ns | 1.00 | 0.0763 | - | - | 520 B | 1.00 |
TiktokenSharpV1_0_9_ | CountTokens | King(...)edy. [275] | 4,774.2 ns | 4,781.1 ns | 0.72 | 0.8011 | - | - | 5064 B | 9.74 |
TokenizerLibV1_3_3_ | CountTokens | King(...)edy. [275] | 7,261.6 ns | 7,241.6 ns | 1.09 | 3.0899 | 0.1450 | 0.0076 | 19344 B | 37.20 |
Tiktoken_ | CountTokens | King(...)edy. [275] | 3,216.1 ns | 3,189.9 ns | 0.49 | 0.6447 | - | - | 4064 B | 7.82 |
SharpTokenV2_0_1_Encode | Encode | 1. (...)57. [19866] | 613,700.9 ns | 612,821.4 ns | 1.00 | 2.9297 | - | - | 20115 B | 1.00 |
TiktokenSharpV1_0_9_Encode | Encode | 1. (...)57. [19866] | 444,436.3 ns | 444,298.4 ns | 0.72 | 64.4531 | 3.4180 | - | 404649 B | 20.12 |
TokenizerLibV1_3_3_Encode | Encode | 1. (...)57. [19866] | 773,882.5 ns | 774,314.3 ns | 1.26 | 246.0938 | 85.9375 | - | 1547673 B | 76.94 |
Tiktoken_Encode | Encode | 1. (...)57. [19866] | 335,482.3 ns | 333,936.4 ns | 0.55 | 59.5703 | 2.4414 | - | 375601 B | 18.67 |
SharpTokenV2_0_1_Encode | Encode | Hello, World! | 443.7 ns | 436.8 ns | 1.00 | 0.0405 | - | - | 256 B | 1.00 |
TiktokenSharpV1_0_9_Encode | Encode | Hello, World! | 300.4 ns | 299.4 ns | 0.67 | 0.0505 | - | - | 320 B | 1.25 |
TokenizerLibV1_3_3_Encode | Encode | Hello, World! | 504.7 ns | 498.5 ns | 1.15 | 0.2356 | 0.0010 | - | 1480 B | 5.78 |
Tiktoken_Encode | Encode | Hello, World! | 262.4 ns | 262.6 ns | 0.58 | 0.1030 | - | - | 648 B | 2.53 |
SharpTokenV2_0_1_Encode | Encode | King(...)edy. [275] | 6,784.3 ns | 6,714.1 ns | 1.00 | 0.0763 | - | - | 520 B | 1.00 |
TiktokenSharpV1_0_9_Encode | Encode | King(...)edy. [275] | 4,691.2 ns | 4,690.7 ns | 0.69 | 0.8011 | - | - | 5064 B | 9.74 |
TokenizerLibV1_3_3_Encode | Encode | King(...)edy. [275] | 7,287.9 ns | 7,290.9 ns | 1.08 | 3.0823 | 0.1373 | - | 19344 B | 37.20 |
Tiktoken_Encode | Encode | King(...)edy. [275] | 3,606.2 ns | 3,607.4 ns | 0.53 | 0.7973 | - | - | 5024 B | 9.66 |
Support
Priority place for bugs: https://github.com/tryAGI/LangChain/issues
Priority place for ideas and general questions: https://github.com/tryAGI/LangChain/discussions
Discord: https://discord.gg/Ca2xhfBf3v
Product | Versions (compatible and additional computed target framework versions) |
---|---|
.NET | net5.0 was computed. net5.0-windows was computed. net6.0 is compatible. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 is compatible. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. |
.NET Core | netcoreapp2.0 was computed. netcoreapp2.1 was computed. netcoreapp2.2 was computed. netcoreapp3.0 was computed. netcoreapp3.1 was computed. |
.NET Standard | netstandard2.0 is compatible. netstandard2.1 is compatible. |
.NET Framework | net461 is compatible. net462 was computed. net463 was computed. net47 was computed. net471 was computed. net472 was computed. net48 was computed. net481 was computed. |
MonoAndroid | monoandroid was computed. |
MonoMac | monomac was computed. |
MonoTouch | monotouch was computed. |
Tizen | tizen40 was computed. tizen60 was computed. |
Xamarin.iOS | xamarinios was computed. |
Xamarin.Mac | xamarinmac was computed. |
Xamarin.TVOS | xamarintvos was computed. |
Xamarin.WatchOS | xamarinwatchos was computed. |
Dependencies
.NETFramework 4.6.1
- Tiktoken.Encodings.Abstractions (>= 2.0.0)
.NETStandard 2.0
- Tiktoken.Encodings.Abstractions (>= 2.0.0)
.NETStandard 2.1
- Tiktoken.Encodings.Abstractions (>= 2.0.0)
net6.0
- Tiktoken.Encodings.Abstractions (>= 2.0.0)
net7.0
- Tiktoken.Encodings.Abstractions (>= 2.0.0)
net8.0
- Tiktoken.Encodings.Abstractions (>= 2.0.0)
NuGet packages (1)
One NuGet package depends on Tiktoken.Encodings.o200k:
Package | Description |
---|---|
Tiktoken | The fastest tokenizer for GPT-3.5 and GPT-4 inspired by Tiktoken. |
GitHub repositories
This package is not used by any popular GitHub repositories.