Tokenizers.DotNet
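Version 1.3.0
Install with one of the following:
- .NET CLI: dotnet add package Tokenizers.DotNet --version 1.3.0
- Package Manager: NuGet\Install-Package Tokenizers.DotNet -Version 1.3.0
- PackageReference: <PackageReference Include="Tokenizers.DotNet" Version="1.3.0" />
- Central Package Management: <PackageVersion Include="Tokenizers.DotNet" Version="1.3.0" /> in Directory.Packages.props, plus <PackageReference Include="Tokenizers.DotNet" /> in the project file
- Paket CLI: paket add Tokenizers.DotNet --version 1.3.0
- Script & Interactive: #r "nuget: Tokenizers.DotNet, 1.3.0"
- File-based apps: #:package Tokenizers.DotNet@1.3.0
- Cake addin: #addin nuget:?package=Tokenizers.DotNet&version=1.3.0
- Cake tool: #tool nuget:?package=Tokenizers.DotNet&version=1.3.0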
A .NET wrapper for the Hugging Face Tokenizers library
NuGet package list
Requirements
- .NET 6 / .NET Standard 2.0 or above
- (Build) Latest Rust
Supported functionalities
- Download tokenizer files from the Hugging Face Hub
- Load a tokenizer file (.json) from local storage
- Encode a string to tokens
- Decode tokens to a string
How to use
(1) Install the packages
- From NuGet, install the Tokenizers.DotNet package
- Then install the Tokenizers.DotNet.runtime.<OS>-<ARCH> package too (e.g. win-x64 or linux-arm64; check the NuGet package list above).
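To work out which runtime package matches a given machine, the naming pattern above can be sketched as a small helper. This is only an illustration: the RuntimePackage class and its methods are hypothetical names, not part of Tokenizers.DotNet, and the sketch assumes only the four OS/architecture combinations mentioned on this page (Windows and Linux, x64 and arm64).

```csharp
using System;
using System.Runtime.InteropServices;

// Print the runtime package name that matches the current process.
try
{
    Console.WriteLine(RuntimePackage.ForCurrentProcess());
}
catch (PlatformNotSupportedException e)
{
    Console.WriteLine(e.Message);
}

// Hypothetical helper (not part of Tokenizers.DotNet) that builds a
// package name of the form Tokenizers.DotNet.runtime.<OS>-<ARCH>.
static class RuntimePackage
{
    public static string NameFor(string os, Architecture arch)
    {
        // Assumption: only win and linux packages are considered here.
        if (os != "win" && os != "linux")
            throw new PlatformNotSupportedException($"No runtime package assumed for OS '{os}'.");

        string archPart = arch switch
        {
            Architecture.X64 => "x64",
            Architecture.Arm64 => "arm64",
            _ => throw new PlatformNotSupportedException($"No runtime package assumed for architecture '{arch}'.")
        };

        return $"Tokenizers.DotNet.runtime.{os}-{archPart}";
    }

    public static string ForCurrentProcess()
    {
        string os = RuntimeInformation.IsOSPlatform(OSPlatform.Windows) ? "win"
                  : RuntimeInformation.IsOSPlatform(OSPlatform.Linux) ? "linux"
                  : throw new PlatformNotSupportedException("Only Windows and Linux are covered by this sketch.");

        // The native library has to match the process architecture,
        // which can differ from the OS architecture (e.g. under emulation).
        return NameFor(os, RuntimeInformation.ProcessArchitecture);
    }
}
```

The printed name can then be passed to dotnet add package. Check it against the NuGet package list first, since the sketch does not verify that a package with that name has actually been published.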
(2) Write the code
See the following example code:
using Tokenizers.DotNet;

// Download skt/kogpt2-base-v2/tokenizer.json from the hub
var hubName = "skt/kogpt2-base-v2";
var filePath = "tokenizer.json";
var fileFullPath = await HuggingFace.GetFileFromHub(hubName, filePath, "deps");
Console.WriteLine($"Downloaded {fileFullPath}");

// Create a tokenizer instance
Tokenizer tokenizer;
try
{
    tokenizer = new Tokenizer(vocabPath: fileFullPath);
}
catch (TokenizerException e)
{
    Console.WriteLine(e.Message);
    return;
}

try
{
    var text = "음, 이제 식사도 해볼까요";
    Console.WriteLine($"Input text: {text}");
    var tokens = tokenizer.Encode(text);
    Console.WriteLine($"Encoded: {string.Join(", ", tokens)}");
    var decoded = tokenizer.Decode(tokens);
    Console.WriteLine($"Decoded: {decoded}");
}
catch (TokenizerException e)
{
    Console.WriteLine(e.Message);
    return;
}
Console.WriteLine($"Version of Tokenizers.DotNet.runtime.win: {tokenizer.GetVersion()}");
Console.WriteLine("--------------------------------------------------");

// Download openai-community/gpt2/tokenizer.json from the hub
hubName = "openai-community/gpt2";
filePath = "tokenizer.json";
fileFullPath = await HuggingFace.GetFileFromHub(hubName, filePath, "deps");

// Create a tokenizer instance
Tokenizer tokenizer2;
try
{
    tokenizer2 = new Tokenizer(vocabPath: fileFullPath);
}
catch (TokenizerException e)
{
    Console.WriteLine(e.Message);
    return;
}

try
{
    var text2 = "i was nervous before the exam, and i had a fever.";
    Console.WriteLine($"Input text: {text2}");
    var tokens2 = tokenizer2.Encode(text2);
    Console.WriteLine($"Encoded: {string.Join(", ", tokens2)}");
    var decoded2 = tokenizer2.Decode(tokens2);
    Console.WriteLine($"Decoded: {decoded2}");
}
catch (TokenizerException e)
{
    Console.WriteLine(e.Message);
    return;
}
Console.WriteLine($"Version of Tokenizers.DotNet.runtime.win: {tokenizer2.GetVersion()}");
Console.ReadKey();
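The example above always downloads from the hub. Loading a tokenizer file that is already on disk might look like the sketch below. It uses only the Tokenizer constructor, Encode, Decode, and TokenizerException as they appear in the example above; the path models/tokenizer.json is a hypothetical location, so substitute the path of your own file.

```csharp
using System;
using System.IO;
using Tokenizers.DotNet;

// Hypothetical location of a tokenizer.json that is already on disk.
var localPath = Path.Combine("models", "tokenizer.json");

// Fail early with a clear message if the file is missing.
if (!File.Exists(localPath))
{
    Console.WriteLine($"Tokenizer file not found: {localPath}");
    return;
}

try
{
    // Same constructor as in the hub example, pointed at the local file.
    var tokenizer = new Tokenizer(vocabPath: localPath);

    // Round trip: text to tokens, then tokens back to text.
    var tokens = tokenizer.Encode("i had a fever.");
    Console.WriteLine($"Encoded: {string.Join(", ", tokens)}");
    Console.WriteLine($"Decoded: {tokenizer.Decode(tokens)}");
}
catch (TokenizerException e)
{
    Console.WriteLine(e.Message);
}
```

Running this requires both the Tokenizers.DotNet package and the matching runtime package to be installed, as described in step (1).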
How to build
(Recommended) Cross-platform build
You can use Docker to compile this library for Windows x64/arm64 and Linux x64/arm64.
Run update_version.ps1 before running Docker to update the package version.
Windows:
PS > docker build -f Dockerfile -t ghcr.io/sappho192/tokenizers.dotnet:latest .
PS > docker run -v .\nuget:/out --rm ghcr.io/sappho192/tokenizers.dotnet:latest
Linux/macOS:
$ docker build -f Dockerfile -t ghcr.io/sappho192/tokenizers.dotnet:latest .
$ docker run -v ./nuget:/out --rm ghcr.io/sappho192/tokenizers.dotnet:latest
Built packages will be in the nuget folder.
Building on a local machine
(Note that this has been confirmed only on a Windows machine.)
- Prepare the following:
  - Rust build system (cargo)
  - .NET build system (dotnet 6.0 or above)
  - PowerShell (7.4.2 or above recommended)
- Bump the version number in NATIVE_LIB_VERSION.txt
- Run build_all_clean.ps1
  - To build Tokenizers.DotNet.runtime.<OS> only, run build_rust.ps1
  - To build Tokenizers.DotNet only, run build_dotnet.ps1
All build artifacts will be in the nuget directory.
| Product | Versions (compatible and additional computed target framework versions) |
|---|---|
| .NET | net5.0 was computed. net5.0-windows was computed. net6.0 is compatible. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
| .NET Core | netcoreapp2.0 was computed. netcoreapp2.1 was computed. netcoreapp2.2 was computed. netcoreapp3.0 was computed. netcoreapp3.1 was computed. |
| .NET Standard | netstandard2.0 is compatible. netstandard2.1 was computed. |
| .NET Framework | net461 was computed. net462 was computed. net463 was computed. net47 was computed. net471 was computed. net472 was computed. net48 was computed. net481 was computed. |
| MonoAndroid | monoandroid was computed. |
| MonoMac | monomac was computed. |
| MonoTouch | monotouch was computed. |
| Tizen | tizen40 was computed. tizen60 was computed. |
| Xamarin.iOS | xamarinios was computed. |
| Xamarin.Mac | xamarinmac was computed. |
| Xamarin.TVOS | xamarintvos was computed. |
| Xamarin.WatchOS | xamarinwatchos was computed. |
Dependencies
- .NETStandard 2.0
  - System.Runtime.CompilerServices.Unsafe (>= 6.1.2)
- net6.0
  - System.Runtime.CompilerServices.Unsafe (>= 6.1.2)
NuGet packages (3)
Showing the top 3 NuGet packages that depend on Tokenizers.DotNet:
| Package | Downloads |
|---|---|
| EDMTranslator: Text translator library based on LLM models, especially EncoderDecoderModel in HuggingFace | |
| RAGamuffin: Package Description | |
| Berry.Embeddings.MiniLmL6v2: MiniLM-L6-v2 embedding model implementation with ONNX runtime and tokenizer support. | |
GitHub repositories (1)
Showing the top 1 popular GitHub repositories that depend on Tokenizers.DotNet:
| Repository | Stars |
|---|---|
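| axzxs2001/Asp.NetCoreExperiment: All original projects have been moved to the **OleVersion** directory for preservation. New examples mainly target .NET 5.0; some upgrade earlier examples, and others summarize past work experience for everyone's reference. | |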
| Version | Downloads | Last Updated | |
|---|---|---|---|
| 1.3.0 | 3,117 | 8/31/2025 | |
| 1.2.1 | 1,120 | 7/21/2025 | |
| 1.2.0 | 7,906 | 4/14/2025 | |
| 1.1.3 | 1,895 | 4/6/2025 | |
| 1.1.2 | 291 | 4/1/2025 | |
| 1.1.0 | 1,161 | 12/28/2024 | |
| 1.0.5 | 1,085 | 8/26/2024 | |
| 1.0.4 | 198 | 8/16/2024 | |
| 1.0.1 | 330 | 6/18/2024 | |
| 0.9.2 | 248 | 6/17/2024 | |
| 0.9.1 | 219 | 6/16/2024 | |
| 0.9.0 | 459 | 6/12/2024 | |
| 0.1.0 | 194 | 6/11/2024 | |
| 0.0.1 | 233 | 6/11/2024 |