Sep 0.1.0-preview.6

This is a prerelease version of Sep.
There is a newer version of this package available.
See the version list below for details.
.NET CLI:
dotnet add package Sep --version 0.1.0-preview.6

Package Manager:
NuGet\Install-Package Sep -Version 0.1.0-preview.6
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

PackageReference:
<PackageReference Include="Sep" Version="0.1.0-preview.6" />
For projects that support PackageReference, copy this XML node into the project file to reference the package.

Paket CLI:
paket add Sep --version 0.1.0-preview.6

Script & Interactive:
#r "nuget: Sep, 0.1.0-preview.6"
The #r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.

Cake:
// Install Sep as a Cake Addin
#addin nuget:?package=Sep&version=0.1.0-preview.6&prerelease

// Install Sep as a Cake Tool
#tool nuget:?package=Sep&version=0.1.0-preview.6&prerelease

Sep - Possibly the World's Fastest .NET CSV Parser

Modern, minimal, fast, zero allocation, reading and writing of separated values (csv, tsv etc.). Cross-platform, trimmable and AOT/NativeAOT compatible. Featuring an opinionated API design and pragmatic implementation targeted at machine learning use cases.

⭐ Modern - utilizes modern features such as Span<T>, Generic Math, ref struct and other .NET 7+/C# 11 features.

🔎 Minimal - a succinct yet expressive API with few moving parts, configurations and no hidden changes to input or output. What you read/write is what you get. This means there is no "automatic" escaping/unescaping of quotes, for example.

🗑️ Zero allocation - intelligent and efficient memory management allowing for zero allocations after warmup incl. supporting use cases of reading or writing arrays of values (e.g. features) easily without repeated allocations.

🚀 Fast - blazing fast with both architecture-specific and cross-platform SIMD vectorized parsing, and using csFastFloat for fast parsing of floating-point values. Reads or writes one row at a time efficiently, with benchmarks to prove it.

🌐 Cross-platform - works on any platform, any architecture supported by .NET. 100% managed and written in beautiful modern C#.

✂️ Trimmable and 🤖 AOT/NativeAOT compatible - no problematic reflection or dynamic code generation. Hence, fully trimmable and Ahead-of-Time compatible. A simple console tester program can be published as an executable of just a few MBs. 💾

🗣️ Opinionated and pragmatic 🤔 - conforms to the essentials of RFC-4180, but takes an opinionated and pragmatic approach to it, especially with regard to quoting and line endings. See section RFC-4180.

var text = """
           A;B;C;D;E;F
           Sep;🚀;1;1.2;0.1;0.5
           CSV;✅;2;2.2;0.2;1.5
           """;

using var reader = Sep.Reader().FromText(text);  // Infers separator 'Sep' from header
using var writer = reader.Sep.Writer().ToText(); // Writer defined from reader 'Sep'
                                                 // Use .FromFile(...)/ToFile(...) for files
var idx = reader.Header.IndexOf("B");
var nms = new[] { "E", "F" };

foreach (var readRow in reader)           // Read one row at a time
{
    var a = readRow["A"].Span;            // Column as ReadOnlySpan<char>
    var b = readRow[idx].ToString();      // Column to string (allocates new string per call)
    var c = readRow["C"].Parse<int>();    // Parse any T : ISpanParsable<T>
    var d = readRow["D"].Parse<float>();  // Parse float/double fast via csFastFloat
    var s = readRow[nms].Parse<double>(); // Parse multiple columns as Span<T>
                                          // - Sep handles array allocation and reuse
    foreach (ref var v in s) { v *= 10; }

    using var writeRow = writer.NewRow(); // Start new row. Row written on Dispose.
    writeRow["A"].Set(a);                 // Set by ReadOnlySpan<char>
    writeRow["B"].Set(b);                 // Set by string
    writeRow["C"].Set($"{c * 2}");        // Set via InterpolatedStringHandler, no allocs
    writeRow["D"].Format(d / 2);          // Format any T : ISpanFormattable
    writeRow[nms].Format(s);              // Format multiple columns directly
}

var expected = """
               A;B;C;D;E;F
               Sep;🚀;2;0.6;1;5
               CSV;✅;4;1.1;2;15
               """;
Assert.AreEqual(expected, writer.ToString());

// Above example code is for demonstration purposes only.
// Short names and repeated constants are only for demonstration.

Benchmarks

Reader Comparison Benchmarks

BenchmarkDotNet=v0.13.2, OS=Windows 10 (10.0.19044.2728/21H2/November2021Update)
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=7.0.202
  [Host]     : .NET 7.0.4 (7.0.423.11508), X64 RyuJIT AVX2
  Job-LOZXLZ : .NET 7.0.4 (7.0.423.11508), X64 RyuJIT AVX2

Runtime=.NET 7.0  Toolchain=net70  InvocationCount=1  
MaxIterationCount=5  MinIterationCount=1  UnrollFactor=1  
WarmupCount=3  
| Type | Method | Reader | Lines | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | MB | MB/s | ns/line | Gen2 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PackageAssetsBenchParseAsset | Sep__________ | UTF16 StringReader | 200000 | 140.4 ms | 22.62 ms | 5.87 ms | 1.00 | 0.00 | 4000.0000 | 3000.0000 | 116 | 831.4 | 702.1 | 1000.0000 | 54377.29 KB | 1.00 |
| PackageAssetsBenchParseAsset | Sylvan_______ | UTF16 StringReader | 200000 | 174.2 ms | 23.79 ms | 6.18 ms | 1.24 | 0.07 | 4000.0000 | 3000.0000 | 116 | 670.4 | 870.8 | 1000.0000 | 54725.33 KB | 1.01 |
| PackageAssetsBenchParseAsset | ReadSplitLine | UTF16 StringReader | 200000 | 469.3 ms | 124.65 ms | 32.37 ms | 3.34 | 0.23 | 28000.0000 | 27000.0000 | 116 | 248.8 | 2346.4 | 4000.0000 | 408584.26 KB | 7.51 |
| PackageAssetsBenchParseAsset | CsvHelper____ | UTF16 StringReader | 200000 | 428.2 ms | 2.67 ms | 0.15 ms | 3.00 | 0.13 | 4000.0000 | 3000.0000 | 116 | 272.7 | 2140.9 | 1000.0000 | 54543.54 KB | 1.00 |
| PackageAssetsBenchParseCsvOnly | Sep__________ | UTF16 StringReader | 1000000 | 104.5 ms | 1.31 ms | 0.34 ms | 1.00 | 0.00 | - | - | 583 | 5587.8 | 104.5 | - | 5.93 KB | 1.00 |
| PackageAssetsBenchParseCsvOnly | Sylvan_______ | UTF16 StringReader | 1000000 | 167.5 ms | 2.96 ms | 0.77 ms | 1.60 | 0.01 | - | - | 583 | 3486.1 | 167.5 | - | 135.29 KB | 22.82 |
| PackageAssetsBenchParseCsvOnly | ReadSplitLine | UTF16 StringReader | 1000000 | 271.2 ms | 16.60 ms | 4.31 ms | 2.60 | 0.04 | 108000.0000 | - | 583 | 2152.6 | 271.2 | - | 1772445.8 KB | 298,910.49 |
| PackageAssetsBenchParseCsvOnly | CsvHelper____ | UTF16 StringReader | 1000000 | 1,392.3 ms | 59.08 ms | 15.34 ms | 13.33 | 0.13 | 57000.0000 | - | 583 | 419.3 | 1392.3 | - | 935255.09 KB | 157,724.18 |
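
The ns/line and MB/s columns appear to be derived from Mean, Lines and MB. The source does not spell out the formulas, so the sketch below assumes ns/line = Mean / Lines and MB/s = MB / Mean; the helper names are hypothetical. Small deviations from the table are expected, since the displayed Mean and MB values are rounded.

```csharp
using System;
using System.Globalization;

// Assumed definitions (not stated in the source):
//   ns/line = Mean / Lines,  MB/s = MB / Mean.
static double NsPerLine(double meanMs, int lines) => meanMs * 1_000_000.0 / lines;
static double MbPerSecond(double mb, double meanMs) => mb / (meanMs / 1000.0);

var inv = CultureInfo.InvariantCulture;

// Inputs are taken from the Sep rows of the benchmark table.
Console.WriteLine(NsPerLine(104.5, 1_000_000).ToString("F1", inv)); // 104.5 (table: 104.5)
Console.WriteLine(NsPerLine(140.4, 200_000).ToString("F1", inv));   // 702.0 (table: 702.1)
Console.WriteLine(MbPerSecond(583, 104.5).ToString("F0", inv));     // 5579  (table: 5587.8)
```

Read this way, the CsvOnly row says Sep spends roughly 105 ns per line on pure parsing, while the ParseAsset row includes the cost of building objects from each row.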

RFC-4180

While RFC-4180 requires \r\n (CR, LF) as the line ending, Sep supports the well-known line endings (\r\n, \n and \r), similar to .NET. Environment.NewLine is used when writing. Quoting is supported by simply matching pairs of quotes, no matter what, with no automatic escaping. Hence, you are responsible for, and in control of, escaping at this time.
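
To make "matching pairs of quotes" concrete, here is a minimal sketch of the idea. It is not Sep's implementation or API: SplitRow is a hypothetical helper that toggles an in-quotes flag on every quote character and treats the separator as a column boundary only outside quotes. Quotes stay in the output untouched, mirroring the lack of automatic unescaping described above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; NOT Sep's API or implementation.
// Splits one row on `separator`, ignoring separators between a pair of quotes.
static List<string> SplitRow(ReadOnlySpan<char> row, char separator = ';')
{
    var columns = new List<string>();
    var inQuotes = false;
    var start = 0;
    for (var i = 0; i < row.Length; i++)
    {
        var c = row[i];
        if (c == '"')
        {
            inQuotes = !inQuotes; // every quote opens or closes a pair
        }
        else if (c == separator && !inQuotes)
        {
            columns.Add(row[start..i].ToString());
            start = i + 1;
        }
    }
    columns.Add(row[start..].ToString()); // last column has no trailing separator
    return columns;
}

var result = SplitRow("a;\"b;c\";d\"\"e");
Console.WriteLine(string.Join(" | ", result)); // a | "b;c" | d""e
```

Note that the quoted separator in "b;c" does not split the column, and that the quotes in the third column do not sit at its start or end, yet still form a matched pair.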

Note that some libraries will claim conformance, but the RFC is, perhaps naturally, quite strict, e.g. only comma is supported as separator/delimiter. Sep defaults to using ; as separator when writing, while auto-detecting supported separators when reading. This is decidedly non-conforming.

The RFC defines the following condensed ABNF grammar:

file = [header CRLF] record *(CRLF record) [CRLF]
header = name *(COMMA name)
record = field *(COMMA field)
name = field
field = (escaped / non-escaped)
escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
non-escaped = *TEXTDATA
COMMA = %x2C
CR = %x0D ;as per section 6.1 of RFC 2234 [2]
DQUOTE =  %x22 ;as per section 6.1 of RFC 2234 [2]
LF = %x0A ;as per section 6.1 of RFC 2234 [2]
CRLF = CR LF ;as per section 6.1 of RFC 2234 [2]
TEXTDATA =  %x20-21 / %x23-2B / %x2D-7E

Note how TEXTDATA is restricted too. Many libraries allow any character, including emojis and similar (which Sep supports), but this is not in conformance with the RFC.
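
The TEXTDATA rule translates directly into a character range check. The sketch below uses a hypothetical IsTextData helper, which is not part of Sep, to show which characters the grammar admits in an unquoted field.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; not part of Sep.
// TEXTDATA = %x20-21 / %x23-2B / %x2D-7E
// i.e. printable ASCII except '"' (%x22) and ',' (%x2C).
static bool IsTextData(char c) =>
    c is (>= '\u0020' and <= '\u0021')
      or (>= '\u0023' and <= '\u002B')
      or (>= '\u002D' and <= '\u007E');

Console.WriteLine(IsTextData('A'));      // True
Console.WriteLine(IsTextData(','));      // False
Console.WriteLine(IsTextData('"'));      // False
Console.WriteLine("🚀".All(IsTextData)); // False: outside the ASCII ranges
```

By this check, the 🚀 and ✅ values in the example at the top of this page would not be valid TEXTDATA under a strict reading of the grammar.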

Per the RFC, a quote inside an escaped field must be doubled, e.g. "fie""ld". Sep currently allows any pairs of quotes, and quoting doesn't need to be at the start or end of a field (col or column in Sep terminology).
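
Because Sep does no automatic escaping, RFC-style escaping has to be applied by the caller when it is needed. The following is a minimal sketch of how that could look. Escape and Unescape are hypothetical helpers, not part of Sep, and they allocate new strings, so they are meant to show the rule, not to be a fast path.

```csharp
using System;

// Hypothetical helpers for illustration only; they are not part of Sep.

// Wrap the field in quotes and double any embedded quote,
// following the RFC's `escaped` rule (DQUOTE ... 2DQUOTE ... DQUOTE).
static string Escape(string field) =>
    "\"" + field.Replace("\"", "\"\"") + "\"";

// Strip the surrounding quotes, if present, and collapse doubled quotes.
static string Unescape(string field) =>
    field.Length >= 2 && field[0] == '"' && field[^1] == '"'
        ? field[1..^1].Replace("\"\"", "\"")
        : field;

var raw = "fie\"ld";
var escaped = Escape(raw);
Console.WriteLine(escaped);           // "fie""ld"
Console.WriteLine(Unescape(escaped)); // fie"ld
```

A value escaped this way contains only matched pairs of quotes, so it is also consistent with the pair-matching behavior described above.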

All in all, Sep takes a pretty pragmatic approach here, as the primary use case is not exchanging data on the internet but use in machine learning pipelines or similar.

Compatible and additional computed target framework versions:
.NET: net7.0 is compatible. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed.

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories (2)

Showing the top 2 popular GitHub repositories that depend on Sep:

DataDog/dd-trace-dotnet: .NET Library for Datadog APM
JasonBock/Rocks: A mocking library based on the Compiler APIs (Roslyn + Mocks)

Version Downloads Last updated
0.4.0 2,829 1/1/2024
0.4.0-preview.1 89 12/23/2023
0.3.0 452,100 11/18/2023
0.2.7 2,325 10/12/2023
0.2.6 443 9/27/2023
0.2.5 236 9/14/2023
0.2.4 219 9/8/2023
0.2.3 246 9/5/2023
0.2.2 475,287 8/10/2023
0.2.1 135 8/10/2023
0.2.0 912 8/7/2023
0.2.0-preview.3 96 7/29/2023
0.1.0 570 5/30/2023
0.1.0-rc.1 84 5/26/2023
0.1.0-preview.8 71 5/26/2023
0.1.0-preview.7 90 5/8/2023
0.1.0-preview.6 87 4/24/2023
0.1.0-preview.5 93 3/19/2023
0.1.0-preview.4 100 12/31/2022
0.1.0-preview.3 100 12/4/2022
0.1.0-preview.2 126 3/21/2022
0.1.0-preview.1 130 1/28/2022