CSVParserGenerator 1.5.0

There is a newer version of this package available.
See the version list below for details.
.NET CLI:
dotnet add package CSVParserGenerator --version 1.5.0

Package Manager:
NuGet\Install-Package CSVParserGenerator -Version 1.5.0
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

PackageReference:
<PackageReference Include="CSVParserGenerator" Version="1.5.0" />
For projects that support PackageReference, copy this XML node into the project file to reference the package.

Paket CLI:
paket add CSVParserGenerator --version 1.5.0

Script & Interactive:
#r "nuget: CSVParserGenerator, 1.5.0"
The #r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.

Cake:
// Install CSVParserGenerator as a Cake Addin
#addin nuget:?package=CSVParserGenerator&version=1.5.0

// Install CSVParserGenerator as a Cake Tool
#tool nuget:?package=CSVParserGenerator&version=1.5.0


CSV Parser Generator

A CSV parser with support for uncommon line separators (e.g. Unicode line breaks), instantiation of read-only objects, and working nullable handling.

Getting started

Assume you want to populate the following data type:

class Person{
    public Guid Id {get;init;}
    public required string Name {get;init;}
    public DateOnly Birthdate {get;init;}
    public int Score {get;init;}
}

with the following data:

B2DDC789-A793-4BEA-9A0B-D00FFCDD1FB5,"John Doe",2000-03-01,50
BE4015A0-9B37-4A1E-A92A-405496A1FF96,"Max Mustermann",2001-04-01,64

  1. Include the NuGet package.
  2. Add the following method to one of your classes (the class must be partial):
    [Parser.CSVParser(nameof(Person.Id), nameof(Person.Name), nameof(Person.Birthdate), nameof(Person.Score))]
    internal partial IEnumerable<Person> ParseData(ReadOnlySpan<byte> data);
    
  3. Call ParseData and run your program. That's it.

You can use ReadOnlySpan<byte> or ReadOnlySpan<char> as input.

The result type must have a parameterless constructor. The properties are set with an object initializer, new MyType() {Property1 = …}. This allows the use of the required keyword and init-only properties, which should lead to fewer problems with non-nullable properties.

Container types (the return type of the method) must satisfy the following if they are not an interface:

  • Have a generic argument that defines the element type
  • Have an Add(T toadd) method
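
As a rough illustration of these two requirements, a custom container could look like the sketch below. The Roster<T> name is hypothetical. The two constructors are an assumption: a parameterless one for the default case and an int one for a size hint such as NumberOfElements; the package may only need one of them.

```csharp
using System.Collections.Generic;

// Hypothetical custom container: one generic argument for the element type
// and a public Add(T) method, matching the two rules listed above.
public class Roster<T>
{
    private readonly List<T> items;

    // Assumption: used when no size hint is supplied.
    public Roster() => items = new List<T>();

    // Assumption: receives the optional size hint and uses it as the initial capacity.
    public Roster(int capacity) => items = new List<T>(capacity);

    public int Count => items.Count;

    public void Add(T toadd) => items.Add(toadd);
}
```

A parse method would then presumably declare Roster<Person> as its return type instead of IEnumerable<Person>.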

If they are interfaces, they must be implemented by one of the following classes:

  • System.Collections.Immutable.ImmutableArray
  • System.Collections.Immutable.ImmutableHashSet
  • System.Collections.Immutable.ImmutableList
  • System.Collections.Generic.List
  • System.Collections.Generic.HashSet

For ImmutableArray, ImmutableList and ImmutableHashSet, a builder is used to populate the collection.

The properties of the result type must either be string or implement ISpanParsable<T>. The attribute takes a list of the property names that will be set, starting with the first column and ending with the last. If a column should be ignored, use null in place of that column's property name.
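
For example, to skip the birthdate column of the sample data, the declaration from the getting-started section might look like the sketch below. It is a declaration fragment that only compiles with the package referenced, and it reuses the Person type defined above.

```csharp
// The third column (the birthdate) is ignored by passing null in its position.
[Parser.CSVParser(nameof(Person.Id), nameof(Person.Name), null, nameof(Person.Score))]
internal partial IEnumerable<Person> ParseData(ReadOnlySpan<byte> data);
```

Skipping a column that maps to a required property, such as Name, would presumably not work, because the object initializer has to set every required member.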

Configuration

There are several ways to configure the parser.

CSVParser Attribute

Settings you can change via the attribute:

HasHeader

If the first line should be ignored, typically because it contains a header, set this property to true. Default is false.

 [Parser.CSVParser(new string[]{nameof(Person.Id), nameof(Person.Name), nameof(Person.Birthdate), nameof(Person.Score)}, HasHeader = true )]

SeperatorSymbol

The symbol that separates the columns. Default is ,

 [Parser.CSVParser(new string[]{nameof(Person.Id), nameof(Person.Name), nameof(Person.Birthdate), nameof(Person.Score)}, SeperatorSymbol = ';' )]

QuoteSymbol

The symbol used as the quote character. Default is "

 [Parser.CSVParser(new string[]{nameof(Person.Id), nameof(Person.Name), nameof(Person.Birthdate), nameof(Person.Score)}, QuoteSymbol = '"' )]

ExtendedLineFeed

Normally the parser only recognizes U+000A New Line and U+000D Carriage Return as line separators. If your CSV file contains uncommon line breaks such as U+0085 Next Line (NEL), you need to enable this setting. Default is false.

The following additional characters are supported:

  • U+0085 Next Line (NEL)
  • U+000C Form Feed (FF)
  • U+2028 Line Separator (LS), only when parsing chars
  • U+2029 Paragraph Separator (PS), only when parsing chars
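
Following the pattern of the other attribute settings, enabling it would presumably look like this (a declaration fragment that only compiles with the package referenced):

```csharp
// Also treat NEL, FF and (for char input) LS and PS as line breaks.
[Parser.CSVParser(new string[]{nameof(Person.Id), nameof(Person.Name), nameof(Person.Birthdate), nameof(Person.Score)}, ExtendedLineFeed = true )]
```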

Runtime Options

You can add a second parameter to the method to pass additional options. The generic argument of the Option<T> class must match the element type of the input.

internal partial IEnumerable<Person> ParseData(ReadOnlySpan<byte> data, Parser.Option<byte> options);
// or…
internal partial IEnumerable<Person> ParseData(ReadOnlySpan<char> data, Parser.Option<char> options);

NumberOfElements

Use this setting to give the parser a hint of how many elements will be parsed. The value is passed as the only constructor argument when the collection is created, which normally sets the capacity.

Parser.Options<char> options = new() { 
    NumberOfElements = 1_000_000
};
ParseData(data, options);

StringFactory

This method is used to create strings from a ReadOnlySpan (either byte or char). You can use it to define which encoding is used when reading bytes. The default for bytes is UTF-8; for chars, the ToString() method is called.

You can also use this factory to deduplicate strings. On large datasets with repeating entries, this can speed up parsing significantly.

Parser.Options<byte> options = new() { 
    StringFactory = System.Text.Encoding.Latin1.GetString
};
ParseData(data, options);
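
The deduplication idea mentioned above can be sketched as follows. StringDeduplicator is a hypothetical helper, not part of the package, and it assumes that the factory for char input accepts a method taking a ReadOnlySpan<char> and returning a string.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: hands out one shared string instance per distinct value.
public sealed class StringDeduplicator
{
    // Buckets keyed by the hash of the characters; colliding values share a bucket.
    private readonly Dictionary<int, List<string>> cache = new();

    public string Create(ReadOnlySpan<char> data)
    {
        int hash = string.GetHashCode(data);
        if (!cache.TryGetValue(hash, out var bucket))
        {
            bucket = new List<string>();
            cache[hash] = bucket;
        }

        foreach (string candidate in bucket)
        {
            // Reuse the existing instance when the characters match exactly.
            if (data.SequenceEqual(candidate.AsSpan()))
                return candidate;
        }

        // First occurrence: allocate once and remember it.
        string created = data.ToString();
        bucket.Add(created);
        return created;
    }
}
```

If the delegate shape matches, it could be wired up as StringFactory = new StringDeduplicator().Create. The sketch is not thread-safe and keeps every distinct string alive for as long as the instance exists.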

OnError

The parse method should not throw exceptions. Lines that do not match your definition are ignored. You can register a callback with the OnError property to be notified every time a row is ignored.

Depending on the error, instances of different classes are reported. The type describes the kind of error and holds additional information. For example, LineErrorParseError has a ParsedElement property that holds the string that could not be parsed (with the limitation that it only works for char data).

Parser.Options<char> options = new() { 
    // Assumption: the callback receives the error object; check the package for the exact delegate type.
    OnError = error => System.Console.WriteLine(error)
};
ParseData(data, options);

Culture

When the fields are parsed, the configured CultureInfo is passed to the Parse method of ISpanParsable. The default is InvariantCulture.

Parser.Options<char> options = new() { 
    Culture = System.Globalization.CultureInfo.CurrentCulture
};
ParseData(data, options);
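
To see why the culture matters, the sketch below calls the ISpanParsable<T> Parse method directly with a format provider. CultureDemo is a hypothetical helper for illustration only; it needs .NET 7 or later with C# 11 because it relies on static abstract interface members, and it is not how the generated code is necessarily written.

```csharp
using System;
using System.Globalization;

public static class CultureDemo
{
    // Parses one field through ISpanParsable<T>, forwarding the format provider.
    public static T Parse<T>(ReadOnlySpan<char> field, IFormatProvider culture)
        where T : ISpanParsable<T>
        => T.Parse(field, culture);
}
```

With a culture that uses a comma as the decimal separator, such as CultureInfo.GetCultureInfo("de-DE"), CultureDemo.Parse<decimal>("1,5", culture) would typically read the field as one and a half, whereas InvariantCulture treats the comma as a group separator.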

Parse Types that do not implement ISpanParsable and control parsing

If you need to deserialize a type that does not implement the ISpanParsable interface, or the default parse method it provides does not work for you, you can override the default behavior by adding an additional attribute to the parse method.

To change the parsing for one property, pass the name of the property as the first argument to CSVPTransformer.

[Parser.CSVPTransformer(nameof(Person.Birthdate),nameof(ToDate) )]

To change the parsing for all properties of one type, pass the desired type as the first argument to CSVPTransformer.

[Parser.CSVPTransformer(typeof(DateOnly),nameof(ToDate) )]

The second argument is the name of the method that will be called. That method returns the desired object and accepts either ReadOnlySpan<char> or ReadOnlySpan<byte>.

A complete sample

[Parser.CSVParser(nameof(Person.Id), nameof(Person.Name), nameof(Person.Birthdate), nameof(Person.Score))]
[Parser.CSVPTransformer(nameof(Person.Birthdate),nameof(ToDate) )]
public static partial IEnumerable<Person> ParseData(ReadOnlySpan<char> data);

private static DateOnly ToDate(ReadOnlySpan<char> data)
{
    return DateOnly.ParseExact(data, "dd-MM-yyyy");
}
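
As a further sketch, a transformer for a type that does not implement ISpanParsable could look like the following. Enum types are one such case. The Level enum and the Transformers class are hypothetical names used only for this illustration.

```csharp
using System;

// Hypothetical enum used only for this illustration.
public enum Level { Bronze, Silver, Gold }

public static class Transformers
{
    // Has the shape described above: takes ReadOnlySpan<char>, returns the target type.
    // Surrounding whitespace is trimmed and casing is ignored.
    public static Level ToLevel(ReadOnlySpan<char> data)
        => Enum.Parse<Level>(data.Trim(), ignoreCase: true);
}
```

In a real project the method would presumably live in the same partial class as the parse method and be referenced with nameof(ToLevel), as in the sample above.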

  • .NETStandard 2.0

    • No dependencies.

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version Downloads Last updated
1.9.0 253 5/2/2024
1.8.0 308 1/6/2024
1.7.0 213 12/29/2023
1.6.0 296 7/6/2023
1.5.0 169 7/4/2023
1.4.0 193 7/4/2023
1.3.0 182 6/23/2023
1.2.0 190 6/22/2023
1.1.0 153 6/22/2023
1.0.0 182 6/22/2023