Izumi.SICK 0.4.9.1

There is a newer version of this package available.
See the version list below for details.
dotnet add package Izumi.SICK --version 0.4.9.1


SICK: Streams of Independent Constant Keys

SICK is an approach to handling JSON-like structures, together with a set of libraries implementing it.

SICK allows you to achieve the following:

  1. Store JSON-like data in efficient indexed binary form
  2. Avoid reading and parsing whole JSON files and access only the data you need just in time
  3. Store multiple JSON-like structures in one deduplicating storage
  4. Implement perfect streaming parsers for JSON-like data
  5. Efficiently stream updates for JSON-like data

The tradeoff for these benefits is a somewhat more complicated and less efficient encoder.

The problem

JSON has a Type-2 (context-free) grammar and requires a pushdown automaton to parse it, so it's not possible to implement an efficient streaming parser for JSON. Just imagine a huge hierarchy of nested JSON objects: you won't be able to finish parsing the top-level object until you process the whole file.

JSON is frequently used to store and transfer large amounts of data and these transfers tend to grow over time. Just imagine a typical JSON config file for a large enterprise product.

The non-streaming nature of almost all JSON parsers means a lot of work has to be done every time you need to deserialize a huge chunk of JSON data: you need to read it from disk, parse it in memory into an AST representation, and, usually, map the raw JSON tree to object instances. Even if you use token streams and know the type of your object ahead of time, you still have to deal with the Type-2 grammar.

This can be very inefficient and cause unnecessary delays, pauses, CPU activity and memory consumption spikes.

The idea

Let's assume that we have a small JSON:

[
    {"some key": "some value"},
    {"some key": "some value"},
    {"some value": "some key"}
]

Let's build a table with one row for every unique value in our JSON:

Type    Index  Value                           Is Root
string  0      "some key"                      No
string  1      "some value"                    No
object  0      [string:0, string:1]            No
object  1      [string:1, string:0]            No
array   0      [object:0, object:0, object:1]  Yes (file.json)

We just built a flattened and deduplicated version of our initial JSON structure.
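
The table-building step can be sketched in a few lines of Python. This is only an illustration of the idea, not the encoder shipped with the library: the `flatten` function and the dictionary-based `tables` layout are hypothetical, and only strings, arrays and objects are handled.

```python
import json

def flatten(value, tables):
    """Store `value` in `tables` and return a (type, index) reference to it."""
    if isinstance(value, str):
        kind, entry = "string", value
    elif isinstance(value, list):
        kind = "array"
        # An array becomes a tuple of references to its elements.
        entry = tuple(flatten(item, tables) for item in value)
    elif isinstance(value, dict):
        kind = "object"
        # An object becomes a tuple of (key reference, value reference) pairs.
        entry = tuple((flatten(k, tables), flatten(v, tables)) for k, v in value.items())
    else:
        raise TypeError(f"unsupported value in this sketch: {value!r}")
    # One table per type; equal entries share an index, which is the deduplication.
    table = tables.setdefault(kind, {})
    return (kind, table.setdefault(entry, len(table)))

tables = {}
document = json.loads(
    '[{"some key": "some value"}, {"some key": "some value"}, {"some value": "some key"}]'
)
root = flatten(document, tables)
```

After the call, `root` refers to the single array entry, the string table holds two values, and the object table holds two entries even though the document contains three objects.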

Streaming

Such a representation allows us to do many different things. For example, we may stream our table:

string:0 = "some key"
string:1 = "some value"

object:0.size = 1
object:0[string:0] = string:1
object:1.size = 1
object:1[string:1] = string:0

array:0.size = 3
array:0[0] = object:0
array:0[1] = object:0
array:0[2] = object:1

string:2 = "file.json"

root:0 = array:0, string:2

This particular encoding is inefficient, but it's streamable. Moreover, we can add removal messages to it, thus supporting arbitrary updates:

array:0[0] = object:1
array:0[1] = remove

There is an interesting observation: when a stream does not contain removal entries, it can be safely reordered. Unfortunately, in some use cases the receiver may still need to accumulate the entries in a buffer until it can sort them out.
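
A receiver for such a stream can be sketched as follows. This is a hypothetical illustration in Python (the library itself currently has no streaming support): the `apply` function, the regular expression and the flat `state` dictionary are assumptions made for the example, and each entry is assumed to be written only once before any update arrives.

```python
import re

# Matches container entries such as `array:0[1]` or `object:0[string:0]`.
ENTRY = re.compile(r"^(\w+:\d+)\[(.+)\]$")

def apply(state, line):
    """Apply one stream message of the form `target = value` to `state`."""
    target, value = (part.strip() for part in line.split("=", 1))
    match = ENTRY.match(target)
    if match is None:
        # Scalars, sizes and roots are stored under their full name.
        state[target] = value
        return
    container, key = match.groups()
    entries = state.setdefault(container, {})
    if value == "remove":
        entries.pop(key, None)   # a removal message deletes the entry
    else:
        entries[key] = value

initial = ['string:0 = "some key"', "array:0[0] = object:0", "array:0[1] = object:1"]

in_order, reordered = {}, {}
for message in initial:
    apply(in_order, message)
for message in reversed(initial):
    apply(reordered, message)

# Updates, including a removal, must be applied after the entries they modify.
for message in ["array:0[0] = object:1", "array:0[1] = remove"]:
    apply(in_order, message)
```

Both `in_order` (before the updates) and `reordered` end up with the same content, which is the reordering property mentioned above. The removal message is what makes ordering matter.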

Binary format: EBA (Efficient Binary Aggregate)

We may note that the only complex data structures in our "Value" column are lists and (type, index) pairs. Let's call such pairs "references".

A reference can be represented as a pair of integers, so it would have a fixed byte length.

A list of references can be represented as an integer storing the list length, followed by all the references in their binary form. Note that such a binary structure is indexed: once we know the index of the element we want to access, we can read it immediately.
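
The following Python sketch shows why a list of fixed-size references gives constant-time access. The exact layout is an assumption made for illustration (a 4-byte big-endian length, then references of one marker byte plus a 4-byte index); the real EBA layout may differ, and `encode_refs` and `ref_at` are hypothetical names.

```python
import struct

REF = struct.Struct(">BI")    # assumed: 1-byte type marker + 4-byte index = 5 bytes
COUNT = struct.Struct(">I")   # assumed: 4-byte list length

def encode_refs(refs):
    """Encode a list of (marker, index) pairs as length + fixed-size references."""
    return COUNT.pack(len(refs)) + b"".join(REF.pack(kind, index) for kind, index in refs)

def ref_at(blob, i):
    """Read the i-th reference directly, without decoding the rest of the list."""
    (count,) = COUNT.unpack_from(blob, 0)
    if not 0 <= i < count:
        raise IndexError(i)
    return REF.unpack_from(blob, COUNT.size + i * REF.size)

# 12 is the TObj marker from the type table in this document.
blob = encode_refs([(12, 0), (12, 0), (12, 1)])
third = ref_at(blob, 2)
```

The offset of element `i` is computed arithmetically, so reading `third` touches only the length prefix and five bytes of payload.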

A list of any fixed-size scalar values can be represented the same way.

A list of variable-size values (e.g. a list of strings) can be represented the following way:

  {strings count}{list of string offsets}{all the strings concatenated}

So, ["a", "bb", "ccc"] would become something like 3 0 1 3 a bb ccc without spaces: the count, the offset of each string within the concatenated data, and then the data itself.

An important fact is that this encoding is indexed too and it can be reused to store any lists of variable-length data.
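
As a rough Python sketch of this offset-based encoding: the field widths and byte order below are assumptions (4-byte big-endian integers), the end of the last string is inferred from the end of the buffer, and `encode_strings` and `string_at` are hypothetical helpers rather than library functions.

```python
import struct

def encode_strings(strings):
    """Encode as {count}{offsets}{concatenated UTF-8 data}."""
    encoded = [s.encode("utf-8") for s in strings]
    offsets, position = [], 0
    for chunk in encoded:
        offsets.append(position)      # where this string starts in the data area
        position += len(chunk)
    header = struct.pack(f">I{len(offsets)}I", len(encoded), *offsets)
    return header + b"".join(encoded)

def string_at(blob, i):
    """Decode only the i-th string, using the offset table as an index."""
    (count,) = struct.unpack_from(">I", blob, 0)
    if not 0 <= i < count:
        raise IndexError(i)
    data_start = 4 + 4 * count
    (start,) = struct.unpack_from(">I", blob, 4 + 4 * i)
    if i + 1 < count:
        (end,) = struct.unpack_from(">I", blob, 4 + 4 * (i + 1))
    else:
        end = len(blob) - data_start  # last string runs to the end of the buffer
    return blob[data_start + start:data_start + end].decode("utf-8")

blob = encode_strings(["a", "bb", "ccc"])
```

Here the header decodes to the integers 3, 0, 1, 3, and `string_at(blob, 1)` reads "bb" without touching the other strings.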

EBA Structures

TODO: explain the overall EBA structure format, including tables, etc

Additional capabilities over JSON

SICK encoding follows the compositional principles of JSON (a set of primitive types plus lists and dictionaries), though it is more powerful: it has a "reference" type and allows you to encode custom types.

(1) It's easy to note that our table may store circular references, something JSON can't do natively:

Type    Index  Value                 Is Root
object  0      [string:0, object:1]  No
object  1      [string:1, object:0]  No

This may be convenient in some complex cases.
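
A cyclic table cannot be expanded into a plain JSON tree, but it can still be navigated one reference at a time. The Python sketch below illustrates this under stated assumptions: the key names "next" and "back" are made up (the table above does not say what string:0 and string:1 contain), and `follow` is a hypothetical helper.

```python
tables = {
    "string": ["next", "back"],           # hypothetical values for string:0 and string:1
    "object": [
        {("string", 0): ("object", 1)},   # object:0 = [string:0, object:1]
        {("string", 1): ("object", 0)},   # object:1 = [string:1, object:0]
    ],
}

def follow(ref, path, tables):
    """Walk `path` (a list of key names) from `ref`, one lookup per step."""
    for key in path:
        kind, index = ref
        if kind != "object":
            raise TypeError(f"cannot look up {key!r} in a {kind}")
        key_ref = ("string", tables["string"].index(key))
        ref = tables[kind][index][key_ref]
    return ref

there = follow(("object", 0), ["next"], tables)
back_again = follow(("object", 0), ["next", "back"], tables)
```

Because each step resolves a single reference, the cycle never causes unbounded recursion: `back_again` simply points at object:0 again.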

(2) We may also note that we can store multiple JSON files in one table and have full deduplication over their content. We just need to introduce a separate attribute (is root) storing either nothing or the name of our "root entry" (JSON file).

In a real implementation it's more convenient to just create a separate "root" type. The value of a root entry should always be a reference to its name and a reference to the actual JSON value we encoded:

Type    Index  Value
string  0      "some key"
string  1      "some value"
string  2      "file.json"
object  0      [string:0, string:1]
object  1      [string:1, string:0]
array   0      [object:0, object:0, object:1]
root    0      [string:2, array:0]
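
Reading a document back from such a table could look like the Python sketch below. It is an illustration only: the in-memory `tables` layout and the `resolve` and `read_root` helpers are hypothetical, string:2 is assumed to hold the root name "file.json" as in the streaming example, and cyclic references are not handled.

```python
tables = {
    "string": ["some key", "some value", "file.json"],
    "object": [
        [(("string", 0), ("string", 1))],
        [(("string", 1), ("string", 0))],
    ],
    "array": [[("object", 0), ("object", 0), ("object", 1)]],
    "root": [(("string", 2), ("array", 0))],   # (name reference, value reference)
}

def resolve(ref, tables):
    """Expand a (type, index) reference into a plain Python value."""
    kind, index = ref
    entry = tables[kind][index]
    if kind == "string":
        return entry
    if kind == "array":
        return [resolve(item, tables) for item in entry]
    if kind == "object":
        return {resolve(k, tables): resolve(v, tables) for k, v in entry}
    raise ValueError(f"unsupported type in this sketch: {kind}")

def read_root(name, tables):
    """Find the root entry with the given name and expand its value."""
    for name_ref, value_ref in tables["root"]:
        if resolve(name_ref, tables) == name:
            return resolve(value_ref, tables)
    raise KeyError(name)

document = read_root("file.json", tables)
```

Several roots can share the same string, object and array tables, which is where the deduplication across files comes from.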

(3) We may encode custom scalar data types (e.g. timestamps) natively just by introducing new type tags.

(4) We may even store polymorphic types by introducing new type tags or even new type references.

Implementation

Currently we provide C# and Scala implementations of SICK indexed binary JSON storage. The code in this repository has no streaming capabilities yet, though that may change in the future. Adding streaming support is not a hard problem, and your contributions are welcome.

Language  Binary Storage Encoder  Binary Storage Decoder  Stream Encoder  Stream Decoder  Encoder AST  Decoder AST
Scala     Yes                     Yes                     No              No              Circe        N/A
C#        Yes                     Yes                     No              No              JSON.Net     Custom

Supported types

A type marker is represented as a single-byte unsigned integer. The possible values are:

Marker  Name     Comment                                                       Value Length (bytes)     C# mapping                                         Scala mapping
0       TNul     Equivalent to null in JSON                                    4, stored in the marker
1       TBit     Boolean                                                       4, stored in the marker
2       TByte    Byte                                                          4, stored in the marker  byte (unsigned)                                    Byte (signed)
3       TShort   Signed 16-bit integer                                         4, stored in the marker
4       TInt     Signed 32-bit integer                                         4
5       TLng     Signed 64-bit integer                                         8
6       TBigInt                                                                Variable, prefixed
7       TDbl                                                                   8
8       TFlt                                                                   4
9       TBigDec                                                                Variable, prefixed       Custom: scale/precision/signum/unscaled quadruple
10      TStr     UTF-8 String                                                  Variable, prefixed
11      TArr     List of array entries                                         Variable, prefixed
12      TObj     List of object entries                                        Variable, prefixed
15      TRoot    Index of the name string (4 bytes) + reference (4+1=5 bytes)  9
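
The marker values above map naturally onto an enumeration. The Python sketch below is illustrative only: the marker names and numbers are taken from the table, while `decode_ref` and its assumed layout (one marker byte followed by a 4-byte big-endian value) are hypothetical and not taken from the library.

```python
import struct
from enum import IntEnum

class Marker(IntEnum):
    """Type markers, with values copied from the table above."""
    TNul = 0
    TBit = 1
    TByte = 2
    TShort = 3
    TInt = 4
    TLng = 5
    TBigInt = 6
    TDbl = 7
    TFlt = 8
    TBigDec = 9
    TStr = 10
    TArr = 11
    TObj = 12
    TRoot = 15

def decode_ref(blob, offset=0):
    """Decode an assumed 5-byte reference: 1 marker byte + 4-byte value."""
    marker, value = struct.unpack_from(">BI", blob, offset)
    # Marker() raises ValueError for bytes that are not in the table (13, 14, ...).
    return Marker(marker), value

ref = decode_ref(bytes([12, 0, 0, 0, 1]))
```

Rejecting unknown marker bytes early is a design choice of this sketch; it makes a corrupted or newer-format reference fail loudly instead of being misread.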

References

TODO

Lists

TODO

Array entries

Array entries are just references.

Object entries

TODO

Object entry skip list and KHash

TODO

Value tables

TODO

Limitations

The current implementation has the following limitations:

  1. Maximum object size: 65534 keys
  2. The order of object keys is not preserved
  3. Maximum number of array elements: 2^32
  4. Maximum number of unique values of the same type: 2^32

These limitations may be lifted by using more bytes to store offset pointers and counts at the binary level, though it's hard to imagine a real application that would need that.

Product Compatible and additional computed target framework versions.
.NET net5.0 was computed.  net5.0-windows was computed.  net6.0 was computed.  net6.0-android was computed.  net6.0-ios was computed.  net6.0-maccatalyst was computed.  net6.0-macos was computed.  net6.0-tvos was computed.  net6.0-windows was computed.  net7.0 was computed.  net7.0-android was computed.  net7.0-ios was computed.  net7.0-maccatalyst was computed.  net7.0-macos was computed.  net7.0-tvos was computed.  net7.0-windows was computed.  net8.0 was computed.  net8.0-android was computed.  net8.0-browser was computed.  net8.0-ios was computed.  net8.0-maccatalyst was computed.  net8.0-macos was computed.  net8.0-tvos was computed.  net8.0-windows was computed.  net9.0 was computed.  net9.0-android was computed.  net9.0-browser was computed.  net9.0-ios was computed.  net9.0-maccatalyst was computed.  net9.0-macos was computed.  net9.0-tvos was computed.  net9.0-windows was computed.  net10.0 was computed.  net10.0-android was computed.  net10.0-browser was computed.  net10.0-ios was computed.  net10.0-maccatalyst was computed.  net10.0-macos was computed.  net10.0-tvos was computed.  net10.0-windows was computed. 
.NET Core netcoreapp3.0 was computed.  netcoreapp3.1 was computed. 
.NET Standard netstandard2.1 is compatible. 
MonoAndroid monoandroid was computed. 
MonoMac monomac was computed. 
MonoTouch monotouch was computed. 
Tizen tizen60 was computed. 
Xamarin.iOS xamarinios was computed. 
Xamarin.Mac xamarinmac was computed. 
Xamarin.TVOS xamarintvos was computed. 
Xamarin.WatchOS xamarinwatchos was computed. 

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version Downloads Last updated
0.5.1 130 5/26/2025
0.5.0 153 5/21/2025
0.5.0-alpha.199.1.15219574745 40 5/23/2025
0.5.0-alpha.197.1.15195564963 101 5/22/2025
0.5.0-alpha.196.1.15173386563 104 5/21/2025
0.4.11-alpha.194.1.15167765464 106 5/21/2025
0.4.11-alpha.192.1.15138747875 108 5/20/2025
0.4.11-alpha.191.1.15138506574 106 5/20/2025
0.4.11-alpha.190.1.15124036505 109 5/19/2025
0.4.11-alpha.169.1.14993544479 196 5/13/2025
0.4.11-alpha.168.1.14977825981 200 5/12/2025
0.4.10 214 5/12/2025
0.4.10-alpha.167.1.14977742411 194 5/12/2025
0.4.10-alpha.166.1.14976311953 192 5/12/2025
0.4.9.1 217 5/12/2025
0.4.9.1-alpha.164.1... 195 5/12/2025
0.4.9 136 5/10/2025
0.4.9-alpha.162.1.14974759344 177 5/12/2025
0.4.9-alpha.161.1.14974321384 179 5/12/2025
0.4.9-alpha.160.1.14949842259 98 5/10/2025
0.4.8 56 5/9/2025
0.4.8-alpha.158.1.14939007107 49 5/9/2025
0.4.8-alpha.157.1.14938895987 35 5/9/2025
0.4.8-alpha.156.1.14938665842 34 5/9/2025
0.4.7 65 5/9/2025
0.4.7-alpha.154.1.14938659573 35 5/9/2025
0.4.7-alpha.152.1.14936503746 50 5/9/2025
0.4.6 131 4/30/2025
0.4.6-alpha.151.1.14936494084 47 5/9/2025
0.4.6-alpha.150.1.14936359483 47 5/9/2025
0.4.6-alpha.149.1.14765141688 110 4/30/2025
0.4.5 145 4/1/2025
0.4.5-alpha.147.1.14765133398 106 4/30/2025
0.4.5-alpha.146.1.14755251221 108 4/30/2025
0.4.5-alpha.145.1.14755160987 110 4/30/2025
0.4.5-alpha.144.1.14626036256 115 4/23/2025
0.4.5-alpha.143.1.14625255689 119 4/23/2025
0.4.5-alpha.142.1.14604155937 124 4/22/2025
0.4.5-alpha.141.1.14604047533 122 4/22/2025
0.4.5-alpha.140.1.14577634364 120 4/21/2025
0.4.5-alpha.138.1.14540653076 88 4/18/2025
0.4.5-alpha.137.1.14201811805 121 4/1/2025
0.4.4 141 4/1/2025
0.4.4-alpha.134.1.14201804018 117 4/1/2025
0.4.4-alpha.133.1.14090885385 106 3/26/2025
0.4.3 124 3/26/2025
0.4.3-alpha.132.1.14090714214 102 3/26/2025
0.4.2 123 3/26/2025
0.4.2-alpha.130.1.14090642391 97 3/26/2025
0.4.1 460 3/25/2025
0.4.1-alpha.128.1.14090604690 99 3/26/2025
0.4.1-alpha.127.1.14090355078 100 3/26/2025
0.4.1-alpha.126.1.14067280918 440 3/25/2025
0.4.0 465 3/25/2025
0.4.0-alpha.123.1.14066937581 436 3/25/2025
0.3.0 459 3/25/2025
0.3.0-alpha.122.1.14066020519 436 3/25/2025
0.2.0-alpha.120.1.14065856271 435 3/25/2025
0.1.9 105 10/24/2024
0.1.9-alpha.101.1.11543131792 61 10/27/2024
0.1.9-alpha.100.1.11543099574 56 10/27/2024
0.1.9-alpha.99.1.11543029531 58 10/27/2024
0.1.9-alpha.97.1.11542766659 62 10/27/2024
0.1.9-alpha.96.1.11542759889 53 10/27/2024
0.1.9-alpha.95.1.11542743035 55 10/27/2024
0.1.9-alpha.92.1.11535380531 56 10/26/2024
0.1.9-alpha.91.1.11534754760 53 10/26/2024
0.1.9-alpha.90.1.11534728272 56 10/26/2024
0.1.9-alpha.84.1.11524147279 51 10/25/2024
0.1.9-alpha.80.1.11501877492 53 10/24/2024
0.1.8 129 9/11/2024
0.1.8-alpha.78.1.11501836223 61 10/24/2024
0.1.8-alpha.77.1.10816114029 66 9/11/2024
0.1.6 260 11/13/2023
0.1.6-alpha.74.1.10815646116 64 9/11/2024
0.1.6-alpha.73.1.10779432710 61 9/9/2024
0.1.6-alpha.72.1.10779398369 62 9/9/2024
0.1.6-alpha.71.1.10779340615 58 9/9/2024
0.1.6-alpha.70.1.10779255916 67 9/9/2024
0.1.6-alpha.69.1.10778956854 63 9/9/2024
0.1.6-alpha.68.1.10778347980 60 9/9/2024
0.1.6-alpha.67.1.10773767841 57 9/9/2024
0.1.6-alpha.66.1.10773391549 60 9/9/2024
0.1.6-alpha.65.1.10309929078 84 8/8/2024
0.1.6-alpha.64.1.9569809415 66 6/18/2024
0.1.6-alpha.63.1.9569252981 64 6/18/2024
0.1.6-alpha.62.1.9569209999 67 6/18/2024
0.1.6-alpha.61.1.7489993321 76 1/11/2024
0.1.5 162 11/2/2023
0.1.5-alpha.60.1.7489990759 72 1/11/2024
0.1.5-alpha.58.1.6855319998 105 11/13/2023
0.1.5-alpha.56.1.6852209177 73 11/13/2023
0.1.5-alpha.55.1.6777279602 76 11/6/2023
0.1.5-alpha.54.1.6777234035 80 11/6/2023
0.1.5-alpha.53.1.6777218278 76 11/6/2023
0.1.5-alpha.52.1.6734224890 79 11/2/2023
0.1.3 165 9/13/2023
0.1.3-alpha.50.1.6174784251 91 9/13/2023
0.1.2 164 9/13/2023
0.1.1 160 9/13/2023
0.1.0-alpha.46.1.6174266386 93 9/13/2023
0.1.0-alpha.45.1.3857949337 124 1/6/2023
0.1.0-alpha.44.2.3849219603 124 1/6/2023
0.1.0-alpha.44.1.3849219603 125 1/5/2023
0.1.0-alpha.42.1.3849045352 120 1/5/2023
0.1.0-alpha.41.3.3848986268 121 1/5/2023
0.1.0-alpha.41.1.3848986268 123 1/5/2023
0.1.0-alpha.39 125 1/5/2023
0.1.0-alpha.38 128 1/5/2023
0.1.0-alpha.37 122 1/5/2023
0.1.0-alpha.36 124 1/5/2023
0.1.0-alpha.35 126 1/4/2023
0.1.0-alpha.34 120 1/4/2023
0.1.0-alpha.33 125 1/4/2023
0.1.0-alpha.32 122 1/4/2023
0.1.0-alpha.31 121 1/4/2023
0.1.0-alpha.30 122 1/4/2023
0.1.0-alpha.29 123 1/4/2023
0.1.0-alpha.28 123 1/4/2023
0.1.0-alpha.27 121 1/4/2023
0.1.0-alpha.26 124 12/29/2022
0.1.0-alpha.25 121 12/29/2022
0.1.0-alpha.24 120 12/29/2022
0.1.0-alpha.23 118 12/29/2022
0.1.0-alpha.22 115 12/23/2022
0.0.2-alpha.21 118 12/23/2022
0.0.2-alpha.20 124 12/21/2022
0.0.2-alpha.16 125 12/21/2022
0.0.2-alpha.15 120 12/21/2022
0.0.2-alpha.14 117 12/21/2022
0.0.2-alpha.4 116 12/21/2022
0.0.2-alpha.3 117 12/21/2022
0.0.2-alpha.2 120 12/21/2022
0.0.2-alpha.1 117 12/21/2022
0.0.2-alpha 156 12/21/2022