Pandora.Apache.Avro.IDL.To.Apache.Parquet 0.11.18

There is a newer version of this package available.
See the version list below for details.
dotnet add package Pandora.Apache.Avro.IDL.To.Apache.Parquet --version 0.11.18                
NuGet\Install-Package Pandora.Apache.Avro.IDL.To.Apache.Parquet -Version 0.11.18                
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.
<PackageReference Include="Pandora.Apache.Avro.IDL.To.Apache.Parquet" Version="0.11.18" />                
For projects that support PackageReference, copy this XML node into the project file to reference the package.
paket add Pandora.Apache.Avro.IDL.To.Apache.Parquet --version 0.11.18                
#r "nuget: Pandora.Apache.Avro.IDL.To.Apache.Parquet, 0.11.18"                
#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.
// Install Pandora.Apache.Avro.IDL.To.Apache.Parquet as a Cake Addin
#addin nuget:?package=Pandora.Apache.Avro.IDL.To.Apache.Parquet&version=0.11.18

// Install Pandora.Apache.Avro.IDL.To.Apache.Parquet as a Cake Tool
#tool nuget:?package=Pandora.Apache.Avro.IDL.To.Apache.Parquet&version=0.11.18                

Pandora.Apache.Avro.IDL.To.Apache.Parquet

Background

Currently, when working with Apache Kafka® and Azure Databricks® (Apache Spark®), there is a built-in mechanism to transform Apache Avro® data to Apache Parquet® files. The issue with this approach, if we think in medallion lakehouse architecture, is that AVRO with nested data, will be persisted in a single PARQUET file in the bronze layer (full, raw and unprocessed history of each dataset) relying on ArrayType, MapType and StructType to represent the nested data. This will make it a bit more tedious to post-process data respectively in the following layers: silver (validated and deduplicated data) and gold (data as knowledge).

Medallion lakehouse architecture
Figure 1: Delta lake medallion architecture and data mesh

To avoid this issue, we present an open-source library, that will help transform AVRO, with nested data, to multiple PARQUET files where each of the nested data elements will be represented as an extension table (separate file). This will allow to merge both the bronze and silver layers (full, raw and history of each dataset combined with defined structure, enforced schemas as well validated and deduplicated data), to make it easier for data engineers/scientists and business analysts to combine data with already known logic (SQL joins).

Azure Databricks notebook
Figure 2: Azure Databricks python notebook and SQL cell

As two of the medallion layers are being combined to a single, it might lead to the possible saving of a ⅓ in disk usage. Furthermore, since we aren't relying on a naive approach, when flattening and storing data, it could further lead to greater savings and a more sustainable and environmentally friendly approach.

Green Software Foundation
Figure 3: Green Software Foundation with the Linux Foundation to put sustainability at the core of software engineering

Project dependencies

Pandora.Apache.Avro.IDL.To.Apache.Parquet

Dependency Author License
FSharp.Core Microsoft MIT License
Apache.Avro The Apache Software Foundation Apache License 2.0
Newtonsoft.Json James Newton-King MIT License
Parquet.Net Ivan G MIT License

Pandora.Apache.Avro.IDL.To.Apache.Parquet.Unit.Tests

Dependency Author License
Microsoft.NET.Test.Sdk Microsoft MIT License
coverlet.collector .NET foundation MIT License
xunit .NET foundation Apache License 2.0
xunit.runner.visualstudio .NET foundation Apache License 2.0
Product Compatible and additional computed target framework versions.
.NET net6.0 is compatible.  net6.0-android was computed.  net6.0-ios was computed.  net6.0-maccatalyst was computed.  net6.0-macos was computed.  net6.0-tvos was computed.  net6.0-windows was computed.  net7.0 was computed.  net7.0-android was computed.  net7.0-ios was computed.  net7.0-maccatalyst was computed.  net7.0-macos was computed.  net7.0-tvos was computed.  net7.0-windows was computed.  net8.0 was computed.  net8.0-android was computed.  net8.0-browser was computed.  net8.0-ios was computed.  net8.0-maccatalyst was computed.  net8.0-macos was computed.  net8.0-tvos was computed.  net8.0-windows was computed. 
Compatible target framework(s)
Included target framework(s) (in package)
Learn more about Target Frameworks and .NET Standard.

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version Downloads Last updated
0.11.32 269 5/10/2023
0.11.31 224 4/17/2023
0.11.30 273 3/21/2023
0.11.29 272 3/14/2023
0.11.28 277 3/6/2023
0.11.27 280 3/6/2023
0.11.26 293 3/6/2023
0.11.25 277 3/4/2023
0.11.24 291 3/4/2023
0.11.23 267 3/4/2023
0.11.22 267 2/24/2023
0.11.21 297 2/16/2023
0.11.20 286 2/16/2023
0.11.19 285 2/15/2023
0.11.18 293 2/15/2023
0.11.17 292 2/15/2023
0.11.16 276 2/15/2023
0.11.15 295 2/14/2023
0.11.14 290 2/14/2023
0.11.13 292 2/14/2023
0.11.12 298 2/14/2023
0.11.11 291 2/14/2023
0.11.10 280 2/14/2023
0.11.9 280 2/14/2023
0.11.8 283 2/14/2023
0.11.7 300 2/14/2023
0.11.6 307 2/13/2023
0.11.5 309 2/13/2023
0.11.4 320 2/8/2023
0.11.3 309 2/8/2023
0.11.2 315 2/6/2023
0.11.1 319 2/6/2023
0.11.0 335 2/3/2023