Pandora.Apache.Avro.IDL.To.Apache.Parquet
0.11.17
A newer version of this package is available. See the version list below for details.
dotnet add package Pandora.Apache.Avro.IDL.To.Apache.Parquet --version 0.11.17
NuGet\Install-Package Pandora.Apache.Avro.IDL.To.Apache.Parquet -Version 0.11.17
<PackageReference Include="Pandora.Apache.Avro.IDL.To.Apache.Parquet" Version="0.11.17" />
paket add Pandora.Apache.Avro.IDL.To.Apache.Parquet --version 0.11.17
#r "nuget: Pandora.Apache.Avro.IDL.To.Apache.Parquet, 0.11.17"
// Install Pandora.Apache.Avro.IDL.To.Apache.Parquet as a Cake Addin
#addin nuget:?package=Pandora.Apache.Avro.IDL.To.Apache.Parquet&version=0.11.17
// Install Pandora.Apache.Avro.IDL.To.Apache.Parquet as a Cake Tool
#tool nuget:?package=Pandora.Apache.Avro.IDL.To.Apache.Parquet&version=0.11.17
Background
Currently, when working with Apache Kafka® and Azure Databricks® (Apache
Spark®), there is a built-in mechanism to transform Apache Avro® data to Apache
Parquet® files. The issue with this approach, if we think in terms of the
medallion lakehouse architecture, is that AVRO with nested data will be
persisted in a single PARQUET file in the bronze layer (full, raw and
unprocessed history of each dataset), relying on ArrayType, MapType and
StructType to represent the nested data. This makes it more tedious to
post-process the data in the subsequent layers: silver (validated and
deduplicated data) and gold (data as knowledge).
Figure 1: Delta lake medallion architecture and data mesh
To avoid this issue, we present an open-source library that helps transform
AVRO with nested data into multiple PARQUET files, where each of the nested
data elements is represented as an extension table (separate file). This makes
it possible to merge the bronze and silver layers (full, raw history of each
dataset combined with defined structure, enforced schemas, and validated and
deduplicated data), making it easier for data engineers/scientists and business
analysts to combine data with already known logic (SQL joins).
Figure 2: Azure Databricks python notebook and SQL cell
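The extension-table idea can be illustrated with a minimal sketch in plain Python. This is not the library's API or its actual algorithm: the function name flatten, the key columns _id and _parent_id, the table naming scheme and the sample record are all hypothetical, and plain dictionaries stand in for AVRO records and PARQUET files. It only shows how a nested record could be split into a parent table plus one table per nested element, linked by keys.

```python
from itertools import count


def flatten(record, table, tables, ids, parent_id=None):
    """Split one nested record into rows spread over several flat tables.

    Hypothetical helper: nested dicts and lists become rows in their own
    "extension" table, each pointing back to its parent row via _parent_id.
    """
    row = {"_id": next(ids)}
    if parent_id is not None:
        row["_parent_id"] = parent_id
    tables.setdefault(table, []).append(row)

    for key, value in record.items():
        child_table = f"{table}_{key}"
        if isinstance(value, dict):
            flatten(value, child_table, tables, ids, row["_id"])
        elif isinstance(value, list):
            for item in value:
                # Wrap scalar list items so every child row is a record.
                child = item if isinstance(item, dict) else {"value": item}
                flatten(child, child_table, tables, ids, row["_id"])
        else:
            row[key] = value
    return tables


# Made-up sample record with one nested record and one nested array.
order = {
    "customer": "Ada",
    "address": {"city": "Copenhagen", "zip": "2100"},
    "lines": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}

tables = flatten(order, "order", {}, count(1))

# Recombine parent and child rows, as a SQL join on the key columns would.
orders_by_id = {row["_id"]: row for row in tables["order"]}
joined = [
    {"customer": orders_by_id[line["_parent_id"]]["customer"], **line}
    for line in tables["order_lines"]
]
```

Each key of tables would correspond to one separate file, and the final list comprehension plays the role of a join between the order table and its order_lines extension table.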
As two of the medallion layers are combined into a single one, this might lead to a possible saving of ⅓ in disk usage. Furthermore, since we aren't relying on a naive approach when flattening and storing data, it could lead to further savings and a more sustainable and environmentally friendly approach.
Figure 3: Green Software Foundation with the Linux Foundation to put sustainability at the core of software engineering
Project dependencies
Pandora.Apache.Avro.IDL.To.Apache.Parquet
Dependency | Author | License |
---|---|---|
FSharp.Core | Microsoft | MIT License |
Apache.Avro | The Apache Software Foundation | Apache License 2.0 |
Newtonsoft.Json | James Newton-King | MIT License |
Parquet.Net | Ivan G | MIT License |
Pandora.Apache.Avro.IDL.To.Apache.Parquet.Unit.Tests
Dependency | Author | License |
---|---|---|
Microsoft.NET.Test.Sdk | Microsoft | MIT License |
coverlet.collector | .NET Foundation | MIT License |
xunit | .NET Foundation | Apache License 2.0 |
xunit.runner.visualstudio | .NET Foundation | Apache License 2.0 |
Product | Versions |
---|---|
.NET | net6.0 is compatible. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. |
- net6.0
  - Apache.Avro (>= 1.11.1)
  - FSharp.Core (>= 6.0.5)
  - Microsoft.Extensions.Logging.Abstractions (>= 7.0.0)
  - Newtonsoft.Json (>= 13.0.2)
  - Parquet.Net (>= 4.2.2)
NuGet packages
This package is not used by any NuGet packages.
GitHub repositories
This package is not used by any popular GitHub repositories.
Version | Downloads | Last updated |
---|---|---|
0.11.32 | 269 | 5/10/2023 |
0.11.31 | 224 | 4/17/2023 |
0.11.30 | 273 | 3/21/2023 |
0.11.29 | 272 | 3/14/2023 |
0.11.28 | 277 | 3/6/2023 |
0.11.27 | 280 | 3/6/2023 |
0.11.26 | 293 | 3/6/2023 |
0.11.25 | 277 | 3/4/2023 |
0.11.24 | 291 | 3/4/2023 |
0.11.23 | 267 | 3/4/2023 |
0.11.22 | 267 | 2/24/2023 |
0.11.21 | 297 | 2/16/2023 |
0.11.20 | 286 | 2/16/2023 |
0.11.19 | 285 | 2/15/2023 |
0.11.18 | 293 | 2/15/2023 |
0.11.17 | 292 | 2/15/2023 |
0.11.16 | 276 | 2/15/2023 |
0.11.15 | 295 | 2/14/2023 |
0.11.14 | 290 | 2/14/2023 |
0.11.13 | 292 | 2/14/2023 |
0.11.12 | 298 | 2/14/2023 |
0.11.11 | 291 | 2/14/2023 |
0.11.10 | 280 | 2/14/2023 |
0.11.9 | 280 | 2/14/2023 |
0.11.8 | 283 | 2/14/2023 |
0.11.7 | 300 | 2/14/2023 |
0.11.6 | 307 | 2/13/2023 |
0.11.5 | 309 | 2/13/2023 |
0.11.4 | 320 | 2/8/2023 |
0.11.3 | 309 | 2/8/2023 |
0.11.2 | 315 | 2/6/2023 |
0.11.1 | 319 | 2/6/2023 |
0.11.0 | 335 | 2/3/2023 |