Tabula 0.1.4
dotnet add package Tabula --version 0.1.4
NuGet\Install-Package Tabula -Version 0.1.4
<PackageReference Include="Tabula" Version="0.1.4" />
paket add Tabula --version 0.1.4
#r "nuget: Tabula, 0.1.4"
// Install Tabula as a Cake Addin
#addin nuget:?package=Tabula&version=0.1.4
// Install Tabula as a Cake Tool
#tool nuget:?package=Tabula&version=0.1.4
tabula-sharp
tabula-sharp is a library for extracting tables from PDF files. It is a port of tabula-java.
- Supports netstandard2.0, net462, net471, net6.0, net8.0
- No Java bindings
NuGet packages are available on the releases page and on www.nuget.org.
Differences with tabula-java
- Uses PdfPig, and not PdfBox.
- Coordinate system starts from the bottom left point (going up) of the page, and not from the top left point (going down).
- The NurminenDetectionAlgorithm is replaced by SimpleNurminenDetectionAlgorithm, because it requires an image management library.
- Table results might be different because of the way PdfPig builds the bounding box of each Letter.
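The coordinate difference noted above matters when reusing areas that were measured for tabula-java. The following is a minimal sketch of the conversion, not part of Tabula or PdfPig: `AreaConverter` and `FromTopLeft` are hypothetical names, and the code assumes an unrotated page whose origin is at (0, 0) and whose height is known in the same units as the area.

```csharp
using System;

// Hypothetical helper for illustration only; it is not part of Tabula or PdfPig.
public static class AreaConverter
{
    // Converts an area given as (top, left, bottom, right), measured from the
    // top-left corner with y growing downwards, into bottom-left-origin
    // coordinates with y growing upwards.
    // Assumption: the page is unrotated and its origin is at (0, 0).
    public static (double Left, double Bottom, double Right, double Top) FromTopLeft(
        double top, double left, double bottom, double right, double pageHeight)
    {
        if (bottom < top)
        {
            throw new ArgumentException("In top-left coordinates, bottom must be >= top.");
        }

        // x values are unchanged; y values are mirrored around the page height.
        return (left, pageHeight - bottom, right, pageHeight - top);
    }
}
```

For example, on a page 792 units high, `AreaConverter.FromTopLeft(100, 50, 300, 500, 792)` yields left 50, bottom 492, right 500, top 692.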
Usage
Stream mode - BasicExtractionAlgorithm
using System.Collections.Generic;
using Tabula;
using Tabula.Detectors;
using Tabula.Extractors;
using UglyToad.PdfPig;

using (PdfDocument document = PdfDocument.Open("doc.pdf", new ParsingOptions() { ClipPaths = true }))
{
    ObjectExtractor oe = new ObjectExtractor(document);
    PageArea page = oe.Extract(1); // page numbers start at 1

    // detect candidate table zones
    SimpleNurminenDetectionAlgorithm detector = new SimpleNurminenDetectionAlgorithm();
    var regions = detector.Detect(page);

    IExtractionAlgorithm ea = new BasicExtractionAlgorithm();
    List<Table> tables = ea.Extract(page.GetArea(regions[0].BoundingBox)); // take first candidate area
    var table = tables[0];
    var rows = table.Rows;
}
Lattice mode - SpreadsheetExtractionAlgorithm
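using System.Collections.Generic;
using Tabula;
using Tabula.Extractors;
using UglyToad.PdfPig;

using (PdfDocument document = PdfDocument.Open("doc.pdf", new ParsingOptions() { ClipPaths = true }))
{
    ObjectExtractor oe = new ObjectExtractor(document);
    PageArea page = oe.Extract(1); // page numbers start at 1

    // lattice mode uses the ruling lines drawn on the whole page
    IExtractionAlgorithm ea = new SpreadsheetExtractionAlgorithm();
    List<Table> tables = ea.Extract(page);
    var table = tables[0];
    var rows = table.Rows;
}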
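Both snippets above stop at `table.Rows`. As a rough sketch of a next step, the helper below turns rows of cell text into CSV. `RowFormatter` and `ToCsv` are hypothetical names and not part of Tabula. The helper works on plain strings, so how the text of each cell is obtained is left to the caller; the `GetText()` call in the trailing comment is an assumption about the cell type. The Tabula.Csv package exists for this purpose and is likely the better choice in real code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper for illustration only; it is not part of Tabula.
public static class RowFormatter
{
    // Joins rows of cell text into CSV, one '\n'-terminated line per row.
    public static string ToCsv(IEnumerable<IEnumerable<string>> rows)
    {
        var sb = new StringBuilder();
        foreach (var row in rows)
        {
            sb.Append(string.Join(",", row.Select(Escape)));
            sb.Append('\n');
        }
        return sb.ToString();
    }

    // Quotes a field when it contains a comma, a quote or a line break,
    // doubling any embedded quotes. A null field becomes an empty field.
    private static string Escape(string field)
    {
        field = field ?? string.Empty;
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return needsQuotes ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
    }
}

// Possible usage with an extracted table (assumes each cell exposes GetText()):
// string csv = RowFormatter.ToCsv(table.Rows.Select(r => r.Select(c => c.GetText())));
```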
Product | Compatible and additional computed target framework versions |
---|---|
.NET | net5.0 was computed. net5.0-windows was computed. net6.0 is compatible. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. |
.NET Core | netcoreapp2.0 was computed. netcoreapp2.1 was computed. netcoreapp2.2 was computed. netcoreapp3.0 was computed. netcoreapp3.1 was computed. |
.NET Standard | netstandard2.0 is compatible. netstandard2.1 was computed. |
.NET Framework | net461 was computed. net462 is compatible. net463 was computed. net47 was computed. net471 is compatible. net472 was computed. net48 was computed. net481 was computed. |
MonoAndroid | monoandroid was computed. |
MonoMac | monomac was computed. |
MonoTouch | monotouch was computed. |
Tizen | tizen40 was computed. tizen60 was computed. |
Xamarin.iOS | xamarinios was computed. |
Xamarin.Mac | xamarinmac was computed. |
Xamarin.TVOS | xamarintvos was computed. |
Xamarin.WatchOS | xamarinwatchos was computed. |
NuGet packages (3)
Showing the top 3 NuGet packages that depend on Tabula:
Package | Description |
---|---|
Tabula.Json | Extract tables from PDF files (port of tabula-java using PdfPig). Json writer. |
Tabula.Csv | Extract tables from PDF files (port of tabula-java using PdfPig). Csv and Tsv writers. |
DocumentAtom.Pdf | DocumentAtom provides a light, fast library for breaking input PDF documents into constituent parts (atoms), useful for AI, machine learning, processing, analytics, and general analysis. |
GitHub repositories
This package is not used by any popular GitHub repositories.
Version | Downloads | Last updated |
---|---|---|
0.1.4 | 23,914 | 10/6/2024 |
0.1.4-alpha001 | 8,908 | 10/19/2023 |
0.1.3 | 213,323 | 6/1/2022 |
0.1.2 | 18,931 | 1/29/2022 |
0.1.1 | 10,550 | 7/18/2021 |
0.1.1-alpha001 | 523 | 3/6/2021 |
0.1.0 | 17,697 | 1/17/2021 |
0.1.0-alpha002 | 515 | 10/26/2020 |