PDFParser-CSharp
1.2.2
dotnet add package PDFParser-CSharp --version 1.2.2
NuGet\Install-Package PDFParser-CSharp -Version 1.2.2
<PackageReference Include="PDFParser-CSharp" Version="1.2.2" />
paket add PDFParser-CSharp --version 1.2.2
#r "nuget: PDFParser-CSharp, 1.2.2"
// Install PDFParser-CSharp as a Cake Addin
#addin nuget:?package=PDFParser-CSharp&version=1.2.2
// Install PDFParser-CSharp as a Cake Tool
#tool nuget:?package=PDFParser-CSharp&version=1.2.2
PDFIndexer
A simple way to extract text from PDF files, including metadata.
With a single line you can add a batch of PDFs, and with another single line you can search for a word and find exactly where it appears in the text (and more).
Architecture
To install using the NuGet Package Manager:
Install-Package PDFParser-CSharp -Version 1.2.1
How to use
General use:
// Add a batch of PDFs
ProcessPDF.AddPDFs(new List<string>() { path });
// Search over them
var result = ProcessPDF.GetVisualResults("{your search word}");
To use the text extractor directly:
string path = "path with my pdf";
TextExtractor te = new TextExtractor();
var list = te.ExtractLinesMetadata(path);
Methods
ExtractFullText → Extracts the full text as a single string
ExtractWordsMetadata → Extracts every word with its metadata (text, X, Y, width and height)
ExtractLinesMetadata → Extracts every line with its metadata (text, X, Y, width and height)
GetIndexMetadata → Returns the full text together with the coordinates of every line and word, which is useful for building an hOCR page or another XML-based page format
In all cases you can pass the PDF document as either a string or a stream.
Main Methods
- AddPDFs → receives a list of strings to process and save
- GetVisualResults → receives a string and searches the metadata database
** The result should be a list of SampleObject with the word, its position, and other metadata for each word found:
{
    HighlightObject = {
        IndexMetadata Metadata
        List<BoundingBox> HighlightedWords
        string Keyword
        int PageNumber
    },
    Metadata = {
        string Text
        List<PdfMetadata> ListOfLines
        List<PdfMetadata> ListOfWords
        string PDFURI
    },
    ImageUri = "https://{uri_image_path}"
};
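To give a rough feel for what a keyword lookup over word metadata involves, here is a minimal, self-contained sketch. It does not use or reproduce the package's internals: WordBox and KeywordSearch are hypothetical names that only mirror the fields described in this README, and the case-insensitive exact match is an assumption, not the documented behavior of GetVisualResults.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one word entry; NOT a type from PDFParser-CSharp.
public class WordBox
{
    public string Text { get; set; }
    public double X { get; set; }
    public double Y { get; set; }
    public double Width { get; set; }
    public double Height { get; set; }
    public int PageNumber { get; set; }
}

// Hypothetical helper; the matching rule is an assumption for illustration.
public static class KeywordSearch
{
    // Returns every word whose text equals the keyword, ignoring case.
    public static List<WordBox> Find(IEnumerable<WordBox> words, string keyword)
    {
        return words
            .Where(w => string.Equals(w.Text, keyword, StringComparison.OrdinalIgnoreCase))
            .ToList();
    }
}
```

Each returned entry keeps its position and page number, which is the kind of information a caller would need to draw a highlight over the match.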
Expected Results
ExtractFullText
"some text of entire page (or pages)"
ExtractWordsMetadata
[
    {
        Text = "some"
        X = 150.233
        Y = 88.45
        Width = 12.2
        Height = 11.82
        PageInfo = {
            PageNumber = 1,
            BlobkId = 0
        }
    },
    {
        Text = "text"
        X = 170.233
        Y = 88.45
        Width = 12.2
        Height = 11.82
        PageInfo = {
            PageNumber = 1,
            BlobkId = 1
        }
    }
]
ExtractLinesMetadata
{
    Text = "some text of entire line"
    X = 150.233
    Y = 88.45
    Width = 12.2
    Height = 11.82
    PageInfo = {
        PageNumber = 1,
        BlobkId = 0
    }
}
GetIndexMetadata
{
    Text = "some text of entire line"
    ListOfLines =
    [
        {
            Text = "some text of entire line"
            X = 150.233
            Y = 88.45
            Width = 12.2
            Height = 11.82
            PageInfo = {
                PageNumber = 2,
                BlobkId = 12
            }
        },
        ...
    ]
    ListOfWords =
    [
        {
            Text = "some"
            X = 150.233
            Y = 88.45
            Width = 12.2
            Height = 11.82
            PageInfo = {
                PageNumber = 1,
                BlobkId = 0
            }
        },
        {
            Text = "text"
            X = 170.233
            Y = 88.45
            Width = 12.2
            Height = 11.82
            PageInfo = {
                PageNumber = 1,
                BlobkId = 1
            }
        }
    ]
}
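Since every word and line carries X, Y, Width and Height, one common follow-up is to merge several word boxes into a single rectangle, for example to highlight a multi-word phrase. The sketch below is self-contained and independent of the package: Box and BoxMath are hypothetical names, and it assumes that X and Y mark the minimum corner of each rectangle and that Width and Height are non-negative, which this README does not state explicitly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rectangle mirroring the X/Y/Width/Height fields shown above;
// NOT a type from PDFParser-CSharp.
public class Box
{
    public double X { get; set; }
    public double Y { get; set; }
    public double Width { get; set; }
    public double Height { get; set; }
}

public static class BoxMath
{
    // Returns the smallest rectangle that contains every input box.
    // Assumes (X, Y) is the minimum corner and Width/Height are >= 0.
    public static Box Union(IReadOnlyList<Box> boxes)
    {
        if (boxes == null || boxes.Count == 0)
            throw new ArgumentException("At least one box is required.", nameof(boxes));

        double minX = boxes.Min(b => b.X);
        double minY = boxes.Min(b => b.Y);
        double maxX = boxes.Max(b => b.X + b.Width);
        double maxY = boxes.Max(b => b.Y + b.Height);

        return new Box { X = minX, Y = minY, Width = maxX - minX, Height = maxY - minY };
    }
}
```

Passing the two sample words above ("some" and "text") would yield one rectangle that starts at the first word's X and extends to the right edge of the second word.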
Product | Versions (compatible and additional computed target framework versions) |
---|---|
.NET | net5.0 was computed. net5.0-windows was computed. net6.0 was computed. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. |
.NET Core | netcoreapp2.0 is compatible. netcoreapp2.1 was computed. netcoreapp2.2 was computed. netcoreapp3.0 was computed. netcoreapp3.1 was computed. |
Dependencies
.NETCoreApp 2.0
- Autofac (>= 4.8.1)
- Ghostscript.NET (>= 1.2.1)
- itext7 (>= 7.1.2)
- Lucene.Net (>= 4.8.0-beta00005)
- SixLabors.ImageSharp (>= 1.0.0-beta0004)
- SixLabors.ImageSharp.Drawing (>= 1.0.0-beta0004)
- System.Drawing.Common (>= 4.5.0)
- WindowsAzure.Storage (>= 9.2.0)