PDFParser-CSharp 1.2.2

.NET Core 2.0

dotnet add package PDFParser-CSharp --version 1.2.2

NuGet\Install-Package PDFParser-CSharp -Version 1.2.2

This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

<PackageReference Include="PDFParser-CSharp" Version="1.2.2" />

For projects that support PackageReference, copy this XML node into the project file to reference the package.

paket add PDFParser-CSharp --version 1.2.2

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

#r "nuget: PDFParser-CSharp, 1.2.2"

#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.

// Install PDFParser-CSharp as a Cake Addin
#addin nuget:?package=PDFParser-CSharp&version=1.2.2

// Install PDFParser-CSharp as a Cake Tool
#tool nuget:?package=PDFParser-CSharp&version=1.2.2


PDFIndexer

A useful and easy way to get text from a PDF (including metadata).

With a single line you can add a batch of PDFs, and with another single line you can search for exactly where a word appears in the text (and more).

Architecture

https://www.nuget.org/packages/PDFParser-CSharp/

To install using the NuGet Package Manager

Install-Package PDFParser-CSharp -Version 1.2.1

How to use

General use:

    // List<string> needs: using System.Collections.Generic;
    // Add a batch of PDFs (path is the location of a PDF file)
    ProcessPDF.AddPDFs(new List<string>() { path });

    // Search over them
    var result = ProcessPDF.GetVisualResults("{your search word}");

To extract text with metadata:

    string path = "path with my pdf";
    TextExtractor te = new TextExtractor();
    var list = te.ExtractLinesMetadata(path);

Methods

ExtractFullText → Extract full text as a single string

ExtractWordsMetadata → Extract every single word with metadata (text, Point X, Point Y, Width and Height)

ExtractLinesMetadata → Extract every single line with metadata (text, Point X, Point Y, Width and Height)

GetIndexMetadata → Returns the full text together with the position of every line and word, which is what you need to build an hOCR page or another XML page format.

In all cases you can pass the PDF document as either a string or a stream.

Main Methods

AddPDFs → receives a list of strings (PDF paths) to process and save

GetVisualResults → receives a string and searches the metadata database. The result should be a list of SampleObject with the word, its position and other metadata for each word found:

{
    HighlightObject = {
        IndexMetadata Metadata
        List<BoundingBox> HighlightedWords
        string Keyword
        int PageNumber
    },
    Metadata = {
        string Text
        List<PdfMetadata> ListOfLines
        List<PdfMetadata> ListOfWords
        string PDFURI
    },
    ImageUri = "https://{uri_image_path}"
};

Expected Results

ExtractFullText

"some text of entire page (or pages)"

ExtractWordsMetadata

[
    {
        Text = "some"
        X = 150.233
        Y = 88.45
        Width = 12.2
        Height = 11.82
        PageInfo =  {
                        PageNumber = 1,
                        BlobkId = 0
                    }
    },

    {
        Text = "text"
        X = 170.233
        Y = 88.45
        Width = 12.2
        Height = 11.82
        PageInfo =  {
                        PageNumber = 1,
                        BlobkId = 1
                    }
    }
]

ExtractLinesMetadata

{
    Text = "some text of entire line"
    X = 150.233
    Y = 88.45
    Width = 12.2
    Height = 11.82
    PageInfo =  {
                    PageNumber = 1,
                    BlobkId = 0
                }
}

GetIndexMetadata

{
    Text = "some text of entire line"
    ListOfLines = 
    [ 
        {
            Text = "some text of entire line"
            X = 150.233
            Y = 88.45
            Width = 12.2
            Height = 11.82
            PageInfo =  {
                            PageNumber = 2,
                            BlobkId = 12
                        }
        },
        ... 
    ]
    ListOfWords = 
    [
        {
            Text = "some"
            X = 150.233
            Y = 88.45
            Width = 12.2
            Height = 11.82
            PageInfo =  {
                            PageNumber = 1,
                            BlobkId = 0
                        }
        },

        {
            Text = "text"
            X = 170.233
            Y = 88.45
            Width = 12.2
            Height = 11.82
            PageInfo =  {
                            PageNumber = 1,
                            BlobkId = 1
                        }
        }
    ]
}
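Since the word entries carry a text plus X, Y, Width and Height, turning them into hOCR word elements is mostly a matter of formatting a bounding box. The following is only a minimal sketch of that idea, not the package's own code: `WordBox` and `HocrSketch` are hypothetical stand-ins defined here for illustration, and it assumes that X and Y are the top-left corner of the word in the same units hOCR should use (which may not match how the library reports coordinates).

```csharp
using System;
using System.Globalization;
using System.Net;

// Hypothetical stand-in for one word entry; NOT the package's real type.
public sealed class WordBox
{
    public string Text { get; set; } = "";
    public double X { get; set; }
    public double Y { get; set; }
    public double Width { get; set; }
    public double Height { get; set; }
}

public static class HocrSketch
{
    // Builds one hOCR word span. hOCR bounding boxes are written as
    // integer "bbox x0 y0 x1 y1", so the coordinates are rounded.
    public static string ToWordSpan(WordBox w)
    {
        int x0 = (int)Math.Round(w.X);
        int y0 = (int)Math.Round(w.Y);
        int x1 = (int)Math.Round(w.X + w.Width);
        int y1 = (int)Math.Round(w.Y + w.Height);

        // Invariant culture keeps the numbers free of locale-specific formatting.
        string bbox = string.Format(
            CultureInfo.InvariantCulture, "bbox {0} {1} {2} {3}", x0, y0, x1, y1);

        // HTML-encode the text so characters such as '<' or '&' stay valid markup.
        return $"<span class='ocrx_word' title='{bbox}'>{WebUtility.HtmlEncode(w.Text)}</span>";
    }
}
```

With the first sample word above (Text "some", X 150.233, Y 88.45, Width 12.2, Height 11.82), `HocrSketch.ToWordSpan` would produce a span titled `bbox 150 88 162 100`. If the library reports Y from the bottom of the page, the Y values would need to be flipped against the page height first.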

Product	Compatible and additional computed target framework versions.
.NET	net5.0 was computed. net5.0-windows was computed. net6.0 was computed. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed.
.NET Core	netcoreapp2.0 is compatible. netcoreapp2.1 was computed. netcoreapp2.2 was computed. netcoreapp3.0 was computed. netcoreapp3.1 was computed.


.NETCoreApp 2.0
- Autofac (>= 4.8.1)
- Ghostscript.NET (>= 1.2.1)
- itext7 (>= 7.1.2)
- Lucene.Net (>= 4.8.0-beta00005)
- SixLabors.ImageSharp (>= 1.0.0-beta0004)
- SixLabors.ImageSharp.Drawing (>= 1.0.0-beta0004)
- System.Drawing.Common (>= 4.5.0)
- WindowsAzure.Storage (>= 9.2.0)

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version	Downloads	Last updated
1.2.2	6,189	6/8/2019
1.2.1	764	4/2/2019