OpenScraping 1.2.0

Turn unstructured HTML pages into structured data. The OpenScraping library extracts information from HTML pages using a JSON config file with XPath rules. It can also scrape nested, multi-level objects such as tables and forum posts.
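
To make the idea of a JSON config with XPath rules concrete, here is a minimal conceptual sketch in Python using only the standard library. It is not OpenScraping's API or implementation: the `extract` function, the flat field-to-path config shape, and the sample page are all assumptions made for illustration. ElementTree supports only a limited XPath subset and requires well-formed markup.

```python
import json
import xml.etree.ElementTree as ET


def extract(config_json, markup):
    """Apply a {field: path} config to well-formed markup and return {field: text}.

    Hypothetical helper for illustration; not part of OpenScraping.
    """
    rules = json.loads(config_json)
    root = ET.fromstring(markup)
    result = {}
    for field, path in rules.items():
        node = root.find(path)  # first match only, or None
        if node is None:
            result[field] = None
        else:
            # Join all descendant text and collapse runs of whitespace
            result[field] = " ".join("".join(node.itertext()).split())
    return result


# Assumed config shape: each key names an output field, each value is a path rule
config = '{"title": ".//h1", "body": ".//div[@class=\'article\']"}'
page = "<html><body><h1>Hello</h1><div class='article'><p>Some text.</p></div></body></html>"
data = extract(config, page)
```

The point of the design is that the scraping rules live in data rather than in code, so a new site layout needs a new config file, not a new parser.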

There is a newer version of this package available.
See the version list below for details.
Package Manager: Install-Package OpenScraping -Version 1.2.0
.NET CLI: dotnet add package OpenScraping --version 1.2.0
PackageReference: <PackageReference Include="OpenScraping" Version="1.2.0" />
For projects that support PackageReference, copy this XML node into the project file to reference the package.
Paket CLI: paket add OpenScraping --version 1.2.0
The NuGet Team does not provide support for this client. Please contact its maintainers for support.

Release Notes

- Added support for the _removeXPaths config key, which removes child nodes matching XPath rules before the main parent node is processed. This is useful, for example, when extracting a news article whose body contains divs that should be dropped before the body text is extracted.
- Updated some dependencies, such as HtmlAgilityPack, to their latest stable versions.
- Simplified config reading and added support for specifying some config keys in either singular or plural form, with values given as either a single value or an array.
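
The two config behaviours described in these notes can be sketched in a few lines of Python with the standard library. This is an illustration of the idea only, not OpenScraping's code. The helper names `as_list` and `extract_text` are hypothetical, and so are the key spellings `_xpath`, `_xpaths` and `_removeXPath`; only `_removeXPaths` appears in the notes above. The sketch also assumes that removal paths are evaluated relative to the matched parent node.

```python
import xml.etree.ElementTree as ET


def as_list(rule, singular, plural):
    """Read a key spelled either way, whose value may be one string or a list."""
    value = rule.get(plural, rule.get(singular, []))
    return [value] if isinstance(value, str) else list(value)


def extract_text(root, rule):
    """Return the text of the first matching node after pruning unwanted children.

    Note: this mutates the tree in place.
    """
    node = None
    for path in as_list(rule, "_xpath", "_xpaths"):
        node = root.find(path)
        if node is not None:
            break
    if node is None:
        return None
    for path in as_list(rule, "_removeXPath", "_removeXPaths"):
        doomed = set(node.findall(path))
        # ElementTree has no parent pointers, so scan every potential parent.
        # Element.remove also drops the text that follows the removed element.
        for parent in list(node.iter()):
            for child in list(parent):
                if child in doomed:
                    parent.remove(child)
    return " ".join("".join(node.itertext()).split())


page = ET.fromstring(
    "<html><body><div class='article'>"
    "<p>First paragraph.</p> <div class='ad'>Buy now!</div> <p>Second paragraph.</p>"
    "</div></body></html>"
)
# Singular key with a single value, plural key with a single value
rule = {"_xpath": ".//div[@class='article']", "_removeXPaths": ".//div[@class='ad']"}
body = extract_text(page, rule)
```

Pruning before extraction keeps the parent rule simple: the article path stays a plain match on the container, and the unwanted blocks are listed separately instead of being excluded inside one complicated expression.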

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version History

Version   Downloads   Last updated
1.4.2     9,014       12/27/2018
1.4.1     275         12/27/2018
1.4.0     669         12/5/2018
1.3.0     321         11/24/2018
1.2.0     322         11/24/2018
1.1.0     1,034       8/23/2018
1.0.1     948         8/20/2017
0.0.5     22,085      1/17/2016