Turn unstructured HTML pages into structured data. The OpenScraping library extracts information from HTML pages using a JSON config file of XPath rules. It can scrape multi-level, complex objects such as tables and forum posts.
Install-Package OpenScraping -Version 1.2.0
dotnet add package OpenScraping --version 1.2.0
<PackageReference Include="OpenScraping" Version="1.2.0" />
paket add OpenScraping --version 1.2.0
- Added support for the _removeXPaths config key, which removes child nodes matching XPath rules before the main parent node is processed. This is useful, for example, when extracting a news article whose body contains divs that should be stripped before the article body is extracted.
- Updated some dependencies like HtmlAgilityPack to the latest stable version.
- Simplified reading configs and added support for specifying some config keys as either singular or plural, and setting their values to either a single value or an array.
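To illustrate the _removeXPaths note above, here is a minimal config sketch. It assumes the nested-object form with an `_xpath` key that OpenScraping configs use for a parent node. The field names (`title`, `body`) and all selectors are hypothetical and would need to match the actual page being scraped.

```json
{
  "title": "//h1",
  "body": {
    "_xpath": "//div[@class='article-body']",
    "_removeXPaths": [
      ".//div[@class='ad']",
      ".//div[@class='related-links']"
    ]
  }
}
```

With a config along these lines, the ad and related-links divs would be removed from the matched article-body node before its text is extracted. The release notes also say that some keys accept either a single value or an array, but they do not list which ones, so the array form is used here as the conservative choice.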