MariGold.HtmlParser
2.1.0
dotnet add package MariGold.HtmlParser --version 2.1.0
NuGet\Install-Package MariGold.HtmlParser -Version 2.1.0
<PackageReference Include="MariGold.HtmlParser" Version="2.1.0" />
paket add MariGold.HtmlParser --version 2.1.0
#r "nuget: MariGold.HtmlParser, 2.1.0"
// Install MariGold.HtmlParser as a Cake Addin
#addin nuget:?package=MariGold.HtmlParser&version=2.1.0
// Install MariGold.HtmlParser as a Cake Tool
#tool nuget:?package=MariGold.HtmlParser&version=2.1.0
MariGold.HtmlParser
MariGold.HtmlParser is a utility that parses HTML documents into a collection of IHtmlNode instances. You can either traverse the document by parsing the root elements one at a time, or parse the entire document at once. When an HTML element is parsed, all of its child elements are parsed recursively.
Installing via NuGet
In Package Manager Console, enter the following command:
Install-Package MariGold.HtmlParser
Usage
MariGold.HtmlParser can parse both the HTML and the CSS of an HTML document.
Traverse through HTML elements
In the following example, the first loop iteration parses the first div, and the second and final iteration parses the div that follows it.
using MariGold.HtmlParser;
HtmlParser parser = new HtmlTextParser("<div>first</div><div>second</div>");
while (parser.Traverse())
{
    IHtmlNode node = parser.Current;
}
Parse the HTML document
To parse the entire document at once, use the Parse method. It parses all the HTML elements in the given document, and the Current property then points to the first root element.
if (parser.Parse())
{
    IHtmlNode node = parser.Current;
}
Traverse the IHtmlNode collection
Use the Next and Previous properties to move between sibling nodes in the IHtmlNode collection. Use the Children property to access the elements nested inside a node.
IHtmlNode node = parser.Current;
while (node != null)
{
    node = node.Next;
}
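The two navigation styles can be combined to visit every node in a parsed document. The following is a minimal sketch, assuming that Children can be enumerated with foreach as a sequence of IHtmlNode and that it may be null for leaf nodes. NodeWalkSketch, CountSubtree and CountAll are hypothetical names introduced for illustration; they are not part of the library.

```csharp
using System;
using MariGold.HtmlParser;

// Hypothetical helper for illustration; not part of MariGold.HtmlParser.
public static class NodeWalkSketch
{
    // Counts the given node plus everything nested beneath it.
    public static int CountSubtree(IHtmlNode node)
    {
        int count = 1;

        // Assumption: Children is an enumerable of IHtmlNode and may be null.
        if (node.Children != null)
        {
            foreach (IHtmlNode child in node.Children)
            {
                count += CountSubtree(child);
            }
        }

        return count;
    }

    // Follows Next from the first root node and sums each root's subtree.
    public static int CountAll(IHtmlNode firstRoot)
    {
        int total = 0;

        for (IHtmlNode node = firstRoot; node != null; node = node.Next)
        {
            total += CountSubtree(node);
        }

        return total;
    }
}
```

After a successful parser.Parse(), calling NodeWalkSketch.CountAll(parser.Current) would walk the root elements through Next and descend through Children. Whether text content is counted as separate nodes depends on how the library represents it.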
Parse CSS styles
By default, parsing the HTML does not parse the CSS styles. The ParseStyles method, or the asynchronous ParseStylesAsync used in the example below, parses any external or inline styles in the document. The parsed styles can then be read from the Styles and InheritedStyles properties of IHtmlNode.
HtmlParser parser = new HtmlTextParser(@"<html>
<head>
<style type='text/css'>
.cls
{
font-size:10px;
}
</style>
</head>
<body>
<div class='cls' style='font-family:Arial'>sample</div>
</body>
</html>");
if (parser.Parse())
{
    await parser.ParseStylesAsync();
}
To resolve a protocol-free or relative URL of an external style sheet, set the UriSchema and BaseURL properties.
parser.UriSchema = Uri.UriSchemeHttp;
parser.BaseURL = "http://site.com";
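As a rough illustration of what these two settings are for, the sketch below resolves a style sheet href using only System.Uri. It is not the library's internal implementation. The class StyleSheetUrlSketch and its Resolve method are hypothetical names introduced here, and the sample URLs are made up.

```csharp
using System;

// Hypothetical helper for illustration; not part of MariGold.HtmlParser.
public static class StyleSheetUrlSketch
{
    public static string Resolve(string href, string uriSchema, string baseUrl)
    {
        // Protocol-free URL such as //cdn.site.com/a.css: prepend the scheme.
        if (href.StartsWith("//", StringComparison.Ordinal))
        {
            return uriSchema + ":" + href;
        }

        // Already an absolute http or https URL: return it unchanged.
        if (href.StartsWith("http://", StringComparison.OrdinalIgnoreCase) ||
            href.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
        {
            return href;
        }

        // Relative URL: combine it with the base URL.
        return new Uri(new Uri(baseUrl), href).AbsoluteUri;
    }
}
```

Under this sketch, a scheme of Uri.UriSchemeHttp and a base of http://site.com map the href //cdn.site.com/a.css to http://cdn.site.com/a.css, and the relative href css/site.css to http://site.com/css/site.css.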
Product | Versions (compatible and additional computed target framework versions) |
---|---|
.NET | net6.0 is compatible. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. |
Dependencies
net6.0
- System.Net.Http (>= 4.3.4)
NuGet packages (2)
Showing the top 2 NuGet packages that depend on MariGold.HtmlParser:
Package | Downloads |
---|---|
MariGold.OpenXHTML: A utility to convert html document to Open XML word document | |
SanJing.HTML: SanJing.HTML | |
GitHub repositories
This package is not used by any popular GitHub repositories.
Upgraded to .NET 6.0