OfficeIMO.Markdown.Html
0.1.8
- .NET CLI: `dotnet add package OfficeIMO.Markdown.Html --version 0.1.8`
- Package Manager: `NuGet\Install-Package OfficeIMO.Markdown.Html -Version 0.1.8`
- PackageReference: `<PackageReference Include="OfficeIMO.Markdown.Html" Version="0.1.8" />`
- Central Package Management: `<PackageVersion Include="OfficeIMO.Markdown.Html" Version="0.1.8" />` in `Directory.Packages.props`, plus `<PackageReference Include="OfficeIMO.Markdown.Html" />` in the project file
- Paket CLI: `paket add OfficeIMO.Markdown.Html --version 0.1.8`
- Script & Interactive: `#r "nuget: OfficeIMO.Markdown.Html, 0.1.8"`
- File-based apps: `#:package OfficeIMO.Markdown.Html@0.1.8`
- Cake addin: `#addin nuget:?package=OfficeIMO.Markdown.Html&version=0.1.8`
- Cake tool: `#tool nuget:?package=OfficeIMO.Markdown.Html&version=0.1.8`
OfficeIMO.Markdown.Html
HTML to Markdown conversion for OfficeIMO.Markdown.
OfficeIMO.Markdown.Html is the HTML ingestion layer for the OfficeIMO markdown stack. It converts HTML fragments or full documents into:
- Markdown text
- `MarkdownDoc` block models from `OfficeIMO.Markdown`
The goal is not just "good looking output", but a structural conversion that keeps as much meaningful ordering and block shape as the current markdown AST allows.
Design goals
- Convert HTML into a real `MarkdownDoc` first, then render Markdown text from that model.
- Preserve block ordering whenever HTML mixes paragraphs with quotes, nested lists, details blocks, and other supported structures.
- Resolve links and images consistently when a base URI is supplied.
- Preserve unsupported HTML explicitly when requested instead of silently flattening everything away.
Current conversion model
Supported block-level mappings include:
- headings
- paragraphs
- ordered and unordered lists
- block quotes
- fenced code blocks from `pre`/`code`
- horizontal rules
- tables
- images and figures
- details / summary
- definition lists
- shared `data-omd-*` visual host elements back into semantic fenced blocks
- raw HTML fallback blocks for unsupported elements
Supported inline mappings include:
- emphasis and strong emphasis
- strike-through
- code spans
- links
- images
- hard line breaks
- typed inline HTML wrappers for `q`, `u`, `ins`, `sub`, and `sup`
- a conservative raw/passthrough fallback for unsupported inline HTML when preservation is enabled
Usage
Convert to Markdown text
using OfficeIMO.Markdown;
using OfficeIMO.Markdown.Html;
var markdown = "<h1>Hello</h1><p>Body</p>".ToMarkdown();
var document = "<h1>Hello</h1><p>Body</p>".LoadFromHtml();
Convert with options
using System;
using OfficeIMO.Markdown.Html;
var options = new HtmlToMarkdownOptions {
    BaseUri = new Uri("https://example.com/docs/"),
    UseBodyContentsOnly = true,
    PreserveUnsupportedBlocks = true,
    PreserveUnsupportedInlineHtml = true
};
string markdown = "<p><a href=\"guide/start\">Docs</a></p>".ToMarkdown(options);
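As a rough illustration of what resolving `guide/start` against the base URI means, the sketch below uses only `System.Uri` from the .NET base class library. It does not call the converter, and it is an assumption (not confirmed by this README) that the converter follows the same standard resolution rules.

```csharp
using System;

// Standard relative-reference resolution with System.Uri (no OfficeIMO calls).
var baseUri = new Uri("https://example.com/docs/");
var resolved = new Uri(baseUri, "guide/start");
Console.WriteLine(resolved.AbsoluteUri); // https://example.com/docs/guide/start

// Without the trailing slash, the last path segment of the base is replaced.
var noSlash = new Uri(new Uri("https://example.com/docs"), "guide/start");
Console.WriteLine(noSlash.AbsoluteUri); // https://example.com/guide/start
```

Under standard URI rules the trailing slash on the base matters, which is why the example base ends in `/docs/`.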
Convert with portable markdown output
using OfficeIMO.Markdown.Html;
var options = HtmlToMarkdownOptions.CreatePortableProfile();
string markdown = """
<blockquote>
<p><strong>Example</strong></p>
<p>Body text</p>
</blockquote>
""".ToMarkdown(options);
Use the portable profile when HTML ingestion should produce generic markdown output instead of OfficeIMO-specific block syntax.
Convert to MarkdownDoc, then choose the markdown writer profile explicitly
using OfficeIMO.Markdown;
using OfficeIMO.Markdown.Html;
var converter = new HtmlToMarkdownConverter();
var document = converter.ConvertToDocument("""
<table>
<tr><th>Name</th><th>Notes</th></tr>
<tr><td>Alice</td><td><p>Line one</p><blockquote><p>Line two</p></blockquote></td></tr>
</table>
""");
var officeMarkdown = document.ToMarkdown(MarkdownWriteOptions.CreateOfficeIMOProfile());
var portableMarkdown = document.ToMarkdown(MarkdownWriteOptions.CreatePortableProfile());
This is the cleanest path when HTML ingestion fidelity matters first and the markdown serialization contract is a separate downstream decision.
Use the converter directly
using OfficeIMO.Markdown.Html;
var converter = new HtmlToMarkdownConverter();
var document = converter.ConvertToDocument("<article><h1>Hello</h1><p>Body</p></article>");
Options
- `BaseUri`: Resolves relative link and image targets against a document base.
- `UseBodyContentsOnly`: Uses `<body>` content when present instead of converting the whole HTML document node tree.
- `RemoveScriptsAndStyles`: Drops `script`, `style`, `noscript`, and `template`.
- `PreserveUnsupportedBlocks`: Emits unsupported block elements as `HtmlRawBlock` instead of dropping them.
- `PreserveUnsupportedInlineHtml`: Emits unsupported inline elements as raw HTML instead of flattening them to plain text only.
- `MarkdownWriteOptions`: Controls how the intermediate `MarkdownDoc` is serialized back to markdown text. Use `HtmlToMarkdownOptions.CreatePortableProfile()` when portability matters more than preserving OfficeIMO-style output.
- `VisualElementRoundTripHints`: Ordered hint list for hosts/plugins that want to reinterpret shared `data-omd-*` visual elements into richer `SemanticFencedBlock` nodes during HTML ingestion. When the host also references `OfficeIMO.MarkdownRenderer`, prefer `htmlOptions.ApplyPlugin(...)` or `htmlOptions.ApplyFeaturePack(...)` so plugin-carried hint registration stays idempotent and aligned with the renderer contract.
- `DocumentTransforms`: Ordered post-conversion AST transforms for hosts/plugins that want HTML ingestion to normalize or upgrade the recovered `MarkdownDoc` before markdown writing. When the host also references `OfficeIMO.MarkdownRenderer`, `htmlOptions.ApplyPlugin(...)` and `htmlOptions.ApplyFeaturePack(...)` now carry plugin-owned document transforms too.
- `ElementBlockConverters`: Ordered custom HTML element decoders that run before the base converter falls back to generic block handling. Use these when a host/plugin package needs to recover semantic markdown blocks from vendor-specific HTML that never used the shared `data-omd-*` visual contract. When the host also references `OfficeIMO.MarkdownRenderer`, `htmlOptions.ApplyPlugin(...)` and `htmlOptions.ApplyFeaturePack(...)` now carry plugin-owned element converters too.
- `InlineElementConverters`: Ordered custom HTML inline decoders that run before the base converter falls back to generic inline handling. Use these when a host/plugin package needs to recover richer inline AST, such as vendor badges or semantic spans, instead of preserving raw HTML or flattening to plain text. When the host also references `OfficeIMO.MarkdownRenderer`, `htmlOptions.ApplyPlugin(...)` and `htmlOptions.ApplyFeaturePack(...)` now carry plugin-owned inline converters too.
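The "ordered decoders that run before the generic fallback" idea behind the converter lists can be sketched in isolation. The snippet below is a self-contained illustration of that pattern only: the delegate shape, the tag names, and the `Decode` helper are hypothetical and are not the OfficeIMO.Markdown.Html API, whose actual converter signatures are not shown in this README.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical decoder shape: return a result to claim the element, or null to pass.
// These names are illustrative only and are not the OfficeIMO.Markdown.Html API.
var decoders = new List<Func<string, string?>> {
    tag => tag == "vendor-badge" ? "badge-block" : null,
    tag => tag.StartsWith("vendor-") ? "generic-vendor-block" : null
};

string Decode(string tag) {
    // Try each registered decoder in registration order; the first non-null result wins.
    foreach (var decoder in decoders) {
        var result = decoder(tag);
        if (result != null) return result;
    }
    // Nothing claimed the element, so fall back to generic handling.
    return "generic-fallback";
}

Console.WriteLine(Decode("vendor-badge")); // badge-block
Console.WriteLine(Decode("vendor-chart")); // generic-vendor-block
Console.WriteLine(Decode("section"));      // generic-fallback
```

Because registration order decides which decoder claims an element, more specific decoders would normally be registered ahead of broader ones in a chain like this.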
Profile guidance
- `CreateOfficeIMOProfile()`: Best when the downstream consumer is `OfficeIMO.Markdown` / `OfficeIMO.MarkdownRenderer` and can benefit from richer OfficeIMO block syntax.
- `CreatePortableProfile()`: Best when the downstream consumer is a generic markdown engine, HTML reconversion flow, or another parser that should not depend on OfficeIMO-only syntax.
The important split is:
- `HtmlToMarkdownOptions`: Controls HTML ingestion behavior and preservation choices.
- `MarkdownWriteOptions`: Controls how the intermediate AST is written back to markdown text.
That means OfficeIMO.Markdown.Html is no longer just a text flattener. It is an HTML-to-AST bridge with a configurable markdown writer on the output side.
Structural notes
- Mixed block order inside list items is preserved.
- Multiple `dd` values for the same `dt` are preserved.
- Multiple `dt` terms sharing the same `dd` group are preserved.
- Block-rich `dd` values are preserved as typed block content instead of being forced through inline-only conversion.
- Table cells preserve typed block content in the intermediate `MarkdownDoc` AST instead of collapsing immediately to strings.
- Supported inline HTML such as `q`, `u`, `ins`, `sub`, and `sup` is preserved as typed AST wrappers instead of being flattened to plain text.
- Unsupported custom/container elements are treated as block-level content when they are structurally block-like or when raw block preservation is enabled.
- Shared renderer visual hosts that carry the `data-omd-*` contract are decoded back into `SemanticFencedBlock` nodes, which lets `OfficeIMO.MarkdownRenderer` HTML round-trip into semantic markdown fences.
- Shared renderer visual hosts now preserve explicit fence metadata such as `#id`, extra `.class` values, `title="..."`, and plugin-defined flags through `data-omd-fence-*` attributes, so HTML round-trips can rebuild richer semantic fence info strings instead of dropping back to language-only fences.
- Shared visual hosts wrapped as richer HTML, such as `<figure>` with a `<figcaption>`, now preserve that caption on the recovered semantic fenced block instead of dropping it.
- Host/plugin packages can register custom `ElementBlockConverters` when richer vendor HTML should decode into semantic markdown blocks before generic fallback or raw-HTML preservation.
- Host/plugin packages can register custom `InlineElementConverters` when richer vendor inline HTML should decode into semantic inline AST before generic fallback or raw-HTML preservation.
- Host/plugin packages can register `VisualElementRoundTripHints` when they need to recover extra semantic details from shared visual host HTML without hard-coding vendor logic into the base converter.
- Host/plugin packages can also register `DocumentTransforms` when recovered HTML should be normalized or upgraded into richer AST shapes after parsing.
- When those hosts already use `OfficeIMO.MarkdownRenderer` plugins or feature packs, they can apply the same contract directly on `HtmlToMarkdownOptions` instead of copying converter, transform, and hint lists by hand.
- Conversion happens through the `OfficeIMO.Markdown` AST, so the effective fidelity is bounded by that model.
For the current stack, this means HTML ingestion can preserve more structure than plain markdown text can always express directly. The AST is the source of truth; markdown emission is the profile-driven projection of that model.
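To make the "AST is the source of truth, markdown is a projection" point concrete, here is a minimal self-contained sketch. The record types and the two writer functions are hypothetical stand-ins, not the `OfficeIMO.Markdown` model or its writer profiles, and the output strings are invented for illustration rather than taken from the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical mini-AST; these types are NOT the OfficeIMO.Markdown model.
var cell = new Cell(new List<Block> {
    new Paragraph("Line one"),
    new Quote("Line two")
});

// "Rich" projection: keep each block distinct (here joined with an HTML line break).
string WriteRich(Cell c) => string.Join("<br>", c.Blocks.Select(b => b switch {
    Quote q => "> " + q.Text,
    _ => b.Text
}));

// "Portable" projection: flatten every block to its plain text.
string WritePortable(Cell c) => string.Join(" ", c.Blocks.Select(b => b.Text));

Console.WriteLine(WriteRich(cell));     // Line one<br>> Line two
Console.WriteLine(WritePortable(cell)); // Line one Line two

abstract record Block(string Text);
record Paragraph(string Text) : Block(Text);
record Quote(string Text) : Block(Text);
record Cell(IReadOnlyList<Block> Blocks);
```

The same in-memory cell yields two different strings, and only the typed model still knows that the second block was a quote.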
Current limitations
- Markdown text emission is still constrained by markdown syntax itself, so rich table-cell and definition-list AST content may be flattened when serialized for engines that only accept plain markdown text.
- Downstream converters may still choose deliberate degradations for AST-preserved HTML wrappers when the target format has no native equivalent. For example, the Word converter keeps `u`/`sub`/`sup` structurally but intentionally degrades `ins` and `q`.
- Portable output intentionally degrades OfficeIMO-specific constructs instead of preserving host-specific syntax.
- Unsupported HTML is preserved best when `PreserveUnsupportedBlocks` / `PreserveUnsupportedInlineHtml` are enabled.
Related packages
- `OfficeIMO.Markdown`: Core markdown AST, reader, and writer.
- `OfficeIMO.Reader.Html`: HTML ingestion and chunking built on top of this converter.
| Product | Versions (compatible and additional computed target framework versions) |
|---|---|
| .NET | net5.0 was computed. net5.0-windows was computed. net6.0 was computed. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
| .NET Core | netcoreapp2.0 was computed. netcoreapp2.1 was computed. netcoreapp2.2 was computed. netcoreapp3.0 was computed. netcoreapp3.1 was computed. |
| .NET Standard | netstandard2.0 is compatible. netstandard2.1 was computed. |
| .NET Framework | net461 was computed. net462 was computed. net463 was computed. net47 was computed. net471 was computed. net472 is compatible. net48 was computed. net481 was computed. |
| MonoAndroid | monoandroid was computed. |
| MonoMac | monomac was computed. |
| MonoTouch | monotouch was computed. |
| Tizen | tizen40 was computed. tizen60 was computed. |
| Xamarin.iOS | xamarinios was computed. |
| Xamarin.Mac | xamarinmac was computed. |
| Xamarin.TVOS | xamarintvos was computed. |
| Xamarin.WatchOS | xamarinwatchos was computed. |
Dependencies
- .NETFramework 4.7.2
  - AngleSharp (>= 1.3.0)
  - OfficeIMO.Markdown (>= 0.6.8)
- .NETStandard 2.0
  - AngleSharp (>= 1.3.0)
  - OfficeIMO.Markdown (>= 0.6.8)
- net10.0
  - AngleSharp (>= 1.3.0)
  - OfficeIMO.Markdown (>= 0.6.8)
- net8.0
  - AngleSharp (>= 1.3.0)
  - OfficeIMO.Markdown (>= 0.6.8)
NuGet packages (2)
Showing the top 2 NuGet packages that depend on OfficeIMO.Markdown.Html:
| Package | Downloads |
|---|---|
| OfficeIMO.Word.Markdown: Markdown converter for OfficeIMO.Word - Convert Word documents to/from Markdown using OfficeIMO.Markdown | |
| OfficeIMO.MarkdownRenderer: WebView-friendly Markdown rendering helpers (shell + incremental updates) built on OfficeIMO.Markdown. | |
GitHub repositories
This package is not used by any popular GitHub repositories.