Mythosia.Documents.Abstractions
1.1.0
Core document abstractions for structured document loading and parsing. Framework-agnostic — usable with any RAG pipeline or document processing system.
Installation
dotnet add package Mythosia.Documents.Abstractions
Key Types
DoclingDocument
Unified structured document representation following the docling convention. Content items are stored in flat lists; the tree structure is maintained via body/furniture root nodes.
using Mythosia.Documents;
using Mythosia.Documents.Elements; // for SemanticTableSerializer (namespace as imported in the Table Serialization example)

var doc = new DoclingDocument
{
    Name = "report",
    Source = "docs/report.pdf",
};

// Builder API
doc.AddTitle("Annual Report");
doc.AddHeading("Revenue", level: 2);
doc.AddParagraph("Total revenue increased by 15%.");
doc.AddCode("var x = 42;", language: "csharp");

// Export to Markdown
string markdown = doc.ToMarkdown();

// Optional: override table rendering strategy
doc.TableSerializer = new SemanticTableSerializer();
string semanticMarkdown = doc.ToMarkdown();
For plain-text content that should be preserved as-is, use RawContent:
var doc = new DoclingDocument
{
    Name = "notes",
    Source = "notes.txt",
    RawContent = rawText, // ToMarkdown() returns this directly
};
Table Serialization
Table rendering is pluggable via ITableSerializer. The default is GridTableSerializer (standard Markdown pipe table). Switch to SemanticTableSerializer for form-style documents:
using Mythosia.Documents;          // DoclingDocument
using Mythosia.Documents.Elements;

// Default: pipe table
var doc = new DoclingDocument { Name = "report" };
string md = doc.ToMarkdown(); // uses GridTableSerializer

// Semantic: bold group labels for form-style tables
doc.TableSerializer = new SemanticTableSerializer();
string md2 = doc.ToMarkdown(); // uses SemanticTableSerializer
| Serializer | Output Style |
|---|---|
| GridTableSerializer | Standard Markdown pipe table (default) |
| SemanticTableSerializer | Form-style with **bold labels** and inline data |
IDocumentLoader
public interface IDocumentLoader
{
    Task<IReadOnlyList<DoclingDocument>> LoadAsync(
        string source, CancellationToken cancellationToken = default);
}
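As a rough illustration of implementing the interface above, the sketch below wraps a plain-text file in a single DoclingDocument using the RawContent property described earlier. `PlainTextLoader` is a hypothetical class, not part of the package, and the `Mythosia.Documents` namespace for `IDocumentLoader` is an assumption based on where `DoclingDocument` is imported from.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using Mythosia.Documents; // assumption: IDocumentLoader lives next to DoclingDocument

// Hypothetical loader for plain-text files; not shipped with the package.
public sealed class PlainTextLoader : IDocumentLoader
{
    public async Task<IReadOnlyList<DoclingDocument>> LoadAsync(
        string source, CancellationToken cancellationToken = default)
    {
        // Read the whole file; the token lets callers abandon a slow read.
        string text = await File.ReadAllTextAsync(source, cancellationToken)
            .ConfigureAwait(false);

        var doc = new DoclingDocument
        {
            Name = Path.GetFileNameWithoutExtension(source),
            Source = source,
            RawContent = text, // preserved as-is; ToMarkdown() returns it directly
        };

        // One file maps to one document in this sketch.
        return new[] { doc };
    }
}
```

The interface returns a list rather than a single document, so a loader for a multi-part source could presumably return one DoclingDocument per part; this sketch always returns exactly one.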
IDocumentParser
public interface IDocumentParser
{
    bool CanParse(string source);
    Task<DoclingDocument> ParseAsync(string source, CancellationToken ct = default);
}
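One plausible way to use the CanParse/ParseAsync pair is to keep several parsers in a list and hand each source to the first one that accepts it. The sketch below shows that pattern; `ParserDispatcher` is a hypothetical helper, not part of the package, and the `Mythosia.Documents` namespace for `IDocumentParser` is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Mythosia.Documents; // assumption: IDocumentParser lives next to DoclingDocument

// Hypothetical helper that routes a source to the first matching parser.
public sealed class ParserDispatcher
{
    private readonly IReadOnlyList<IDocumentParser> _parsers;

    public ParserDispatcher(IEnumerable<IDocumentParser> parsers)
        => _parsers = parsers.ToList();

    public Task<DoclingDocument> ParseAsync(string source, CancellationToken ct = default)
    {
        // Registration order decides ties: the first parser that accepts wins.
        var parser = _parsers.FirstOrDefault(p => p.CanParse(source));
        if (parser == null)
        {
            throw new NotSupportedException($"No registered parser accepts '{source}'.");
        }

        return parser.ParseAsync(source, ct);
    }
}
```

Throwing NotSupportedException for an unmatched source is a choice made for this sketch; returning null or falling back to a RawContent document would be equally reasonable.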
Element Types (Mythosia.Documents.Elements)
| Type | Description |
|---|---|
| TextItem | Paragraph, generic text |
| TitleItem | Document title |
| SectionHeaderItem | Section heading (H1–H6) |
| CodeItem | Code block with language |
| DocListItem | List item (ordered/unordered) |
| TableItem / TableData / TableCell | Table structure |
| TableSemanticView | Semantic group/column analysis for table layout |
| PictureItem | Image placeholder |
| GroupItem | Container (chapter, slide, sheet) |
Related Packages
| Package | Description |
|---|---|
| Mythosia.Documents.Hwp | HWP (Korean word processor) loader |
| Mythosia.Documents.Office | Word / Excel / PowerPoint loaders |
| Mythosia.Documents.Pdf | PDF loader (PdfPig) |
| Mythosia.AI.Rag | RAG pipeline that consumes DoclingDocument |
| Product | Versions Compatible and additional computed target framework versions. |
|---|---|
| .NET | net5.0 was computed. net5.0-windows was computed. net6.0 was computed. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
| .NET Core | netcoreapp3.0 was computed. netcoreapp3.1 was computed. |
| .NET Standard | netstandard2.1 is compatible. |
| MonoAndroid | monoandroid was computed. |
| MonoMac | monomac was computed. |
| MonoTouch | monotouch was computed. |
| Tizen | tizen60 was computed. |
| Xamarin.iOS | xamarinios was computed. |
| Xamarin.Mac | xamarinmac was computed. |
| Xamarin.TVOS | xamarintvos was computed. |
| Xamarin.WatchOS | xamarinwatchos was computed. |
.NETStandard 2.1 dependencies:
- System.Text.Json (>= 10.0.5)
NuGet packages (3)
Showing the top 3 NuGet packages that depend on Mythosia.Documents.Abstractions:
| Package | Downloads |
|---|---|
| Mythosia.Documents.Pdf: PDF document loader. Parses PDF files into DoclingDocument structured models via PdfPig. Font-size based heading detection, bullet/numbered list recognition, spatial paragraph grouping. Supports encrypted PDFs, metadata extraction, and page number headers. | |
| Mythosia.Documents.Office: Office document loaders for Word (.docx), Excel (.xlsx), and PowerPoint (.pptx). Parses documents into DoclingDocument structured models via OpenXml. | |
| Mythosia.Documents.Hwp: HWP document loader. Parses Korean word-processor (.hwp) files into DoclingDocument structured models via HwpLibSharp. Section/paragraph text extraction with table support. | |
GitHub repositories
This package is not used by any popular GitHub repositories.
v1.1.0: Added pluggable table serialization. ITableSerializer strategy interface with GridTableSerializer (pipe table, default) and SemanticTableSerializer (form-style group rendering with bold labels). DoclingDocument.TableSerializer property allows per-document override. TableData and TableSemanticView for structural table analysis.
v1.0.0: Initial release as Mythosia.Documents.Abstractions. DoclingDocument structured model with body tree, RawContent bypass, Metadata, Builder API, and Markdown export. IDocumentLoader returns DoclingDocument. Element types in Mythosia.Documents.Elements namespace.