ElBruno.MarkItDotNet.CoreModel 0.6.0

dotnet add package ElBruno.MarkItDotNet.CoreModel --version 0.6.0
                    
NuGet\Install-Package ElBruno.MarkItDotNet.CoreModel -Version 0.6.0
                    
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.
<PackageReference Include="ElBruno.MarkItDotNet.CoreModel" Version="0.6.0" />
                    
For projects that support PackageReference, copy this XML node into the project file to reference the package.
<PackageVersion Include="ElBruno.MarkItDotNet.CoreModel" Version="0.6.0" />
                    
Directory.Packages.props
<PackageReference Include="ElBruno.MarkItDotNet.CoreModel" />
                    
Project file
For projects that support Central Package Management (CPM), copy this XML node into the solution Directory.Packages.props file to version the package.
paket add ElBruno.MarkItDotNet.CoreModel --version 0.6.0
                    
#r "nuget: ElBruno.MarkItDotNet.CoreModel, 0.6.0"
                    
#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.
#:package ElBruno.MarkItDotNet.CoreModel@0.6.0
                    
#:package directive can be used in C# file-based apps starting in .NET 10 preview 4. Copy this into a .cs file before any lines of code to reference the package.
#addin nuget:?package=ElBruno.MarkItDotNet.CoreModel&version=0.6.0
                    
Install as a Cake Addin
#tool nuget:?package=ElBruno.MarkItDotNet.CoreModel&version=0.6.0
                    
Install as a Cake Tool

ElBruno.MarkItDotNet

Build License: MIT

.NET library that converts 15+ file formats to Markdown for AI pipelines, documentation workflows, and developer tools. Inspired by Python markitdown.

πŸ“¦ NuGet Packages

Package Version Downloads Description
ElBruno.MarkItDotNet NuGet Downloads Core library β€” 12 built-in converters
ElBruno.MarkItDotNet.Excel NuGet Downloads Excel (.xlsx) β†’ Markdown tables
ElBruno.MarkItDotNet.PowerPoint NuGet Downloads PowerPoint (.pptx) β†’ slides + notes
ElBruno.MarkItDotNet.AI NuGet Downloads AI-powered OCR, captioning, transcription
ElBruno.MarkItDotNet.Whisper NuGet Downloads Local audio transcription via Whisper ONNX
ElBruno.MarkItDotNet.Cli NuGet Downloads Command-line tool (markitdown command)

Description

ElBruno.MarkItDotNet provides a unified interface to convert 15+ file formats into clean, structured Markdown. The core package handles text, JSON, HTML, Word, PDF, RTF, EPUB, images, CSV, XML, YAML, and URLs (web pages). Extend with satellite packages for Excel, PowerPoint, AI-powered features (OCR, image captioning, audio transcription), and local audio transcription via Whisper. Designed for AI content pipelines, documentation systems, and any scenario where you need consistent Markdown output from mixed file sources.

Supported Formats

Format Extensions Converter Package Dependencies
Plain Text .txt, .md, .log PlainTextConverter Core None
JSON .json JsonConverter Core None
HTML .html, .htm HtmlConverter Core ReverseMarkdown
URL (Web Pages) .url UrlConverter Core ReverseMarkdown
Word (DOCX) .docx DocxConverter Core DocumentFormat.OpenXml
PDF .pdf PdfConverter Core PdfPig
CSV .csv CsvConverter Core None
XML .xml XmlConverter Core None
YAML .yaml, .yml YamlConverter Core None
RTF .rtf RtfConverter Core RtfPipe
EPUB .epub EpubConverter Core VersOne.Epub
Images .jpg, .jpeg, .png, .gif, .bmp, .webp, .svg ImageConverter Core None
Excel (XLSX) .xlsx ExcelConverter Excel ClosedXML
PowerPoint (PPTX) .pptx PowerPointConverter PowerPoint DocumentFormat.OpenXml
Images (AI-OCR) All image formats AiImageConverter AI Microsoft.Extensions.AI
Audio (AI Transcription) .mp3, .wav, .m4a, .ogg AiAudioConverter AI Microsoft.Extensions.AI
PDF (AI-OCR) .pdf AiPdfConverter AI Microsoft.Extensions.AI
Audio (Local Whisper) .wav, .mp3, .m4a, .ogg, .flac WhisperAudioConverter Whisper ElBruno.Whisper

Target Frameworks

  • .NET 8.0 (LTS)
  • .NET 10.0

πŸ› οΈ CLI Tool

Command-line interface for batch conversion and terminal workflows.

Installation

Install as a global tool:

dotnet tool install -g ElBruno.MarkItDotNet.Cli

Quick Examples

Convert a single file:

markitdown report.pdf
markitdown report.pdf -o report.md

Batch convert a directory:

markitdown batch ./documents -o ./output -r --pattern "*.pdf"

Convert a web page:

markitdown url https://example.com -o page.md

Extract metadata as JSON:

markitdown data.csv --format json | jq .metadata.wordCount

β†’ Full CLI Documentation

Packages

ElBruno.MarkItDotNet is distributed across multiple NuGet packages for flexibility:

Core Package

ElBruno.MarkItDotNet β€” The main library with 12 built-in converters.

dotnet add package ElBruno.MarkItDotNet

Includes: Plain text, JSON, HTML, URLs (web pages), Word, PDF, RTF, EPUB, images, CSV, XML, YAML.

Satellite Packages

ElBruno.MarkItDotNet.Excel β€” Excel (XLSX) to Markdown converter (v0.2.0+)

dotnet add package ElBruno.MarkItDotNet.Excel

Converts spreadsheet sheets to Markdown tables.

ElBruno.MarkItDotNet.PowerPoint β€” PowerPoint (PPTX) to Markdown converter (v0.2.0+)

dotnet add package ElBruno.MarkItDotNet.PowerPoint

Converts slides and speaker notes to Markdown.

ElBruno.MarkItDotNet.AI β€” AI-powered converters (v0.2.0+)

dotnet add package ElBruno.MarkItDotNet.AI

Requires Microsoft.Extensions.AI (for IChatClient). Provides:

  • AiImageConverter β€” OCR for images using LLM vision
  • AiPdfConverter β€” OCR for PDFs using LLM vision
  • AiAudioConverter β€” Transcription for audio files using LLM audio APIs

ElBruno.MarkItDotNet.Whisper β€” Local audio transcription via Whisper ONNX (v0.3.0+)

dotnet add package ElBruno.MarkItDotNet.Whisper

Uses ElBruno.Whisper for offline speech-to-text. No cloud API needed β€” runs locally via ONNX Runtime. Supports .wav, .mp3, .m4a, .ogg, .flac.

Installation

For the core library only:

dotnet add package ElBruno.MarkItDotNet

For Excel support:

dotnet add package ElBruno.MarkItDotNet.Excel

For PowerPoint support:

dotnet add package ElBruno.MarkItDotNet.PowerPoint

For AI-powered features (requires separate IChatClient registration):

dotnet add package ElBruno.MarkItDotNet.AI

For local audio transcription (offline, no API key needed):

dotnet add package ElBruno.MarkItDotNet.Whisper

Quick Start

The simplest way to get started is with the MarkdownConverter faΓ§ade:

using ElBruno.MarkItDotNet;

// Convert a file to Markdown
var converter = new MarkdownConverter();
var markdown = converter.ConvertToMarkdown("document.txt");
Console.WriteLine(markdown);

// Or convert from a stream
using var stream = File.OpenRead("document.pdf");
var result = await converter.ConvertAsync(stream, ".pdf");
Console.WriteLine(result.Markdown);

The MarkdownConverter class pre-registers all built-in converters (from the core package) and provides synchronous and asynchronous conversion methods.

URL Conversion

Convert web pages directly to Markdown:

var service = new MarkdownService(registry);
var result = await service.ConvertUrlAsync("https://example.com");
Console.WriteLine(result.Markdown);

The URL converter fetches the page, strips navigation/scripts/styles, extracts the title, and converts the content to clean Markdown.

With Satellite Packages

When you install satellite packages (Excel, PowerPoint, AI), converters are automatically registered during dependency injection setup. The system discovers them via the plugin system.

Dependency Injection with Plugin System

For advanced scenarios (e.g., ASP.NET Core applications), use the DI extension methods to register MarkItDotNet services:

using Microsoft.Extensions.DependencyInjection;
using ElBruno.MarkItDotNet;
using ElBruno.MarkItDotNet.Excel;
using ElBruno.MarkItDotNet.PowerPoint;

var services = new ServiceCollection();

// Register core MarkItDotNet with built-in converters
services.AddMarkItDotNet();

// Register satellite package converters (plugins)
services.AddMarkItDotNetExcel();
services.AddMarkItDotNetPowerPoint();

// Register AI converters (requires IChatClient)
// services.AddMarkItDotNetAI();

var provider = services.BuildServiceProvider();
var markdownService = provider.GetRequiredService<MarkdownService>();

// Convert files through the service (converters auto-discovered)
var result = await markdownService.ConvertAsync("document.xlsx");
if (result.Success)
{
    Console.WriteLine(result.Markdown);
}
else
{
    Console.WriteLine($"Error: {result.ErrorMessage}");
}

All registered converters (core + plugins) are automatically available through the MarkdownService.

Streaming Conversion

For large files, use the streaming API to process content chunk-by-chunk:

var converter = new MarkdownConverter();
using var stream = File.OpenRead("large-document.pdf");

await foreach (var chunk in converter.ConvertStreamingAsync(stream, ".pdf"))
{
    Console.Write(chunk);
}

The streaming API yields Markdown chunks asynchronously (e.g., page-by-page for PDFs), enabling memory-efficient processing of large files.

AI-Powered Conversion

The ElBruno.MarkItDotNet.AI package provides converters that use LLM vision and audio APIs for advanced capabilities:

Setup

using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.AI;
using ElBruno.MarkItDotNet;
using ElBruno.MarkItDotNet.AI;

var services = new ServiceCollection();

// Register a chat client (e.g., OpenAI)
services.AddOpenAIChatClient("sk-...", "gpt-4-vision");

// Register core + AI converters
services.AddMarkItDotNet();
services.AddMarkItDotNetAI();

var provider = services.BuildServiceProvider();
var markdownService = provider.GetRequiredService<MarkdownService>();

// Use AI converters transparently
var result = await markdownService.ConvertAsync("screenshot.png");
Console.WriteLine(result.Markdown);

AI Converters

  • AiImageConverter β€” Uses LLM vision to describe images and extract text
  • AiPdfConverter β€” Uses LLM vision to OCR PDFs (complements plain text extraction)
  • AiAudioConverter β€” Uses LLM audio APIs to transcribe audio files (MP3, WAV, M4A, OGG)

Configure behavior via AiOptions:

services.AddMarkItDotNetAI(options =>
{
    options.ImageDescriptionPrompt = "Describe this image in detail...";
    options.MaxRetries = 3;
});

Local Audio Transcription (Whisper)

The ElBruno.MarkItDotNet.Whisper package uses ElBruno.Whisper for offline speech-to-text powered by ONNX Runtime. No cloud API needed.

using ElBruno.Whisper;
using ElBruno.MarkItDotNet;
using ElBruno.MarkItDotNet.Whisper;

// Create Whisper client (downloads model on first run ~75MB)
using var whisperClient = await WhisperClient.CreateAsync();

// Register the plugin
var registry = new ConverterRegistry();
registry.RegisterPlugin(new WhisperConverterPlugin(whisperClient));

var service = new MarkdownService(registry);
var result = await service.ConvertAsync("recording.wav");
Console.WriteLine(result.Markdown);

Or with DI:

services.AddMarkItDotNet();
services.AddMarkItDotNetWhisper(options =>
{
    options.Model = KnownWhisperModels.WhisperBaseEn; // Optional: pick model size
});

API Reference

MarkdownService

The main service for converting files to Markdown. Use this in DI scenarios or when you need advanced control over converters.

public class MarkdownService
{
    public MarkdownService(ConverterRegistry registry);
    
    // Convert a file at the given path
    public Task<ConversionResult> ConvertAsync(string filePath);
    
    // Convert from a stream with explicit file extension
    public Task<ConversionResult> ConvertAsync(Stream stream, string fileExtension);
    
    // Stream conversion for large files
    public IAsyncEnumerable<string> ConvertStreamingAsync(Stream stream, string fileExtension);
}

ConversionResult

Represents the outcome of a file conversion. Always check Success before accessing Markdown.

public class ConversionResult
{
    public string Markdown { get; }          // Converted content (empty if failed)
    public string SourceFormat { get; }      // Source format (e.g., ".pdf")
    public bool Success { get; }             // Whether conversion succeeded
    public string? ErrorMessage { get; }     // Error details if Success is false
}

IMarkdownConverter

Contract for implementing custom converters.

public interface IMarkdownConverter
{
    // Check if this converter handles the given file extension
    bool CanHandle(string fileExtension);
    
    // Perform the conversion (extension includes the leading dot)
    Task<string> ConvertAsync(Stream fileStream, string fileExtension);
}

IStreamingMarkdownConverter

Extended contract for converters that support streaming (chunk-by-chunk processing).

public interface IStreamingMarkdownConverter : IMarkdownConverter
{
    // Converts content to Markdown, yielding chunks asynchronously
    IAsyncEnumerable<string> ConvertStreamingAsync(
        Stream fileStream,
        string fileExtension,
        CancellationToken cancellationToken = default);
}

IConverterPlugin

Contract for plugin packages that bundle one or more converters.

public interface IConverterPlugin
{
    // Human-readable name of the plugin (e.g., "Excel", "AI")
    string Name { get; }
    
    // Returns all converters provided by this plugin
    IEnumerable<IMarkdownConverter> GetConverters();
}

ConverterRegistry

Manages and resolves converters by file extension.

public class ConverterRegistry
{
    public void Register(IMarkdownConverter converter);
    public void RegisterPlugin(IConverterPlugin plugin);
    public IMarkdownConverter? Resolve(string extension);
    public IReadOnlyList<IMarkdownConverter> GetAll();
}

Custom Converters

You can implement custom converters for unsupported file formats by implementing IConverterPlugin or IMarkdownConverter:

Quick Custom Converter

Implement IMarkdownConverter for a single format:

using ElBruno.MarkItDotNet;
using System.Text;

public class CsvConverter : IMarkdownConverter
{
    public bool CanHandle(string fileExtension) =>
        fileExtension.Equals(".csv", StringComparison.OrdinalIgnoreCase);

    public async Task<string> ConvertAsync(Stream fileStream, string fileExtension)
    {
        using var reader = new StreamReader(fileStream, leaveOpen: true);
        var csv = await reader.ReadToEndAsync();
        
        var lines = csv.Split('\n');
        if (lines.Length == 0) return string.Empty;
        
        var sb = new StringBuilder();
        
        // Header row
        var headers = lines[0].Split(',');
        sb.Append("| ");
        sb.Append(string.Join(" | ", headers));
        sb.AppendLine(" |");
        sb.Append("|");
        sb.Append(string.Concat(headers.Select(_ => " --- |")));
        sb.AppendLine();
        
        // Data rows
        for (int i = 1; i < lines.Length; i++)
        {
            if (string.IsNullOrWhiteSpace(lines[i])) continue;
            var cells = lines[i].Split(',');
            sb.Append("| ");
            sb.Append(string.Join(" | ", cells));
            sb.AppendLine(" |");
        }
        
        return sb.ToString();
    }
}

Register with DI:

services.AddMarkItDotNet();
var registry = provider.GetRequiredService<ConverterRegistry>();
registry.Register(new CsvConverter());

Satellite Plugin Package

For reusable plugins, implement IConverterPlugin:

using ElBruno.MarkItDotNet;

public class MyCustomPlugin : IConverterPlugin
{
    public string Name => "MyCustom";

    public IEnumerable<IMarkdownConverter> GetConverters() =>
    [
        new MyFormatConverter1(),
        new MyFormatConverter2()
    ];
}

Register in DI:

services.AddSingleton<IConverterPlugin>(new MyCustomPlugin());

The registry automatically discovers and loads all registered plugins.

πŸ“¦ Samples

See Samples Guide for detailed walkthroughs.

Simple Samples

Sample Description Run Command
BasicConversion Text, JSON, and HTML conversion with DI dotnet run --project src/samples/BasicConversion/BasicConversion.csproj
CsvConversion CSV and TSV β†’ Markdown tables dotnet run --project src/samples/CsvConversion/CsvConversion.csproj
XmlYamlConversion XML and YAML β†’ fenced code blocks dotnet run --project src/samples/XmlYamlConversion/XmlYamlConversion.csproj
PdfConversion PDF β†’ Markdown with page metadata + streaming dotnet run --project src/samples/PdfConversion/PdfConversion.csproj
DocxConversion DOCX β†’ Markdown with headings, tables, links dotnet run --project src/samples/DocxConversion/DocxConversion.csproj
RtfEpubConversion RTF and EPUB β†’ Markdown dotnet run --project src/samples/RtfEpubConversion/RtfEpubConversion.csproj
ExcelConversion Excel .xlsx β†’ Markdown tables (Excel package) dotnet run --project src/samples/ExcelConversion/ExcelConversion.csproj
PowerPointConversion PPTX slides + notes β†’ Markdown (PowerPoint package) dotnet run --project src/samples/PowerPointConversion/PowerPointConversion.csproj
AiImageDescription Image OCR/captioning via IChatClient (AI package) dotnet run --project src/samples/AiImageDescription/AiImageDescription.csproj
StreamingConversion IAsyncEnumerable streaming for large PDFs dotnet run --project src/samples/StreamingConversion/StreamingConversion.csproj
CustomConverter Build a custom IMarkdownConverter (.ini files) dotnet run --project src/samples/CustomConverter/CustomConverter.csproj
PluginPackage Build and register a custom IConverterPlugin dotnet run --project src/samples/PluginPackage/PluginPackage.csproj
AllFormats Converts all supported formats in one app dotnet run --project src/samples/AllFormats/AllFormats.csproj
UrlConversion Web page URL β†’ Markdown dotnet run --project src/samples/UrlConversion/UrlConversion.csproj
WhisperTranscription Local audio transcription via Whisper ONNX dotnet run --project src/samples/WhisperTranscription/WhisperTranscription.csproj

End-to-End Samples

Sample Description Run Command
MarkItDotNet.WebApi ASP.NET Core Minimal API with file upload + streaming dotnet run --project src/samples/MarkItDotNet.WebApi/MarkItDotNet.WebApi.csproj
BatchProcessor Watches folder and batch-converts files to .md dotnet run --project src/samples/BatchProcessor/BatchProcessor.csproj
RagPipeline RAG ingestion: files β†’ Markdown β†’ chunked JSON dotnet run --project src/samples/RagPipeline/RagPipeline.csproj

Documentation

🀝 Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Security

ElBruno.MarkItDotNet processes untrusted file content and includes built-in security protections:

  • SSRF Protection β€” URL converter blocks private/internal IP addresses
  • File Size Limits β€” Configurable maximum file size (default 100MB)
  • XXE Prevention β€” XML parser explicitly prohibits DTD processing
  • Prompt Injection Mitigation β€” AI converters use system/user message separation

For detailed security guidance, see docs/security.md.

To report a security vulnerability, please use GitHub Security Advisories.

πŸ“„ License

This project is licensed under the MIT License β€” see the LICENSE file for details.

πŸ‘‹ About the Author

Made with ❀️ by Bruno Capuano (ElBruno)

Product Compatible and additional computed target framework versions.
.NET net8.0 is compatible.  net8.0-android was computed.  net8.0-browser was computed.  net8.0-ios was computed.  net8.0-maccatalyst was computed.  net8.0-macos was computed.  net8.0-tvos was computed.  net8.0-windows was computed.  net9.0 was computed.  net9.0-android was computed.  net9.0-browser was computed.  net9.0-ios was computed.  net9.0-maccatalyst was computed.  net9.0-macos was computed.  net9.0-tvos was computed.  net9.0-windows was computed.  net10.0 was computed.  net10.0-android was computed.  net10.0-browser was computed.  net10.0-ios was computed.  net10.0-maccatalyst was computed.  net10.0-macos was computed.  net10.0-tvos was computed.  net10.0-windows was computed. 
Compatible target framework(s)
Included target framework(s) (in package)
Learn more about Target Frameworks and .NET Standard.
  • net8.0

    • No dependencies.

NuGet packages (9)

Showing the top 5 NuGet packages that depend on ElBruno.MarkItDotNet.CoreModel:

Package Downloads
ElBruno.MarkItDotNet

.NET library that converts file formats to Markdown for AI pipelines, documentation workflows, and developer tools. Inspired by Python markitdown.

ElBruno.MarkItDotNet.AzureSearch

Azure AI Search integration for ElBruno.MarkItDotNet ingestion pipelines. Provides index schema, document mapping, and batch upload helpers.

ElBruno.MarkItDotNet.Citations

Citation tracking and source traceability for ElBruno.MarkItDotNet ingestion pipelines. Provides page, block, and heading-path references for chunks.

ElBruno.MarkItDotNet.VectorData

Provider-neutral vector record mapping and export for ElBruno.MarkItDotNet ingestion pipelines. Supports Microsoft.Extensions.VectorData abstractions and JSONL export.

ElBruno.MarkItDotNet.DocumentIntelligence

Azure Document Intelligence integration for ElBruno.MarkItDotNet. Provides layout-aware OCR extraction with structured CoreModel output.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version Downloads Last Updated
0.6.0 0 4/15/2026
0.5.3 172 4/13/2026