ElBruno.MarkItDotNet.AI
0.5.3
ElBruno.MarkItDotNet
.NET library that converts 15+ file formats to Markdown for AI pipelines, documentation workflows, and developer tools. Inspired by Python markitdown.
📦 NuGet Packages
| Package | Description |
|---|---|
| ElBruno.MarkItDotNet | Core library with 12 built-in converters |
| ElBruno.MarkItDotNet.Excel | Excel (.xlsx) → Markdown tables |
| ElBruno.MarkItDotNet.PowerPoint | PowerPoint (.pptx) → slides + notes |
| ElBruno.MarkItDotNet.AI | AI-powered OCR, captioning, transcription |
| ElBruno.MarkItDotNet.Whisper | Local audio transcription via Whisper ONNX |
| ElBruno.MarkItDotNet.Cli | Command-line tool (markitdown command) |
Description
ElBruno.MarkItDotNet provides a unified interface to convert 15+ file formats into clean, structured Markdown. The core package handles text, JSON, HTML, Word, PDF, RTF, EPUB, images, CSV, XML, YAML, and URLs (web pages). Extend with satellite packages for Excel, PowerPoint, AI-powered features (OCR, image captioning, audio transcription), and local audio transcription via Whisper. Designed for AI content pipelines, documentation systems, and any scenario where you need consistent Markdown output from mixed file sources.
Supported Formats
| Format | Extensions | Converter | Package | Dependencies |
|---|---|---|---|---|
| Plain Text | .txt, .md, .log | PlainTextConverter | Core | None |
| JSON | .json | JsonConverter | Core | None |
| HTML | .html, .htm | HtmlConverter | Core | ReverseMarkdown |
| URL (Web Pages) | .url | UrlConverter | Core | ReverseMarkdown |
| Word (DOCX) | .docx | DocxConverter | Core | DocumentFormat.OpenXml |
| PDF | .pdf | PdfConverter | Core | PdfPig |
| CSV | .csv | CsvConverter | Core | None |
| XML | .xml | XmlConverter | Core | None |
| YAML | .yaml, .yml | YamlConverter | Core | None |
| RTF | .rtf | RtfConverter | Core | RtfPipe |
| EPUB | .epub | EpubConverter | Core | VersOne.Epub |
| Images | .jpg, .jpeg, .png, .gif, .bmp, .webp, .svg | ImageConverter | Core | None |
| Excel (XLSX) | .xlsx | ExcelConverter | Excel | ClosedXML |
| PowerPoint (PPTX) | .pptx | PowerPointConverter | PowerPoint | DocumentFormat.OpenXml |
| Images (AI-OCR) | All image formats | AiImageConverter | AI | Microsoft.Extensions.AI |
| Audio (AI Transcription) | .mp3, .wav, .m4a, .ogg | AiAudioConverter | AI | Microsoft.Extensions.AI |
| PDF (AI-OCR) | .pdf | AiPdfConverter | AI | Microsoft.Extensions.AI |
| Audio (Local Whisper) | .wav, .mp3, .m4a, .ogg, .flac | WhisperAudioConverter | Whisper | ElBruno.Whisper |
Target Frameworks
- .NET 8.0 (LTS)
- .NET 10.0
🛠️ CLI Tool
Command-line interface for batch conversion and terminal workflows.
Installation
Install as a global tool:
dotnet tool install -g ElBruno.MarkItDotNet.Cli
Quick Examples
Convert a single file:
markitdown report.pdf
markitdown report.pdf -o report.md
Batch convert a directory:
markitdown batch ./documents -o ./output -r --pattern "*.pdf"
Convert a web page:
markitdown url https://example.com -o page.md
Extract metadata as JSON:
markitdown data.csv --format json | jq .metadata.wordCount
Packages
ElBruno.MarkItDotNet is distributed across multiple NuGet packages for flexibility:
Core Package
ElBruno.MarkItDotNet: the main library with 12 built-in converters.
dotnet add package ElBruno.MarkItDotNet
Includes: Plain text, JSON, HTML, URLs (web pages), Word, PDF, RTF, EPUB, images, CSV, XML, YAML.
Satellite Packages
ElBruno.MarkItDotNet.Excel: Excel (XLSX) to Markdown converter (v0.2.0+)
dotnet add package ElBruno.MarkItDotNet.Excel
Converts spreadsheet sheets to Markdown tables.
ElBruno.MarkItDotNet.PowerPoint: PowerPoint (PPTX) to Markdown converter (v0.2.0+)
dotnet add package ElBruno.MarkItDotNet.PowerPoint
Converts slides and speaker notes to Markdown.
ElBruno.MarkItDotNet.AI: AI-powered converters (v0.2.0+)
dotnet add package ElBruno.MarkItDotNet.AI
Requires Microsoft.Extensions.AI (for IChatClient). Provides:
- AiImageConverter: OCR for images using LLM vision
- AiPdfConverter: OCR for PDFs using LLM vision
- AiAudioConverter: Transcription for audio files using LLM audio APIs
ElBruno.MarkItDotNet.Whisper: Local audio transcription via Whisper ONNX (v0.3.0+)
dotnet add package ElBruno.MarkItDotNet.Whisper
Uses ElBruno.Whisper for offline speech-to-text. No cloud API is needed; it runs locally via ONNX Runtime. Supports .wav, .mp3, .m4a, .ogg, .flac.
Installation
For the core library only:
dotnet add package ElBruno.MarkItDotNet
For Excel support:
dotnet add package ElBruno.MarkItDotNet.Excel
For PowerPoint support:
dotnet add package ElBruno.MarkItDotNet.PowerPoint
For AI-powered features (requires separate IChatClient registration):
dotnet add package ElBruno.MarkItDotNet.AI
For local audio transcription (offline, no API key needed):
dotnet add package ElBruno.MarkItDotNet.Whisper
Quick Start
The simplest way to get started is with the MarkdownConverter façade:
using ElBruno.MarkItDotNet;
// Convert a file to Markdown
var converter = new MarkdownConverter();
var markdown = converter.ConvertToMarkdown("document.txt");
Console.WriteLine(markdown);
// Or convert from a stream
using var stream = File.OpenRead("document.pdf");
var result = await converter.ConvertAsync(stream, ".pdf");
Console.WriteLine(result.Markdown);
The MarkdownConverter class pre-registers all built-in converters (from the core package) and provides synchronous and asynchronous conversion methods.
URL Conversion
Convert web pages directly to Markdown:
// registry is a ConverterRegistry, for example one resolved from DI
var service = new MarkdownService(registry);
var result = await service.ConvertUrlAsync("https://example.com");
Console.WriteLine(result.Markdown);
The URL converter fetches the page, strips navigation/scripts/styles, extracts the title, and converts the content to clean Markdown.
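The clean-up steps described above can be illustrated with a small, self-contained sketch. This is not the library's implementation (the core package lists ReverseMarkdown as its HTML dependency); `ExtractTitle` and `StripNoise` are hypothetical helpers that use regular expressions, which is fragile on real-world HTML and is meant only to show the idea.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the decoded, trimmed <title> text, or "" if none.
static string ExtractTitle(string html)
{
    var match = Regex.Match(html, @"<title[^>]*>(.*?)</title>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);
    return match.Success
        ? WebUtility.HtmlDecode(match.Groups[1].Value).Trim()
        : string.Empty;
}

// Hypothetical helper: removes <script>, <style>, and <nav> elements with their content.
static string StripNoise(string html) =>
    Regex.Replace(html, @"<(script|style|nav)\b[^>]*>.*?</\1\s*>", string.Empty,
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

var html = "<html><head><title> Demo </title><style>p{}</style></head>"
         + "<body><nav>menu</nav><p>Hello</p><script>x()</script></body></html>";

Console.WriteLine(ExtractTitle(html)); // Demo
Console.WriteLine(StripNoise(html));
```

A production converter would normally use a real HTML parser for this step, since regular expressions cannot handle nested or malformed markup reliably.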
With Satellite Packages
Each satellite package (Excel, PowerPoint, AI) exposes its converters through the plugin system. After you call the package's registration method during dependency injection setup, those converters are resolved through the same MarkdownService as the built-in ones.
Dependency Injection with Plugin System
For advanced scenarios (e.g., ASP.NET Core applications), use the DI extension methods to register MarkItDotNet services:
using Microsoft.Extensions.DependencyInjection;
using ElBruno.MarkItDotNet;
using ElBruno.MarkItDotNet.Excel;
using ElBruno.MarkItDotNet.PowerPoint;
var services = new ServiceCollection();
// Register core MarkItDotNet with built-in converters
services.AddMarkItDotNet();
// Register satellite package converters (plugins)
services.AddMarkItDotNetExcel();
services.AddMarkItDotNetPowerPoint();
// Register AI converters (requires IChatClient)
// services.AddMarkItDotNetAI();
var provider = services.BuildServiceProvider();
var markdownService = provider.GetRequiredService<MarkdownService>();
// Convert files through the service (converters auto-discovered)
var result = await markdownService.ConvertAsync("document.xlsx");
if (result.Success)
{
    Console.WriteLine(result.Markdown);
}
else
{
    Console.WriteLine($"Error: {result.ErrorMessage}");
}
All registered converters (core + plugins) are automatically available through the MarkdownService.
Streaming Conversion
For large files, use the streaming API to process content chunk-by-chunk:
var converter = new MarkdownConverter();
using var stream = File.OpenRead("large-document.pdf");
await foreach (var chunk in converter.ConvertStreamingAsync(stream, ".pdf"))
{
    Console.Write(chunk);
}
The streaming API yields Markdown chunks asynchronously (e.g., page-by-page for PDFs), enabling memory-efficient processing of large files.
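To make the chunk-by-chunk idea concrete, here is a minimal sketch of an async iterator that produces chunks lazily. It does not use the library: `ReadChunksAsync` is a hypothetical helper that groups lines from a `TextReader`, standing in for whatever unit (such as a PDF page) a real streaming converter would yield.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper (not part of ElBruno.MarkItDotNet): yields the input in
// chunks of `linesPerChunk` lines instead of building one large string.
static async IAsyncEnumerable<string> ReadChunksAsync(TextReader reader, int linesPerChunk)
{
    var buffer = new StringBuilder();
    var count = 0;
    string? line;
    while ((line = await reader.ReadLineAsync()) is not null)
    {
        buffer.AppendLine(line);
        if (++count == linesPerChunk)
        {
            yield return buffer.ToString();
            buffer.Clear();
            count = 0;
        }
    }
    if (count > 0) yield return buffer.ToString(); // final partial chunk
}

// Each chunk is handled as soon as it is produced.
var chunks = new List<string>();
await foreach (var chunk in ReadChunksAsync(new StringReader("a\nb\nc\nd\ne"), 2))
{
    chunks.Add(chunk);
}
Console.WriteLine(chunks.Count); // 3
```

Because the consumer pulls one chunk at a time, only the current chunk needs to be held in memory, which is the property the streaming API relies on for large files.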
AI-Powered Conversion
The ElBruno.MarkItDotNet.AI package provides converters that use LLM vision and audio APIs for advanced capabilities:
Setup
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.AI;
using ElBruno.MarkItDotNet;
using ElBruno.MarkItDotNet.AI;
var services = new ServiceCollection();
// Register a chat client (e.g., OpenAI)
services.AddOpenAIChatClient("sk-...", "gpt-4-vision");
// Register core + AI converters
services.AddMarkItDotNet();
services.AddMarkItDotNetAI();
var provider = services.BuildServiceProvider();
var markdownService = provider.GetRequiredService<MarkdownService>();
// Use AI converters transparently
var result = await markdownService.ConvertAsync("screenshot.png");
Console.WriteLine(result.Markdown);
AI Converters
- AiImageConverter: Uses LLM vision to describe images and extract text
- AiPdfConverter: Uses LLM vision to OCR PDFs (complements plain text extraction)
- AiAudioConverter: Uses LLM audio APIs to transcribe audio files (MP3, WAV, M4A, OGG)
Configure behavior via AiOptions:
services.AddMarkItDotNetAI(options =>
{
    options.ImageDescriptionPrompt = "Describe this image in detail...";
    options.MaxRetries = 3;
});
Local Audio Transcription (Whisper)
The ElBruno.MarkItDotNet.Whisper package uses ElBruno.Whisper for offline speech-to-text powered by ONNX Runtime. No cloud API needed.
using ElBruno.Whisper;
using ElBruno.MarkItDotNet;
using ElBruno.MarkItDotNet.Whisper;
// Create Whisper client (downloads model on first run ~75MB)
using var whisperClient = await WhisperClient.CreateAsync();
// Register the plugin
var registry = new ConverterRegistry();
registry.RegisterPlugin(new WhisperConverterPlugin(whisperClient));
var service = new MarkdownService(registry);
var result = await service.ConvertAsync("recording.wav");
Console.WriteLine(result.Markdown);
Or with DI:
services.AddMarkItDotNet();
services.AddMarkItDotNetWhisper(options =>
{
    options.Model = KnownWhisperModels.WhisperBaseEn; // Optional: pick model size
});
API Reference
MarkdownService
The main service for converting files to Markdown. Use this in DI scenarios or when you need advanced control over converters.
public class MarkdownService
{
    public MarkdownService(ConverterRegistry registry);

    // Convert a file at the given path
    public Task<ConversionResult> ConvertAsync(string filePath);

    // Convert from a stream with explicit file extension
    public Task<ConversionResult> ConvertAsync(Stream stream, string fileExtension);

    // Stream conversion for large files
    public IAsyncEnumerable<string> ConvertStreamingAsync(Stream stream, string fileExtension);
}
ConversionResult
Represents the outcome of a file conversion. Always check Success before accessing Markdown.
public class ConversionResult
{
    public string Markdown { get; }        // Converted content (empty if failed)
    public string SourceFormat { get; }    // Source format (e.g., ".pdf")
    public bool Success { get; }           // Whether conversion succeeded
    public string? ErrorMessage { get; }   // Error details if Success is false
}
IMarkdownConverter
Contract for implementing custom converters.
public interface IMarkdownConverter
{
    // Check if this converter handles the given file extension
    bool CanHandle(string fileExtension);

    // Perform the conversion (extension includes the leading dot)
    Task<string> ConvertAsync(Stream fileStream, string fileExtension);
}
IStreamingMarkdownConverter
Extended contract for converters that support streaming (chunk-by-chunk processing).
public interface IStreamingMarkdownConverter : IMarkdownConverter
{
    // Converts content to Markdown, yielding chunks asynchronously
    IAsyncEnumerable<string> ConvertStreamingAsync(
        Stream fileStream,
        string fileExtension,
        CancellationToken cancellationToken = default);
}
IConverterPlugin
Contract for plugin packages that bundle one or more converters.
public interface IConverterPlugin
{
    // Human-readable name of the plugin (e.g., "Excel", "AI")
    string Name { get; }

    // Returns all converters provided by this plugin
    IEnumerable<IMarkdownConverter> GetConverters();
}
ConverterRegistry
Manages and resolves converters by file extension.
public class ConverterRegistry
{
    public void Register(IMarkdownConverter converter);
    public void RegisterPlugin(IConverterPlugin plugin);
    public IMarkdownConverter? Resolve(string extension);
    public IReadOnlyList<IMarkdownConverter> GetAll();
}
Custom Converters
To support a file format that is not covered, implement IMarkdownConverter for a single format or IConverterPlugin to bundle several converters:
Quick Custom Converter
Implement IMarkdownConverter for a single format:
using System.Linq;
using System.Text;
using ElBruno.MarkItDotNet;

public class CsvConverter : IMarkdownConverter
{
    public bool CanHandle(string fileExtension) =>
        fileExtension.Equals(".csv", StringComparison.OrdinalIgnoreCase);

    public async Task<string> ConvertAsync(Stream fileStream, string fileExtension)
    {
        using var reader = new StreamReader(fileStream, leaveOpen: true);
        var csv = await reader.ReadToEndAsync();

        // Trim each line (drops the '\r' of CRLF files) and skip blank lines.
        // Note: splitting on ',' is naive and does not handle quoted fields.
        var lines = csv.Split('\n',
            StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);
        if (lines.Length == 0) return string.Empty;

        var sb = new StringBuilder();

        // Header row
        var headers = lines[0].Split(',');
        sb.Append("| ");
        sb.Append(string.Join(" | ", headers));
        sb.AppendLine(" |");

        // Separator row: one " --- |" cell per header column
        sb.Append("|");
        sb.Append(string.Concat(headers.Select(_ => " --- |")));
        sb.AppendLine();

        // Data rows
        for (int i = 1; i < lines.Length; i++)
        {
            var cells = lines[i].Split(',');
            sb.Append("| ");
            sb.Append(string.Join(" | ", cells));
            sb.AppendLine(" |");
        }

        return sb.ToString();
    }
}
Register with DI:
services.AddMarkItDotNet();
var provider = services.BuildServiceProvider();
var registry = provider.GetRequiredService<ConverterRegistry>();
registry.Register(new CsvConverter());
Satellite Plugin Package
For reusable plugins, implement IConverterPlugin:
using ElBruno.MarkItDotNet;
public class MyCustomPlugin : IConverterPlugin
{
    public string Name => "MyCustom";

    public IEnumerable<IMarkdownConverter> GetConverters() =>
    [
        new MyFormatConverter1(),
        new MyFormatConverter2()
    ];
}
Register in DI:
services.AddSingleton<IConverterPlugin>(new MyCustomPlugin());
The registry automatically discovers and loads all registered plugins.
📦 Samples
See Samples Guide for detailed walkthroughs.
Simple Samples
| Sample | Description | Run Command |
|---|---|---|
| BasicConversion | Text, JSON, and HTML conversion with DI | dotnet run --project src/samples/BasicConversion/BasicConversion.csproj |
| CsvConversion | CSV and TSV → Markdown tables | dotnet run --project src/samples/CsvConversion/CsvConversion.csproj |
| XmlYamlConversion | XML and YAML → fenced code blocks | dotnet run --project src/samples/XmlYamlConversion/XmlYamlConversion.csproj |
| PdfConversion | PDF → Markdown with page metadata + streaming | dotnet run --project src/samples/PdfConversion/PdfConversion.csproj |
| DocxConversion | DOCX → Markdown with headings, tables, links | dotnet run --project src/samples/DocxConversion/DocxConversion.csproj |
| RtfEpubConversion | RTF and EPUB → Markdown | dotnet run --project src/samples/RtfEpubConversion/RtfEpubConversion.csproj |
| ExcelConversion | Excel .xlsx → Markdown tables (Excel package) | dotnet run --project src/samples/ExcelConversion/ExcelConversion.csproj |
| PowerPointConversion | PPTX slides + notes → Markdown (PowerPoint package) | dotnet run --project src/samples/PowerPointConversion/PowerPointConversion.csproj |
| AiImageDescription | Image OCR/captioning via IChatClient (AI package) | dotnet run --project src/samples/AiImageDescription/AiImageDescription.csproj |
| StreamingConversion | IAsyncEnumerable streaming for large PDFs | dotnet run --project src/samples/StreamingConversion/StreamingConversion.csproj |
| CustomConverter | Build a custom IMarkdownConverter (.ini files) | dotnet run --project src/samples/CustomConverter/CustomConverter.csproj |
| PluginPackage | Build and register a custom IConverterPlugin | dotnet run --project src/samples/PluginPackage/PluginPackage.csproj |
| AllFormats | Converts all supported formats in one app | dotnet run --project src/samples/AllFormats/AllFormats.csproj |
| UrlConversion | Web page URL → Markdown | dotnet run --project src/samples/UrlConversion/UrlConversion.csproj |
| WhisperTranscription | Local audio transcription via Whisper ONNX | dotnet run --project src/samples/WhisperTranscription/WhisperTranscription.csproj |
End-to-End Samples
| Sample | Description | Run Command |
|---|---|---|
| MarkItDotNet.WebApi | ASP.NET Core Minimal API with file upload + streaming | dotnet run --project src/samples/MarkItDotNet.WebApi/MarkItDotNet.WebApi.csproj |
| BatchProcessor | Watches folder and batch-converts files to .md | dotnet run --project src/samples/BatchProcessor/BatchProcessor.csproj |
| RagPipeline | RAG ingestion: files → Markdown → chunked JSON | dotnet run --project src/samples/RagPipeline/RagPipeline.csproj |
Documentation
- Samples Guide: detailed walkthroughs for all sample projects
- Architecture: design decisions, plugin system, converter pipeline, and internal structure
- Plugins Guide: how to create custom plugin packages
- Building & Testing: how to build from source and run tests
- Image Generation Prompts: AI prompts for branding assets
- Acknowledgements: open-source libraries that power this project
🤝 Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Security
ElBruno.MarkItDotNet processes untrusted file content and includes built-in security protections:
- SSRF Protection: URL converter blocks private/internal IP addresses
- File Size Limits: Configurable maximum file size (default 100MB)
- XXE Prevention: XML parser explicitly prohibits DTD processing
- Prompt Injection Mitigation: AI converters use system/user message separation
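As an illustration of the SSRF and XXE items above, the following sketch shows how such checks can be written with the .NET base class library alone. It is not the library's code: `IsInternalAddress` and `TryReadXmlSafely` are hypothetical helpers, and a real SSRF guard would also need to apply the address check to every IP a hostname resolves to.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Sockets;
using System.Xml;

// Hypothetical helper: true when the address is loopback, private, or link-local.
static bool IsInternalAddress(IPAddress ip)
{
    if (ip.IsIPv4MappedToIPv6) ip = ip.MapToIPv4();
    if (IPAddress.IsLoopback(ip)) return true;

    var b = ip.GetAddressBytes();
    if (ip.AddressFamily == AddressFamily.InterNetwork)
    {
        return b[0] == 10                                  // 10.0.0.0/8
            || (b[0] == 172 && b[1] >= 16 && b[1] <= 31)   // 172.16.0.0/12
            || (b[0] == 192 && b[1] == 168)                // 192.168.0.0/16
            || (b[0] == 169 && b[1] == 254)                // 169.254.0.0/16 (link-local)
            || b[0] == 0;                                  // 0.0.0.0/8
    }

    // IPv6: link-local (fe80::/10), site-local (fec0::/10), unique local (fc00::/7)
    return ip.IsIPv6LinkLocal || ip.IsIPv6SiteLocal || (b[0] & 0xFE) == 0xFC;
}

// Hypothetical helper: reads XML with DTD processing prohibited, so a document
// that declares a DOCTYPE is rejected. Also returns false for malformed XML.
static bool TryReadXmlSafely(string xml)
{
    var settings = new XmlReaderSettings
    {
        DtdProcessing = DtdProcessing.Prohibit,
        XmlResolver = null // never resolve external resources
    };
    try
    {
        using var reader = XmlReader.Create(new StringReader(xml), settings);
        while (reader.Read()) { }
        return true;
    }
    catch (XmlException)
    {
        return false;
    }
}

Console.WriteLine(IsInternalAddress(IPAddress.Parse("192.168.1.10")));  // True
Console.WriteLine(IsInternalAddress(IPAddress.Parse("93.184.216.34"))); // False
Console.WriteLine(TryReadXmlSafely("<r>ok</r>"));                       // True
```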
For detailed security guidance, see docs/security.md.
To report a security vulnerability, please use GitHub Security Advisories.
License
This project is licensed under the MIT License; see the LICENSE file for details.
About the Author
Made with ❤️ by Bruno Capuano (ElBruno)
- Blog: elbruno.com
- YouTube: youtube.com/elbruno
- LinkedIn: linkedin.com/in/elbruno
- Twitter: twitter.com/elbruno
- Podcast: notienenombre.com
Dependencies (net8.0)
- ElBruno.MarkItDotNet (>= 0.5.3)
- Microsoft.Extensions.AI.Abstractions (>= 9.5.0)
- PdfPig (>= 0.1.14)