ElBruno.MarkItDotNet.AI
0.2.0
ElBruno.MarkItDotNet
.NET library that converts 15+ file formats to Markdown for AI pipelines, documentation workflows, and developer tools. Inspired by Python markitdown.
Description
ElBruno.MarkItDotNet provides a unified interface to convert 15+ file formats into clean, structured Markdown. The core package handles text, JSON, HTML, Word, PDF, RTF, EPUB, images, CSV, XML, and YAML. Extend with satellite packages for Excel, PowerPoint, and AI-powered features (OCR, image captioning, audio transcription). Designed for AI content pipelines, documentation systems, and any scenario where you need consistent Markdown output from mixed file sources.
Supported Formats
| Format | Extensions | Converter | Package | Dependencies |
|---|---|---|---|---|
| Plain Text | .txt, .md, .log | PlainTextConverter | Core | None |
| JSON | .json | JsonConverter | Core | None |
| HTML | .html, .htm | HtmlConverter | Core | ReverseMarkdown |
| Word (DOCX) | .docx | DocxConverter | Core | DocumentFormat.OpenXml |
| PDF | .pdf | PdfConverter | Core | PdfPig |
| CSV | .csv | CsvConverter | Core | None |
| XML | .xml | XmlConverter | Core | None |
| YAML | .yaml, .yml | YamlConverter | Core | None |
| RTF | .rtf | RtfConverter | Core | RtfPipe |
| EPUB | .epub | EpubConverter | Core | VersOne.Epub |
| Images | .jpg, .jpeg, .png, .gif, .bmp, .webp, .svg | ImageConverter | Core | None |
| Excel (XLSX) | .xlsx | ExcelConverter | Excel | ClosedXML |
| PowerPoint (PPTX) | .pptx | PowerPointConverter | PowerPoint | DocumentFormat.OpenXml |
| Images (AI-OCR) | All image formats | AiImageConverter | AI | Microsoft.Extensions.AI |
| Audio (Transcription) | .mp3, .wav, .m4a, .ogg | AiAudioConverter | AI | Microsoft.Extensions.AI |
| PDF (AI-OCR) | .pdf | AiPdfConverter | AI | Microsoft.Extensions.AI |
Target Frameworks
- .NET 8.0 (LTS)
- .NET 10.0
Packages
ElBruno.MarkItDotNet is distributed across multiple NuGet packages for flexibility:
Core Package
ElBruno.MarkItDotNet — The main library with 11 built-in converters.
dotnet add package ElBruno.MarkItDotNet
Includes: Plain text, JSON, HTML, Word, PDF, RTF, EPUB, images, CSV, XML, YAML.
Satellite Packages
ElBruno.MarkItDotNet.Excel — Excel (XLSX) to Markdown converter (v0.2.0+)
dotnet add package ElBruno.MarkItDotNet.Excel
Converts spreadsheet sheets to Markdown tables.
ElBruno.MarkItDotNet.PowerPoint — PowerPoint (PPTX) to Markdown converter (v0.2.0+)
dotnet add package ElBruno.MarkItDotNet.PowerPoint
Converts slides and speaker notes to Markdown.
ElBruno.MarkItDotNet.AI — AI-powered converters (v0.2.0+)
dotnet add package ElBruno.MarkItDotNet.AI
Requires Microsoft.Extensions.AI (for IChatClient). Provides:
- AiImageConverter — OCR for images using LLM vision
- AiPdfConverter — OCR for PDFs using LLM vision
- AiAudioConverter — Transcription for audio files using LLM audio APIs
Installation
For the core library only:
dotnet add package ElBruno.MarkItDotNet
For Excel support:
dotnet add package ElBruno.MarkItDotNet.Excel
For PowerPoint support:
dotnet add package ElBruno.MarkItDotNet.PowerPoint
For AI-powered features (requires separate IChatClient registration):
dotnet add package ElBruno.MarkItDotNet.AI
Quick Start
The simplest way to get started is with the MarkdownConverter façade:
using ElBruno.MarkItDotNet;
// Convert a file to Markdown
var converter = new MarkdownConverter();
var markdown = converter.ConvertToMarkdown("document.txt");
Console.WriteLine(markdown);
// Or convert from a stream
using var stream = File.OpenRead("document.pdf");
var result = await converter.ConvertAsync(stream, ".pdf");
Console.WriteLine(result.Markdown);
The MarkdownConverter class pre-registers all built-in converters (from the core package) and provides synchronous and asynchronous conversion methods.
With Satellite Packages
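Satellite packages (Excel, PowerPoint, AI) plug into the same pipeline. Once a package is installed and its registration method (for example, AddMarkItDotNetExcel()) is called during dependency injection setup, the plugin system makes its converters available alongside the built-in ones.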
Dependency Injection with Plugin System
For advanced scenarios (e.g., ASP.NET Core applications), use the DI extension methods to register MarkItDotNet services:
using Microsoft.Extensions.DependencyInjection;
using ElBruno.MarkItDotNet;
using ElBruno.MarkItDotNet.Excel;
using ElBruno.MarkItDotNet.PowerPoint;
var services = new ServiceCollection();
// Register core MarkItDotNet with built-in converters
services.AddMarkItDotNet();
// Register satellite package converters (plugins)
services.AddMarkItDotNetExcel();
services.AddMarkItDotNetPowerPoint();
// Register AI converters (requires IChatClient)
// services.AddMarkItDotNetAI();
var provider = services.BuildServiceProvider();
var markdownService = provider.GetRequiredService<MarkdownService>();
// Convert files through the service (converters auto-discovered)
var result = await markdownService.ConvertAsync("document.xlsx");
if (result.Success)
{
    Console.WriteLine(result.Markdown);
}
else
{
    Console.WriteLine($"Error: {result.ErrorMessage}");
}
All registered converters (core + plugins) are automatically available through the MarkdownService.
Streaming Conversion
For large files, use the streaming API to process content chunk-by-chunk:
var converter = new MarkdownConverter();
using var stream = File.OpenRead("large-document.pdf");
await foreach (var chunk in converter.ConvertStreamingAsync(stream, ".pdf"))
{
    Console.Write(chunk);
}
The streaming API yields Markdown chunks asynchronously (e.g., page-by-page for PDFs), enabling memory-efficient processing of large files.
AI-Powered Conversion
The ElBruno.MarkItDotNet.AI package provides converters that use LLM vision and audio APIs for advanced capabilities:
Setup
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.AI;
using ElBruno.MarkItDotNet;
using ElBruno.MarkItDotNet.AI;
var services = new ServiceCollection();
// Register a chat client (e.g., OpenAI)
services.AddOpenAIChatClient("sk-...", "gpt-4-vision");
// Register core + AI converters
services.AddMarkItDotNet();
services.AddMarkItDotNetAI();
var provider = services.BuildServiceProvider();
var markdownService = provider.GetRequiredService<MarkdownService>();
// Use AI converters transparently
var result = await markdownService.ConvertAsync("screenshot.png");
Console.WriteLine(result.Markdown);
AI Converters
- AiImageConverter — Uses LLM vision to describe images and extract text
- AiPdfConverter — Uses LLM vision to OCR PDFs (complements plain text extraction)
- AiAudioConverter — Uses LLM audio APIs to transcribe audio files (MP3, WAV, M4A, OGG)
Configure behavior via AiOptions:
services.AddMarkItDotNetAI(options =>
{
    options.ImageDescriptionPrompt = "Describe this image in detail...";
    options.MaxRetries = 3;
});
API Reference
MarkdownService
The main service for converting files to Markdown. Use this in DI scenarios or when you need advanced control over converters.
public class MarkdownService
{
    public MarkdownService(ConverterRegistry registry);

    // Convert a file at the given path
    public Task<ConversionResult> ConvertAsync(string filePath);

    // Convert from a stream with explicit file extension
    public Task<ConversionResult> ConvertAsync(Stream stream, string fileExtension);

    // Stream conversion for large files
    public IAsyncEnumerable<string> ConvertStreamingAsync(Stream stream, string fileExtension);
}
ConversionResult
Represents the outcome of a file conversion. Always check Success before accessing Markdown.
public class ConversionResult
{
    public string Markdown { get; }       // Converted content (empty if failed)
    public string SourceFormat { get; }   // Source format (e.g., ".pdf")
    public bool Success { get; }          // Whether conversion succeeded
    public string? ErrorMessage { get; }  // Error details if Success is false
}
IMarkdownConverter
Contract for implementing custom converters.
public interface IMarkdownConverter
{
    // Check if this converter handles the given file extension
    bool CanHandle(string fileExtension);

    // Perform the conversion (extension includes the leading dot)
    Task<string> ConvertAsync(Stream fileStream, string fileExtension);
}
IStreamingMarkdownConverter
Extended contract for converters that support streaming (chunk-by-chunk processing).
public interface IStreamingMarkdownConverter : IMarkdownConverter
{
    // Converts content to Markdown, yielding chunks asynchronously
    IAsyncEnumerable<string> ConvertStreamingAsync(
        Stream fileStream,
        string fileExtension,
        CancellationToken cancellationToken = default);
}
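A streaming converter built against this contract might look like the following sketch. ChunkedLogConverter and its linesPerChunk parameter are hypothetical, and the class is left without a base interface so the snippet compiles without the package; in a real project it would presumably declare IStreamingMarkdownConverter. It turns each input line into a Markdown list item and yields a chunk every few lines.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.CompilerServices;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical converter (not part of the package): streams a .log file
// as Markdown list items, a fixed number of lines per chunk.
public sealed class ChunkedLogConverter
{
    private readonly int _linesPerChunk;

    public ChunkedLogConverter(int linesPerChunk = 100)
    {
        if (linesPerChunk <= 0) throw new ArgumentOutOfRangeException(nameof(linesPerChunk));
        _linesPerChunk = linesPerChunk;
    }

    public bool CanHandle(string fileExtension) =>
        fileExtension.Equals(".log", StringComparison.OrdinalIgnoreCase);

    // Non-streaming path: concatenates the chunks produced by the streaming path.
    public async Task<string> ConvertAsync(Stream fileStream, string fileExtension)
    {
        var sb = new StringBuilder();
        await foreach (var chunk in ConvertStreamingAsync(fileStream, fileExtension))
        {
            sb.Append(chunk);
        }
        return sb.ToString();
    }

    public async IAsyncEnumerable<string> ConvertStreamingAsync(
        Stream fileStream,
        string fileExtension,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        using var reader = new StreamReader(fileStream, leaveOpen: true);
        var buffer = new StringBuilder();
        int count = 0;
        string? line;
        while ((line = await reader.ReadLineAsync()) is not null)
        {
            cancellationToken.ThrowIfCancellationRequested();
            buffer.Append("- ").Append(line).Append('\n');
            if (++count == _linesPerChunk)
            {
                yield return buffer.ToString();
                buffer.Clear();
                count = 0;
            }
        }
        // Flush whatever is left after the last full chunk.
        if (buffer.Length > 0) yield return buffer.ToString();
    }
}
```

Only one chunk is buffered at a time, so memory use depends on linesPerChunk rather than on the file size.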
IConverterPlugin
Contract for plugin packages that bundle one or more converters.
public interface IConverterPlugin
{
    // Human-readable name of the plugin (e.g., "Excel", "AI")
    string Name { get; }

    // Returns all converters provided by this plugin
    IEnumerable<IMarkdownConverter> GetConverters();
}
ConverterRegistry
Manages and resolves converters by file extension.
public class ConverterRegistry
{
    public void Register(IMarkdownConverter converter);
    public void RegisterPlugin(IConverterPlugin plugin);
    public IMarkdownConverter? Resolve(string extension);
    public IReadOnlyList<IMarkdownConverter> GetAll();
}
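To make resolve-by-extension concrete, here is a minimal sketch of how such a registry could work. It is not the library's implementation: SketchRegistry, ISketchConverter, and UpperCaseTextConverter are hypothetical stand-ins declared locally so the snippet compiles on its own. The rule that the most recently registered converter wins is an assumption made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Hypothetical stand-in mirroring the IMarkdownConverter shape shown above.
public interface ISketchConverter
{
    bool CanHandle(string fileExtension);
    Task<string> ConvertAsync(Stream fileStream, string fileExtension);
}

// Hypothetical registry: an illustration only, NOT the package's ConverterRegistry.
public sealed class SketchRegistry
{
    private readonly List<ISketchConverter> _converters = new();

    public void Register(ISketchConverter converter)
    {
        ArgumentNullException.ThrowIfNull(converter);
        _converters.Add(converter);
    }

    // Assumption: the most recently registered converter that accepts
    // the extension wins, so later registrations can override earlier ones.
    public ISketchConverter? Resolve(string extension)
    {
        for (int i = _converters.Count - 1; i >= 0; i--)
        {
            if (_converters[i].CanHandle(extension)) return _converters[i];
        }
        return null;
    }

    public IReadOnlyList<ISketchConverter> GetAll() => _converters.AsReadOnly();
}

// Trivial converter used only to exercise the sketch.
public sealed class UpperCaseTextConverter : ISketchConverter
{
    public bool CanHandle(string fileExtension) =>
        fileExtension.Equals(".txt", StringComparison.OrdinalIgnoreCase);

    public async Task<string> ConvertAsync(Stream fileStream, string fileExtension)
    {
        using var reader = new StreamReader(fileStream, leaveOpen: true);
        return (await reader.ReadToEndAsync()).ToUpperInvariant();
    }
}
```

Resolve returns null for an unknown extension, matching the nullable return type in the signature above. How the real registry orders competing converters is not stated here.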
Custom Converters
To support a file format that is not covered out of the box, implement IMarkdownConverter for a single format, or IConverterPlugin to bundle several converters:
Quick Custom Converter
Implement IMarkdownConverter for a single format:
using ElBruno.MarkItDotNet;
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

public class CsvConverter : IMarkdownConverter
{
    public bool CanHandle(string fileExtension) =>
        fileExtension.Equals(".csv", StringComparison.OrdinalIgnoreCase);

    public async Task<string> ConvertAsync(Stream fileStream, string fileExtension)
    {
        using var reader = new StreamReader(fileStream, leaveOpen: true);
        var csv = await reader.ReadToEndAsync();

        // Split on both Windows (\r\n) and Unix (\n) line endings, dropping empty lines
        var lines = csv.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
        if (lines.Length == 0) return string.Empty;

        var sb = new StringBuilder();

        // Header row (naive split: quoted fields containing commas are not handled)
        var headers = lines[0].Split(',');
        sb.Append("| ");
        sb.Append(string.Join(" | ", headers));
        sb.AppendLine(" |");

        // Separator row: one "---" cell per header column
        sb.Append("|");
        sb.Append(string.Concat(headers.Select(_ => " --- |")));
        sb.AppendLine();

        // Data rows
        for (int i = 1; i < lines.Length; i++)
        {
            if (string.IsNullOrWhiteSpace(lines[i])) continue;
            var cells = lines[i].Split(',');
            sb.Append("| ");
            sb.Append(string.Join(" | ", cells));
            sb.AppendLine(" |");
        }

        return sb.ToString();
    }
}
Register with DI:
services.AddMarkItDotNet();
var provider = services.BuildServiceProvider();
var registry = provider.GetRequiredService<ConverterRegistry>();
registry.Register(new CsvConverter());
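The converter above splits each line on commas, so a quoted field that contains a comma, or a cell that contains a pipe character, would produce a malformed table. The helpers below are a minimal sketch of one way to harden it. CsvMarkdownHelpers, SplitCsvLine, and EscapeCell are hypothetical names, not part of the package, and quoted fields that span multiple lines are still not handled.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helpers for a custom CSV converter (not part of the package).
public static class CsvMarkdownHelpers
{
    // Splits one CSV line, honoring double-quoted fields and "" as an escaped quote.
    public static List<string> SplitCsvLine(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // escaped quote inside a quoted field
                    i++;
                }
                else if (c == '"')
                {
                    inQuotes = false;    // closing quote
                }
                else
                {
                    current.Append(c);
                }
            }
            else if (c == '"')
            {
                inQuotes = true;         // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString());
        return fields;
    }

    // Trims the cell and escapes pipes, which would otherwise end a Markdown table cell.
    public static string EscapeCell(string cell) =>
        cell.Trim().Replace("|", "\\|");
}
```

In the converter, lines[i].Split(',') could then be replaced by CsvMarkdownHelpers.SplitCsvLine(lines[i]), with each field passed through EscapeCell before joining.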
Satellite Plugin Package
For reusable plugins, implement IConverterPlugin:
using ElBruno.MarkItDotNet;
public class MyCustomPlugin : IConverterPlugin
{
    public string Name => "MyCustom";

    public IEnumerable<IMarkdownConverter> GetConverters() =>
    [
        new MyFormatConverter1(),
        new MyFormatConverter2()
    ];
}
Register in DI:
services.AddSingleton<IConverterPlugin>(new MyCustomPlugin());
The registry automatically discovers and loads all registered plugins.
Samples
| Sample | Description |
|---|---|
| BasicConversion | Console app demonstrating text, JSON, and HTML conversion with DI |
Documentation
- Architecture — design decisions, plugin system, converter pipeline, and internal structure
- Plugins Guide — how to create custom plugin packages
- Building & Testing — how to build from source and run tests
- Image Generation Prompts — AI prompts for branding assets
🤝 Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
📄 License
This project is licensed under the MIT License — see the LICENSE file for details.
👋 About the Author
Made with ❤️ by Bruno Capuano (ElBruno)
- 📝 Blog: elbruno.com
- 📺 YouTube: youtube.com/elbruno
- 🔗 LinkedIn: linkedin.com/in/elbruno
- 𝕏 Twitter: twitter.com/elbruno
- 🎙️ Podcast: notienenombre.com
| Product | Compatible and additional computed target framework versions |
|---|---|
| .NET | net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
Dependencies
- net8.0
  - ElBruno.MarkItDotNet (>= 0.2.0)
  - Microsoft.Extensions.AI.Abstractions (>= 9.5.0)
  - PdfPig (>= 0.1.14)