WebLookup 0.2.1

WebLookup

A lightweight .NET library for fast URL discovery across multiple search providers with built-in rate limiting, automatic fallback, and site exploration.

WebLookup is a URL search engine, not a content parser. It collects URLs and metadata (title, description) from search APIs and sitemaps, then hands them off to your crawler or parser of choice.

Features

  • Multi-provider search — DuckDuckGo (no API key), Google, Mojeek, SearchApi, Tavily
  • Parallel execution — All configured providers queried simultaneously
  • Smart fallback — Continues serving results from healthy providers when one fails or hits rate limits
  • URL deduplication — Merges results across providers, removes duplicates
  • Site exploration — Parse robots.txt rules and sitemap.xml hierarchies
  • Rate limit handling — Auto-detects 429/Retry-After, exponential backoff per provider
  • Minimal dependencies — Built on System.Net.Http and System.Text.Json; optional DI integration
  • DI-friendly — First-class Microsoft.Extensions.DependencyInjection support

Installation

dotnet add package WebLookup

Quick Start

Zero-config search (no API key needed)

using WebLookup;

// DuckDuckGo requires no API key
var provider = new DuckDuckGoSearchProvider();
var results = await provider.SearchAsync("dotnet web search", count: 5);

Search across multiple providers

using WebLookup;

var client = new WebSearchClient(
    new DuckDuckGoSearchProvider(),  // No API key needed
    new GoogleSearchProvider(new() { Engines = [new() { ApiKey = "...", Cx = "..." }] }),
    new MojeekSearchProvider(new() { ApiKey = "..." }),
    new SearchApiProvider(new() { ApiKey = "..." }),
    new TavilySearchProvider(new() { ApiKey = "..." })
);

var results = await client.SearchAsync("dotnet web search library");

foreach (var item in results)
{
    Console.WriteLine($"[{item.Provider}] {item.Title}");
    Console.WriteLine($"  {item.Url}");
    Console.WriteLine($"  {item.Description}");
}

Results are deduplicated by URL. Providers run in parallel. If one provider hits a rate limit, results from others are still returned.
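As a rough illustration of the parallel-with-fallback behavior described above, here is a minimal, library-independent sketch. It is not WebLookup's implementation: ParallelFallbackSketch and QueryAllAsync are hypothetical names, and each provider is modeled as a plain delegate that returns a list of URLs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical sketch; the real WebSearchClient internals may differ.
public static class ParallelFallbackSketch
{
    public static async Task<List<string>> QueryAllAsync(
        IEnumerable<Func<Task<IReadOnlyList<string>>>> providers)
    {
        // Start every provider query without awaiting, so they run concurrently.
        var tasks = providers.Select(async query =>
        {
            try
            {
                return await query();
            }
            catch (Exception)
            {
                // Assumption: a failed or throttled provider contributes nothing
                // instead of failing the whole search.
                return (IReadOnlyList<string>)Array.Empty<string>();
            }
        });

        // Task.WhenAll returns the batches in the same order as the providers.
        var batches = await Task.WhenAll(tasks);
        return batches.SelectMany(batch => batch).ToList();
    }
}
```

A production version would probably let cancellation propagate rather than swallow every exception, and would log which provider failed.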

Response format

SearchAsync returns IReadOnlyList<SearchResult>. Printed with the loop above, the results look like this:

[DuckDuckGo] Apache Lucene.NET is a powerful open source .NET search library
  https://lucenenet.apache.org/
  Apache Lucene.Net is a .NET full-text search engine framework...

[DuckDuckGo] GitHub - apache/lucenenet: Apache Lucene.NET
  https://github.com/apache/lucenenet
  Apache Lucene.Net is a high performance search library for .NET...

[Google] WebLookup - NuGet Gallery
  https://www.nuget.org/packages/WebLookup
  A lightweight .NET library for fast URL discovery...

[Tavily] Azure Cognitive Search Documentation
  https://learn.microsoft.com/en-us/azure/search/
  Cloud search service with built-in AI capabilities...

Each SearchResult contains:

Field        Type     Description
Url          string   Deduplicated absolute URL
Title        string   Page title
Description  string?  Snippet or summary (may be null)
Provider     string?  Source provider name ("DuckDuckGo", "Google", "Tavily", etc.)

When using WebSearchClient with multiple providers, results are deduplicated by URL (case-insensitive, fragments and trailing slashes removed). The first provider to return a URL wins.
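The deduplication rule above (case-insensitive, fragment and trailing slash ignored, first provider wins) could be approximated as follows. This is a sketch under stated assumptions, not the library's code: UrlDedupSketch, NormalizeKey and Deduplicate are hypothetical names, and results are modeled as simple (Url, Provider) tuples.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; WebLookup's actual normalization may differ in detail.
public static class UrlDedupSketch
{
    // Builds a comparison key: scheme and authority, the path without a
    // trailing slash, and the query string. The fragment is dropped.
    // Throws UriFormatException if the input is not an absolute URL.
    public static string NormalizeKey(string url)
    {
        var uri = new Uri(url, UriKind.Absolute);
        var path = uri.AbsolutePath.TrimEnd('/');
        return uri.GetLeftPart(UriPartial.Authority) + path + uri.Query;
    }

    // Keeps the first result seen for each normalized URL, so input order
    // decides which provider "wins".
    public static List<(string Url, string Provider)> Deduplicate(
        IEnumerable<(string Url, string Provider)> results)
    {
        // Case-insensitive comparison of keys, as the text describes.
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var unique = new List<(string Url, string Provider)>();

        foreach (var result in results)
        {
            if (seen.Add(NormalizeKey(result.Url)))
            {
                unique.Add(result);
            }
        }

        return unique;
    }
}
```

Under this sketch, https://example.com/docs/ and https://EXAMPLE.com/docs#top share one key, so only the first of the two survives.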

Use a single provider

// Google Custom Search
var google = new GoogleSearchProvider(new()
{
    Engines = [new() { ApiKey = "YOUR_API_KEY", Cx = "YOUR_CX" }]
});

var googleResults = await google.SearchAsync("query", count: 5);

// Tavily
var tavily = new TavilySearchProvider(new() { ApiKey = "YOUR_API_KEY" });

var tavilyResults = await tavily.SearchAsync("query", count: 5);

Explore a site

var explorer = new SiteExplorer();

// Read robots.txt
var robots = await explorer.GetRobotsAsync(new Uri("https://example.com"));
Console.WriteLine($"Crawl-Delay: {robots.CrawlDelay}");
Console.WriteLine($"Sitemaps: {string.Join(", ", robots.Sitemaps)}");

foreach (var rule in robots.Rules)
{
    Console.WriteLine($"[{rule.UserAgent}] {rule.Type}: {rule.Path}");
}

// Read sitemap
var entries = await explorer.GetSitemapAsync(new Uri("https://example.com/sitemap.xml"));

foreach (var entry in entries)
{
    Console.WriteLine($"{entry.Url} (modified: {entry.LastModified}, priority: {entry.Priority})");
}

// Stream large sitemaps
await foreach (var entry in explorer.StreamSitemapAsync(new Uri("https://example.com/sitemap.xml")))
{
    Console.WriteLine(entry.Url);
}

Filter URLs with robots.txt rules

var explorer = new SiteExplorer();
var robots = await explorer.GetRobotsAsync(new Uri("https://example.com"));

// Check if a path is allowed for your bot
bool allowed = robots.IsAllowed("/admin/page", userAgent: "MyBot");
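The README does not say how IsAllowed resolves conflicting rules. For orientation, the sketch below shows the longest-match rule from RFC 9309 (the most specific matching path wins, and Allow wins a tie) using prefix matching only. RobotsMatchSketch is a hypothetical name, wildcards and user-agent grouping are left out, and the library's own behavior may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of RFC 9309 longest-match evaluation (prefix rules only).
public static class RobotsMatchSketch
{
    public static bool IsAllowed(
        string path, IEnumerable<(bool Allow, string Prefix)> rules)
    {
        var bestLength = -1;
        var allowed = true; // With no matching rule, the path is allowed.

        foreach (var (allow, prefix) in rules)
        {
            // An empty pattern (e.g. "Disallow:") matches nothing.
            if (prefix.Length == 0) continue;
            if (!path.StartsWith(prefix, StringComparison.Ordinal)) continue;

            // A longer prefix is more specific; on a tie, Allow takes precedence.
            if (prefix.Length > bestLength || (prefix.Length == bestLength && allow))
            {
                bestLength = prefix.Length;
                allowed = allow;
            }
        }

        return allowed;
    }
}
```

With the rules Disallow: /admin and Allow: /admin/public, this sketch rejects /admin/page but accepts /admin/public/index.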

Providers

Provider    Class                     Auth              API Docs
DuckDuckGo  DuckDuckGoSearchProvider  None              HTML Lite
Google      GoogleSearchProvider      API Key + CX      Custom Search JSON API
Mojeek      MojeekSearchProvider      API Key           Mojeek Search API
SearchApi   SearchApiProvider         API Key (Bearer)  SearchApi
Tavily      TavilySearchProvider      API Key           Tavily

Rate Limiting

Each provider handles rate limits automatically via a built-in RateLimitHandler:

  1. Detection — Monitors HTTP 429 status and Retry-After headers
  2. Backoff — Exponential backoff per provider (1s → 2s → 4s → max 30s)
  3. Fallback — When a provider is throttled, other providers continue serving results
  4. Retry — Up to 3 retries per request (default)
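The steps above can be sketched with plain HttpClient calls. This is an illustration only, not the built-in RateLimitHandler: BackoffSketch, ComputeDelay and SendWithRetryAsync are hypothetical names, and letting a server-supplied Retry-After value override the computed delay is an assumption.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch; the library's RateLimitHandler may behave differently.
public static class BackoffSketch
{
    // attempt 0 -> 1s, 1 -> 2s, 2 -> 4s, ... capped at 30s.
    public static TimeSpan ComputeDelay(int attempt, TimeSpan? retryAfter = null)
    {
        // Assumption: a Retry-After delay from the server takes precedence.
        if (retryAfter is { } serverDelay) return serverDelay;

        var seconds = Math.Min(Math.Pow(2, attempt), 30);
        return TimeSpan.FromSeconds(seconds);
    }

    // Sends the request, retrying on HTTP 429 up to maxRetries times.
    // A factory is used because an HttpRequestMessage cannot be sent twice.
    public static async Task<HttpResponseMessage> SendWithRetryAsync(
        HttpClient http,
        Func<HttpRequestMessage> createRequest,
        int maxRetries = 3,
        CancellationToken cancellationToken = default)
    {
        for (var attempt = 0; ; attempt++)
        {
            var response = await http.SendAsync(createRequest(), cancellationToken);

            // Return on anything other than 429, or once retries are exhausted.
            if (response.StatusCode != HttpStatusCode.TooManyRequests || attempt >= maxRetries)
            {
                return response;
            }

            // Only the delta form of Retry-After is read here; the date form is ignored.
            var delay = ComputeDelay(attempt, response.Headers.RetryAfter?.Delta);
            response.Dispose();
            await Task.Delay(delay, cancellationToken);
        }
    }
}
```

Keeping ComputeDelay a pure function makes the 1s, 2s, 4s progression and the 30s cap easy to check without any network traffic.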

Dependency Injection

// Assumes `services` is your IServiceCollection and `config` is your IConfiguration.
services.AddWebLookup(options =>
{
    options.AddDuckDuckGo();  // No API key needed
    options.AddGoogle(g =>
    {
        g.AddEngine(config["Google:ApiKey"], config["Google:Cx"]);
    });
    options.AddMojeek(config["Mojeek:ApiKey"]);
    options.AddSearchApi(config["SearchApi:ApiKey"]);
    options.AddTavily(config["Tavily:ApiKey"]);
});

// Inject wherever needed
public class MyService(WebSearchClient search, SiteExplorer explorer) { }

API Reference

SearchResult

public record SearchResult
{
    public required string Url { get; init; }
    public required string Title { get; init; }
    public string? Description { get; init; }
    public string? Provider { get; init; }
}

ISearchProvider

public interface ISearchProvider
{
    string Name { get; }
    Task<IReadOnlyList<SearchResult>> SearchAsync(
        string query, int count = 10,
        CancellationToken cancellationToken = default);
}

RobotsInfo

public record RobotsInfo
{
    public IReadOnlyList<RobotsRule> Rules { get; init; }
    public IReadOnlyList<string> Sitemaps { get; init; }
    public TimeSpan? CrawlDelay { get; init; }
    public bool IsAllowed(string path, string userAgent = "*");
}

SitemapEntry

public record SitemapEntry
{
    public required string Url { get; init; }
    public DateTimeOffset? LastModified { get; init; }
    public string? ChangeFrequency { get; init; }
    public double? Priority { get; init; }
}

Requirements

  • .NET 10.0+
  • Microsoft.Extensions.DependencyInjection.Abstractions (for DI integration only)

License

MIT

Target frameworks

Compatible: net10.0
Computed: net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows

NuGet packages (1)

Showing the top 1 NuGet packages that depend on WebLookup:

Package Downloads
IronHive.Flux.WebLookup

WebLookup → WebFlux → FluxIndex RAG pipeline for discovering, processing, and indexing web content

Version  Downloads  Last Updated
0.2.1    210        2/19/2026
0.2.0    106        2/13/2026
0.1.0    98         2/8/2026