WebLookup 0.2.1
WebLookup
A lightweight .NET library for fast URL discovery across multiple search providers with built-in rate limiting, automatic fallback, and site exploration.
WebLookup is a URL search engine, not a content parser. It collects URLs and metadata (title, description) from search APIs and sitemaps, then hands them off to your crawler or parser of choice.
Features
- Multi-provider search — DuckDuckGo (no API key), Google, Mojeek, SearchApi, Tavily
- Parallel execution — All configured providers queried simultaneously
- Smart fallback — Continues serving results from healthy providers when one fails or hits rate limits
- URL deduplication — Merges results across providers, removes duplicates
- Site exploration — Parse `robots.txt` rules and `sitemap.xml` hierarchies
- Rate limit handling — Auto-detects 429/`Retry-After`, exponential backoff per provider
- Minimal dependencies — Built on `System.Net.Http` and `System.Text.Json`; optional DI integration
- DI-friendly — First-class `Microsoft.Extensions.DependencyInjection` support
Installation
dotnet add package WebLookup
Quick Start
Zero-config search (no API key needed)
using WebLookup;
// DuckDuckGo requires no API key
var provider = new DuckDuckGoSearchProvider();
var results = await provider.SearchAsync("dotnet web search", count: 5);
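`SearchAsync` also accepts an optional `CancellationToken` (see `ISearchProvider` in the API Reference below), so the call can be bounded with a timeout:

```csharp
// Cancel the search if it takes longer than 5 seconds.
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(5));
var timedResults = await provider.SearchAsync(
    "dotnet web search", count: 5, cancellationToken: cts.Token);
```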
Search across multiple providers
using WebLookup;
var client = new WebSearchClient(
new DuckDuckGoSearchProvider(), // No API key needed
new GoogleSearchProvider(new() { Engines = [new() { ApiKey = "...", Cx = "..." }] }),
new MojeekSearchProvider(new() { ApiKey = "..." }),
new SearchApiProvider(new() { ApiKey = "..." }),
new TavilySearchProvider(new() { ApiKey = "..." })
);
var results = await client.SearchAsync("dotnet web search library");
foreach (var item in results)
{
Console.WriteLine($"[{item.Provider}] {item.Title}");
Console.WriteLine($" {item.Url}");
Console.WriteLine($" {item.Description}");
}
Results are deduplicated by URL. Providers run in parallel. If one provider hits a rate limit, results from others are still returned.
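The fan-out behavior can be sketched like this (a simplified illustration of the pattern, not the library's actual implementation; `ISearchProvider` and `SearchResult` are the types shown in the API Reference below):

```csharp
// Illustrative sketch: query all providers concurrently and tolerate
// individual failures. Not WebLookup's actual code.
async Task<IReadOnlyList<SearchResult>> FanOutAsync(
    string query, int count, IReadOnlyList<ISearchProvider> providers)
{
    var tasks = providers.Select(async provider =>
    {
        try
        {
            return await provider.SearchAsync(query, count);
        }
        catch (Exception)
        {
            // A throttled or failing provider contributes nothing,
            // but does not fail the overall search.
            return (IReadOnlyList<SearchResult>)Array.Empty<SearchResult>();
        }
    });

    var perProvider = await Task.WhenAll(tasks);
    return perProvider
        .SelectMany(list => list)
        .DistinctBy(result => result.Url, StringComparer.OrdinalIgnoreCase)
        .ToList();
}
```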
Response format
`SearchAsync` returns `IReadOnlyList<SearchResult>`. Printed with the loop above, the results look like this:
[DuckDuckGo] Apache Lucene.NET is a powerful open source .NET search library
https://lucenenet.apache.org/
Apache Lucene.Net is a .NET full-text search engine framework...
[DuckDuckGo] GitHub - apache/lucenenet: Apache Lucene.NET
https://github.com/apache/lucenenet
Apache Lucene.Net is a high performance search library for .NET...
[Google] WebLookup - NuGet Gallery
https://www.nuget.org/packages/WebLookup
A lightweight .NET library for fast URL discovery...
[Tavily] Azure Cognitive Search Documentation
https://learn.microsoft.com/en-us/azure/search/
Cloud search service with built-in AI capabilities...
Each `SearchResult` contains:
| Field | Type | Description |
|---|---|---|
| `Url` | `string` | Deduplicated absolute URL |
| `Title` | `string` | Page title |
| `Description` | `string?` | Snippet or summary (may be null) |
| `Provider` | `string?` | Source provider name (`"DuckDuckGo"`, `"Google"`, `"Tavily"`, etc.) |
When using WebSearchClient with multiple providers, results are deduplicated by URL (case-insensitive, fragments and trailing slashes removed). The first provider to return a URL wins.
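The normalization behind that dedup key can be approximated like this (an illustrative sketch of the documented rules, not WebLookup's exact code):

```csharp
using System;

// Approximates the documented dedup key: fragment stripped, trailing
// slash removed, compared case-insensitively.
static string NormalizeUrl(string url)
{
    var uri = new Uri(url);
    // GetLeftPart(UriPartial.Query) keeps scheme, host, path and query
    // but drops the #fragment.
    return uri.GetLeftPart(UriPartial.Query)
              .TrimEnd('/')
              .ToLowerInvariant();
}
```

Under this scheme `https://Example.com/Docs/#intro` and `https://example.com/docs` collapse to the same key, so whichever provider returned the URL first wins.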
Use a single provider
// Google Custom Search
var google = new GoogleSearchProvider(new()
{
Engines = [new() { ApiKey = "YOUR_API_KEY", Cx = "YOUR_CX" }]
});
var results = await google.SearchAsync("query", count: 5);
// Tavily
var tavily = new TavilySearchProvider(new() { ApiKey = "YOUR_API_KEY" });
var results = await tavily.SearchAsync("query", count: 5);
Explore a site
var explorer = new SiteExplorer();
// Read robots.txt
var robots = await explorer.GetRobotsAsync(new Uri("https://example.com"));
Console.WriteLine($"Crawl-Delay: {robots.CrawlDelay}");
Console.WriteLine($"Sitemaps: {string.Join(", ", robots.Sitemaps)}");
foreach (var rule in robots.Rules)
{
Console.WriteLine($"[{rule.UserAgent}] {rule.Type}: {rule.Path}");
}
// Read sitemap
var entries = await explorer.GetSitemapAsync(new Uri("https://example.com/sitemap.xml"));
foreach (var entry in entries)
{
Console.WriteLine($"{entry.Url} (modified: {entry.LastModified}, priority: {entry.Priority})");
}
// Stream large sitemaps
await foreach (var entry in explorer.StreamSitemapAsync(new Uri("https://example.com/sitemap.xml")))
{
Console.WriteLine(entry.Url);
}
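Assuming `StreamSitemapAsync` returns a lazily evaluated `IAsyncEnumerable<SitemapEntry>`, breaking out early avoids materializing the rest of a very large sitemap:

```csharp
int seen = 0;
await foreach (var entry in explorer.StreamSitemapAsync(new Uri("https://example.com/sitemap.xml")))
{
    Console.WriteLine(entry.Url);
    if (++seen >= 100) break; // lazy enumeration: later entries are not parsed
}
```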
Filter URLs with robots.txt rules
var robots = await explorer.GetRobotsAsync(new Uri("https://example.com"));
// Check if a path is allowed for your bot
bool allowed = robots.IsAllowed("/admin/page", userAgent: "MyBot");
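As a mental model, robots.txt matching conventionally uses longest-match precedence. A minimal sketch of that check (illustrative only; `RobotsInfo.IsAllowed` performs it for you):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal longest-match robots evaluation: the most specific matching
// prefix wins, and Allow beats Disallow on a tie.
static bool IsPathAllowed(string path, IReadOnlyList<(bool Allow, string Prefix)> rules)
{
    var best = rules
        .Where(rule => path.StartsWith(rule.Prefix, StringComparison.Ordinal))
        .OrderByDescending(rule => rule.Prefix.Length)
        .ThenByDescending(rule => rule.Allow) // Allow wins ties
        .FirstOrDefault();

    // No matching rule means the path is allowed.
    return best.Prefix is null || best.Allow;
}
```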
Providers
| Provider | Class | Auth | API Docs |
|---|---|---|---|
| DuckDuckGo | `DuckDuckGoSearchProvider` | None | HTML Lite |
| Google | `GoogleSearchProvider` | API Key + CX | Custom Search JSON API |
| Mojeek | `MojeekSearchProvider` | API Key | Mojeek Search API |
| SearchApi | `SearchApiProvider` | API Key (Bearer) | SearchApi |
| Tavily | `TavilySearchProvider` | API Key | Tavily |
Rate Limiting
Each provider handles rate limits automatically via a built-in `RateLimitHandler`:
- Detection — Monitors HTTP 429 status and `Retry-After` headers
- Backoff — Exponential backoff per provider (1s → 2s → 4s → max 30s)
- Fallback — When a provider is throttled, other providers continue serving results
- Retry — Up to 3 retries per request (default)
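The schedule above can be sketched as a small helper (`GetRetryDelay` is a hypothetical name for illustration; WebLookup's `RateLimitHandler` does this internally):

```csharp
using System;

// Delay before retry attempt n (0-based): 1s, 2s, 4s, ... capped at
// 30s, with a server-sent Retry-After taking precedence.
static TimeSpan GetRetryDelay(int attempt, TimeSpan? retryAfter = null)
{
    if (retryAfter is { } serverHint)
        return serverHint;

    var seconds = Math.Min(Math.Pow(2, attempt), 30);
    return TimeSpan.FromSeconds(seconds);
}
```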
Dependency Injection
services.AddWebLookup(options =>
{
options.AddDuckDuckGo(); // No API key needed
options.AddGoogle(g =>
{
g.AddEngine(config["Google:ApiKey"], config["Google:Cx"]);
});
options.AddMojeek(config["Mojeek:ApiKey"]);
options.AddSearchApi(config["SearchApi:ApiKey"]);
options.AddTavily(config["Tavily:ApiKey"]);
});
// Inject wherever needed
public class MyService(WebSearchClient search, SiteExplorer explorer) { }
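With the registrations above, the client resolves like any other service. For example, in a hypothetical ASP.NET Core minimal-API endpoint:

```csharp
// Hypothetical endpoint: WebSearchClient is injected by the container.
app.MapGet("/search", async (string q, WebSearchClient search) =>
{
    var results = await search.SearchAsync(q, count: 10);
    return Results.Ok(results);
});
```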
API Reference
SearchResult
public record SearchResult
{
public required string Url { get; init; }
public required string Title { get; init; }
public string? Description { get; init; }
public string? Provider { get; init; }
}
ISearchProvider
public interface ISearchProvider
{
string Name { get; }
Task<IReadOnlyList<SearchResult>> SearchAsync(
string query, int count = 10,
CancellationToken cancellationToken = default);
}
RobotsInfo
public record RobotsInfo
{
public IReadOnlyList<RobotsRule> Rules { get; init; }
public IReadOnlyList<string> Sitemaps { get; init; }
public TimeSpan? CrawlDelay { get; init; }
public bool IsAllowed(string path, string userAgent = "*");
}
SitemapEntry
public record SitemapEntry
{
public required string Url { get; init; }
public DateTimeOffset? LastModified { get; init; }
public string? ChangeFrequency { get; init; }
public double? Priority { get; init; }
}
Requirements
- .NET 10.0+
- `Microsoft.Extensions.DependencyInjection.Abstractions` (for DI integration only)
License
MIT