Spidey 6.0.8

There is a newer version of this package available.
See the version list below for details.
dotnet add package Spidey --version 6.0.8

NuGet\Install-Package Spidey -Version 6.0.8
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

<PackageReference Include="Spidey" Version="6.0.8" />
For projects that support PackageReference, copy this XML node into the project file to reference the package.

For projects that support Central Package Management (CPM), copy this XML node into the solution Directory.Packages.props file to version the package.
Directory.Packages.props:
<PackageVersion Include="Spidey" Version="6.0.8" />
Project file:
<PackageReference Include="Spidey" />

paket add Spidey --version 6.0.8

#r "nuget: Spidey, 6.0.8"
The #r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.

#:package Spidey@6.0.8
The #:package directive can be used in C# file-based apps starting in .NET 10 preview 4. Copy this into a .cs file before any lines of code to reference the package.

#addin nuget:?package=Spidey&version=6.0.8
Install as a Cake Addin.

#tool nuget:?package=Spidey&version=6.0.8
Install as a Cake Tool.

Spidey

Spidey is a flexible and extensible .NET library for crawling web content. It is designed for .NET Core applications and provides a modular architecture, allowing you to customize or extend any part of the crawling pipeline.

Features

  • Simple API for crawling websites
  • Highly configurable via the Options class
  • Dependency injection support (IoC/DI)
  • Easily replaceable subsystems (engine, parser, scheduler, etc.)
  • Callback-based result handling
  • NuGet package available

Quick Start

Install the NuGet package:

dotnet add package Spidey

Setting up the Library

Register Spidey in your app's service collection using the RegisterSpidey extension method:

using System;
using System.Collections.Generic;
using Microsoft.Extensions.DependencyInjection;
using Spidey;

var services = new ServiceCollection();
services.RegisterSpidey();

// Optionally, register your Options configuration
services.AddSingleton(new Options
{
    ItemFound = result => Console.WriteLine($"Found: {result.Url}"),
    Allow = new List<string> { "http://mywebsite", "http://mywebsite2" },
    FollowOnly = new List<string> { /* regex patterns */ },
    Ignore = new List<string> { /* regex patterns */ },
    StartLocations = new List<string> { "http://mywebsite", "http://mywebsite2" },
    UrlReplacements = new Dictionary<string, string> { /* { "old", "new" } */ },
    // Other options as needed
});

var provider = services.BuildServiceProvider();
var crawler = provider.GetRequiredService<Crawler>();

Alternatively, you can instantiate Crawler and Options directly without DI:

var options = new Options
{
    ItemFound = result => Console.WriteLine($"Found: {result.Url}"),
    // ...other options
};
var crawler = new Crawler(options);

Options Configuration

The Options class configures the crawler's behavior. Key properties include:

  • ItemFound (Action<ResultFile>): Callback invoked when a new page is discovered.
  • Allow (List<string>): Regex patterns for URLs allowed to be crawled.
  • FollowOnly (List<string>): Regex patterns for pages whose links should be followed.
  • Ignore (List<string>): Regex patterns for URLs to ignore.
  • StartLocations (List<string>): Initial URLs to start crawling from.
  • UrlReplacements (Dictionary<string, string>): Pairs of text to find in a URL and its replacement (for example { "old", "new" }), applied during crawling.
  • NetworkCredentials (NetworkCredential): Optional credentials for authentication.
  • UseDefaultCredentials (bool): Whether to use the default system credentials.
  • Proxy (IWebProxy): Optional proxy settings.
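
As an illustration of how these properties fit together, here is a minimal sketch of a fuller configuration. It uses only the property names and types listed above. The host names, credentials, and proxy address are placeholders, and the regex patterns are assumptions about what a typical setup might look like; how Spidey combines Allow, FollowOnly, and Ignore when they overlap is not described here, so check the behavior against your own site.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using Spidey;

var options = new Options
{
    // Print each discovered item; replace with your own processing.
    ItemFound = result => Console.WriteLine($"Found: {result.Url}"),

    // Placeholder start page.
    StartLocations = new List<string> { "https://docs.example.com/" },

    // Regex: only URLs on this (placeholder) host may be crawled.
    Allow = new List<string> { @"^https://docs\.example\.com/" },

    // Regex: only follow links found on pages under /guide/.
    FollowOnly = new List<string> { @"^https://docs\.example\.com/guide/" },

    // Regex: skip common static assets.
    Ignore = new List<string> { @"\.(png|jpe?g|gif|css|js)$" },

    // Placeholder credentials and proxy; omit these if you do not need them.
    NetworkCredentials = new NetworkCredential("crawler-user", "example-password"),
    UseDefaultCredentials = false,
    Proxy = new WebProxy("http://proxy.example.com:8080"),
};

var crawler = new Crawler(options);
```

Escaping the dots in host names (\.) keeps a pattern from matching unintended hosts, since an unescaped dot matches any character in a regex.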

Example callback method:

void OnItemFound(ResultFile result)
{
    Console.WriteLine($"Discovered: {result.Url} (Status: {result.StatusCode})");
    // Additional processing...
}

Basic Usage

Once configured, start the crawl process:

crawler.StartCrawl();

The library will handle link discovery, content downloading, and result parsing. Your callback will be invoked for each discovered item.
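
If you want to gather results rather than print them, one option is to have the callback append to a thread-safe collection. The sketch below is self-contained and does not reference Spidey: ResultCollector<T> is a hypothetical helper, and the parallel loop only simulates callbacks arriving from several threads. Whether Spidey actually invokes the callback concurrently is not stated here, so the thread-safe queue is a conservative assumption rather than a requirement.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

var collector = new ResultCollector<string>();

// Simulate callbacks arriving from several threads at once.
Parallel.For(0, 100, i => collector.Add($"http://mywebsite/page{i}"));

Console.WriteLine(collector.Snapshot().Count); // prints 100

// Hypothetical helper (not part of Spidey): collects items from a callback
// that may be invoked from multiple threads.
public sealed class ResultCollector<T>
{
    private readonly ConcurrentQueue<T> _items = new();

    // Safe to call concurrently.
    public void Add(T item) => _items.Enqueue(item);

    // Returns a point-in-time copy of everything collected so far.
    public IReadOnlyCollection<T> Snapshot() => _items.ToArray();
}
```

With Spidey, the same helper could be created as new ResultCollector<ResultFile>() and wired up with ItemFound = collector.Add.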

Customization

Spidey is built with extensibility in mind. The system is divided into the following subsystems, each replaceable via DI:

  1. Content Parser (IContentParser) – Parses downloaded data into ResultFile objects.
  2. Engine (IEngine) – Handles HTTP requests and content downloading.
  3. Link Discoverer (ILinkDiscoverer) – Extracts links from content.
  4. Processor (IProcessor) – Processes parsed content (default: invokes your callback).
  5. Scheduler (IScheduler) – Manages work distribution.
  6. Pipeline (IPipeline) – Orchestrates the crawling process.

To customize, implement the relevant interface from Spidey.Engines.Interfaces and register your implementation in the service provider. Note that if you call RegisterSpidey(), the registration is handled for you automatically. If you instantiate Crawler directly, you must compose the pipeline manually.
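
The general mechanics of overriding a default registration in Microsoft.Extensions.DependencyInjection can be sketched as follows. This is not Spidey's API: IPageHandler and both implementations are hypothetical stand-ins, because the members of the Spidey interfaces are not listed here. The sketch relies only on the container's documented behavior that, when several registrations exist for one service type, resolving a single instance returns the last one registered. Whether Spidey resolves its subsystems this way is an assumption to verify.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();

// Stands in for a default implementation that a library registers.
services.AddSingleton<IPageHandler, DefaultPageHandler>();

// A later registration for the same service type takes precedence
// when a single instance is resolved.
services.AddSingleton<IPageHandler, CustomPageHandler>();

using var provider = services.BuildServiceProvider();
var handler = provider.GetRequiredService<IPageHandler>();

Console.WriteLine(handler.Handle("http://mywebsite")); // prints custom:http://mywebsite

// Hypothetical stand-in for a subsystem interface; NOT part of Spidey.
public interface IPageHandler
{
    string Handle(string url);
}

// Hypothetical default implementation.
public sealed class DefaultPageHandler : IPageHandler
{
    public string Handle(string url) => $"default:{url}";
}

// Hypothetical custom implementation that replaces the default.
public sealed class CustomPageHandler : IPageHandler
{
    public string Handle(string url) => $"custom:{url}";
}
```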

FAQ

Q: Can I run the crawler on multiple nodes?

A: The default scheduler is single-node only. For distributed crawling, implement a custom scheduler (e.g., using a database or message queue) to coordinate work between instances.

Build Process

Requirements:

  • Visual Studio 2022

Clone the project and open the solution (Spidey.sln) in Visual Studio to build.

License

See LICENSE for details.

Compatible and additional computed target framework versions

Included in the package: net8.0, net9.0.
Computed as compatible: net10.0, plus the android, browser, ios, maccatalyst, macos, tvos, and windows variants of net8.0, net9.0, and net10.0.

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version Downloads Last Updated
6.0.10 82 1/21/2026
6.0.9 222 10/16/2025
6.0.8 512 10/7/2025
6.0.7 233 9/30/2025
6.0.6 281 8/28/2025
6.0.5 218 8/19/2025
6.0.4 214 7/16/2025
6.0.3 239 6/27/2025
6.0.2 254 6/25/2025
6.0.1 273 12/9/2024
6.0.0 228 11/25/2024
5.0.131 248 11/12/2024
5.0.130 220 11/11/2024
5.0.129 213 11/6/2024
5.0.128 204 11/5/2024
5.0.127 223 11/4/2024
5.0.126 225 10/31/2024
5.0.125 214 10/30/2024
5.0.124 233 10/29/2024
5.0.123 244 10/11/2024
5.0.122 224 10/10/2024
5.0.121 213 10/9/2024
5.0.120 227 10/2/2024
5.0.119 208 10/1/2024
5.0.118 233 9/24/2024
5.0.117 240 9/17/2024
5.0.116 263 9/10/2024
5.0.115 236 9/3/2024
5.0.114 225 8/30/2024
5.0.113 235 8/27/2024
5.0.112 251 8/26/2024
5.0.111 231 8/23/2024
5.0.110 256 8/21/2024
5.0.109 242 8/20/2024
5.0.108 257 8/16/2024
5.0.107 239 8/15/2024
5.0.106 218 8/5/2024
5.0.105 215 8/2/2024
5.0.104 202 8/1/2024
5.0.103 212 7/26/2024
5.0.102 242 7/11/2024
5.0.101 229 7/2/2024
5.0.100 235 6/27/2024
5.0.99 237 6/26/2024
5.0.98 235 6/19/2024
5.0.97 235 6/18/2024
5.0.96 260 6/17/2024
5.0.95 232 6/14/2024
5.0.94 216 6/13/2024
5.0.93 217 6/12/2024
5.0.92 230 5/31/2024
5.0.91 239 5/30/2024
5.0.90 239 5/17/2024
5.0.89 240 5/16/2024
5.0.88 250 5/8/2024
5.0.87 259 5/7/2024
5.0.86 270 5/6/2024
5.0.85 203 5/3/2024
5.0.84 222 5/2/2024
5.0.83 235 5/1/2024
5.0.82 236 4/30/2024
5.0.81 224 4/16/2024
5.0.80 247 4/12/2024
5.0.79 230 4/11/2024
5.0.78 264 4/1/2024
5.0.77 239 3/29/2024
5.0.76 244 3/18/2024
5.0.75 241 3/15/2024
5.0.74 251 3/14/2024
5.0.73 238 3/11/2024
5.0.72 235 3/8/2024
5.0.71 227 3/7/2024
5.0.70 261 3/6/2024
5.0.69 257 3/5/2024
5.0.68 226 3/4/2024
5.0.67 247 2/29/2024
5.0.66 245 2/28/2024
5.0.65 242 2/26/2024
5.0.64 242 2/23/2024
5.0.63 255 2/22/2024
5.0.62 255 2/21/2024
5.0.61 257 2/16/2024
5.0.60 256 2/15/2024
5.0.59 252 2/12/2024
5.0.58 250 2/8/2024
5.0.57 240 2/7/2024
5.0.56 234 2/6/2024
5.0.55 224 2/1/2024
5.0.54 242 1/31/2024
5.0.53 242 1/30/2024
5.0.52 223 1/24/2024
5.0.51 242 1/23/2024
5.0.50 252 1/12/2024
5.0.49 249 1/11/2024
5.0.48 259 12/26/2023
5.0.47 230 12/22/2023
5.0.46 236 12/18/2023
5.0.45 229 12/15/2023
5.0.44 223 12/14/2023
5.0.43 235 12/13/2023
5.0.42 221 12/12/2023
5.0.41 220 11/24/2023
5.0.40 235 11/21/2023
5.0.39 211 11/20/2023
5.0.38 196 11/17/2023
5.0.37 207 11/16/2023
5.0.36 207 11/14/2023
5.0.35 185 11/8/2023
5.0.34 194 11/7/2023
5.0.33 193 11/6/2023
5.0.32 190 11/1/2023
5.0.31 211 10/31/2023
5.0.30 226 10/30/2023
5.0.29 185 10/26/2023
5.0.28 242 10/12/2023
5.0.27 250 10/5/2023
5.0.26 233 9/26/2023
5.0.25 218 9/20/2023
5.0.24 198 9/19/2023
5.0.23 250 9/18/2023
5.0.22 225 9/14/2023
5.0.21 197 9/13/2023
5.0.20 232 9/11/2023
5.0.19 223 9/7/2023
5.0.18 237 9/6/2023
5.0.17 230 9/5/2023
5.0.16 235 9/4/2023
5.0.15 220 9/1/2023
5.0.14 229 8/31/2023
5.0.13 250 8/30/2023
5.0.12 259 8/29/2023
5.0.11 248 8/25/2023
5.0.10 259 8/23/2023
5.0.9 246 8/18/2023
5.0.8 268 8/10/2023
5.0.7 264 8/8/2023
5.0.6 258 8/8/2023
5.0.5 299 8/7/2023
5.0.4 276 8/3/2023
5.0.3 288 7/26/2023
5.0.2 254 7/20/2023
5.0.1 290 7/14/2023
5.0.0 457 12/12/2022
4.0.5 702 6/10/2022
4.0.2 659 1/11/2022
4.0.1 643 7/19/2021
3.0.9 719 1/7/2021
3.0.7 789 9/13/2020
3.0.6 767 6/26/2020
3.0.5 750 6/26/2020
3.0.3 754 3/25/2020
3.0.2 812 3/1/2020
3.0.1 854 1/1/2020
3.0.0 806 12/23/2019
2.0.15 821 11/22/2019
2.0.14 797 11/22/2019
2.0.13 772 11/22/2019
2.0.12 768 11/21/2019
2.0.11 770 11/21/2019
2.0.10 783 11/21/2019
2.0.9 790 11/21/2019
2.0.8 943 3/3/2019
2.0.7 876 3/3/2019
2.0.6 899 3/3/2019
2.0.5 882 3/3/2019
2.0.4 955 2/7/2019
2.0.3 1,511 6/1/2018
2.0.2 1,534 5/22/2018
2.0.1 1,665 1/2/2018
1.0.12 1,382 11/2/2017
1.0.11 1,388 10/30/2017
1.0.10 1,391 10/26/2017
1.0.9 1,383 10/26/2017
1.0.8 1,413 10/26/2017
1.0.7 1,371 10/25/2017
1.0.6 1,351 10/25/2017
1.0.5 1,389 10/24/2017
1.0.4 1,319 10/24/2017
1.0.3 1,309 10/19/2017
1.0.2 1,462 10/18/2017
1.0.1 1,375 9/29/2017