TestBucket.AI.Xunit
0.0.5
dotnet add package TestBucket.AI.Xunit --version 0.0.5
NuGet\Install-Package TestBucket.AI.Xunit -Version 0.0.5
<PackageReference Include="TestBucket.AI.Xunit" Version="0.0.5" />
<PackageVersion Include="TestBucket.AI.Xunit" Version="0.0.5" />
<PackageReference Include="TestBucket.AI.Xunit" />
paket add TestBucket.AI.Xunit --version 0.0.5
#r "nuget: TestBucket.AI.Xunit, 0.0.5"
#:package TestBucket.AI.Xunit@0.0.5
#addin nuget:?package=TestBucket.AI.Xunit&version=0.0.5
#tool nuget:?package=TestBucket.AI.Xunit&version=0.0.5
MCP and IChatClient test extensions for xunit v3
The TestBucket.AI.Xunit
package provides helpers for writing integration tests related to IChatClient and Model Context Protocol (MCP) servers using xUnit v3.
Getting Started
Add Package Reference
- Ensure your test project references the following NuGet package:
TestBucket.AI.Xunit
Testing an MCP server
Use McpClientFactory
(from the official ModelContextProtocol
library) to create an IMcpClient
instance.
Typically, you will need to provide a transport (e.g., SseClientTransport
) and authentication headers (if needed).
Example
using ModelContextProtocol.Client;
using TestBucket.AI.Xunit;
[Fact]
public async Task ExpectedTextFoundInSuccessfulResponse_AfterInvokingNavigateTool_WithCorrectArguments()
{
// Arrange
// For a complete example refer to tests/TestBucket.AI.PlaywrightMcpIntegrationTests/PlaywrightIntegrationTests.cs
var client = await CreatePlaywrightMcpClientAsync();
string url = "https://github.com/microsoft/playwright-mcp";
var arguments = new ToolArgumentBuilder()
.WithArgument("url", url)
.Build();
// Act: Call the tool
var response = await client.TestCallToolAsync("browser_navigate", arguments);
// Assert
response.ShouldBeSuccess();
response.ShouldHaveContent(content =>
{
content.ShouldContain("### Ran Playwright code");
content.ShouldContain($"await page.goto('{url}');");
});
}
Benchmarking models
To benchmark models, you can use the TestBucket.AI.Xunit
package to create tests that measure the performance (accuracy) of different models.
The package provides a way to instrument calls to models and record metrics such as the accuracy invoking the correct tools with the correct arguments.
As tools are added to a product, the selection by the LLM may be impacted especially if the tools are similar. The benchmarking model can be used to validate that specific prompts result in accurate calls to the correct tool.
Benchmarking example with verification of tool invokation result
foreach (string model in new string[] { "llama3.1:8b", "mistral-nemo:12b" })
{
IChatClient client = InstrumentationChatClientFactory.Create(...);
var benchmarkResult = await client.BencharkAsync("Add 3 and 6", iterations:2, (iterationResult) =>
{
iterationResult.ShouldBeSuccess();
iterationResult.ContainsFunctionCall("Add");
});
// Write summary
TestContext.Current.TestOutputHelper?.WriteLine($"Model: {model}, Passrate={benchmarkResult.Passrate}");
// Write exceptions
foreach(var exception in benchmarkResult.Exceptions)
{
TestContext.Current.TestOutputHelper?.WriteLine(exception.ToString());
}
}
Benchmarking results in xunit results XML
Details are added to the xunit results XML file, which can be used for further analysis, debugging or monitoring trends.
Note that
IChatClient
should be created with both instrumentation and functional calling enabled in order to record metrics. If you are not using the built-inInstrumentationChatClientFactory
class to create aIChatClient
, refer to that implementation to setup theIChatClient
pipeline accordingly.
<attachments>
<attachment name="AIUserPrompt">
<![CDATA[ Add 3 and 6 ]]>
</attachment>
<attachment name="metric:llama3.1_8b:passrate">
<![CDATA[ 100%@1749793857501 ]]>
</attachment>
<attachment name="metric:mistral-nemo_12b:passrate">
<![CDATA[ 100%@1749793862549 ]]>
</attachment>
</attachments>
Verifying that the correct tool is called from a user-prompt
When adding new tools to your MCP server, it is possible that the tool selection breaks. The tool selection can be tested by calling TestGetResponseAsync and examining the result which will contain information about what tools were called as well as additional diagnostics data.
This example uses OllamaFixture to create an Ollama test container (using Testcontainers.Ollama)
[EnrichedTest]
[IntegrationTest]
public class CalculatorToolTests(OllamaFixture Ollama) : IClassFixture<OllamaFixture>
{
[Theory]
[InlineData("llama3.1:8b")]
public async Task CallSubtractTool_WithSimplePrompt_CorrectToolIsInvoked(string model)
{
// Arrange
IChatClient chatClient = await CreateInstrumentedChatClientAsync(model);
// Act
InstrumentationTestResult result = await chatClient.TestGetResponseAsync("Subtract 5 from 19");
// Assert
result.ShouldBeSuccess();
result.ContainsFunctionCall("Subtract", 1)
.WithArgument("a", 19)
.WithArgument("b", 5);
}
private async Task<IChatClient> CreateInstrumentedChatClientAsync(string model)
{
var toolAssembly = typeof(CalculatorMcp).Assembly;
var chatClient = await Ollama.CreateChatClientAsync(model,
configureServices: (services) =>
{
// Add any services required by the tools
services.AddSingleton<ICalculator, Calculator>();
},
configureTools: (tools) =>
{
// Add McpServerTools from the assembly
// Note: This scans the assembly for classes defining tools using the [McpServerToolType] attribute
tools.AddMcpServerToolsFromAssembly(toolAssembly);
});
return chatClient;
}
}
Rich reports
When generating unit test reports, the TestBucket.AI.Xunit
package provides additional details and metrics.
Example of xunit xml report
<attachments>
<attachment name="AIUserPrompt">
<![CDATA[ Add 3 and 6 ]]>
</attachment>
<attachment name="AIModelName">
<![CDATA[ llama3.1:8b ]]>
</attachment>
<attachment name="AIProviderName">
<![CDATA[ ollama ]]>
</attachment>
<attachment name="AIProviderVersion">
<![CDATA[ 0.6.6 ]]>
</attachment>
<attachment name="metric:testbucket.ai:input_token_count">
<![CDATA[ 325tokens@1749786387459 ]]>
</attachment>
<attachment name="metric:testbucket.ai:output_token_count">
<![CDATA[ 36tokens@1749786387464 ]]>
</attachment>
<attachment name="metric:testbucket.ai:total_token_count">
<![CDATA[ 361tokens@1749786387464 ]]>
</attachment>
<attachment name="metric:xunit:test-duration">
<![CDATA[ 35715.1986ms@1749786387472 ]]>
</attachment>
<attachment name="TestDescription">
<![CDATA[ # TestBucket.McpTests.OllamaIntegrationTests.Llama3ToolInstrumentationTests.CallAddTool_WithTwoTools_CorrectToolIsInvoked(System.String) ## Summary Verifies that the correct tool is invoked when multiple tools are available ## Source | Assembly | Class | Method | | -------- | ----- | ------ | | TestBucket.AI.OllamaIntegrationTests | TestBucket.McpTests.OllamaIntegrationTests.Llama3ToolInstrumentationTests | CallAddTool_WithTwoTools_CorrectToolIsInvoked | ### Parameters | Name | Summary | | -------- | ------------------- | | model | | ]]>
</attachment>
</attachments>
Note: Test description is extracted from the xmldoc, and requires setting GenerateDocumentationFile to true in the .csproj file.
<PropertyGroup>
<GenerateDocumentationFile>true</GenerateDocumentationFile>
</PropertyGroup>
Product | Versions Compatible and additional computed target framework versions. |
---|---|
.NET | net9.0 is compatible. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
-
net9.0
- Microsoft.Extensions.AI (>= 9.6.0)
- Microsoft.Extensions.DependencyInjection (>= 9.0.6)
- ModelContextProtocol.Core (>= 0.2.0-preview.3)
- OllamaSharp (>= 5.2.2)
- TestBucket.Traits.Xunit (>= 1.0.4)
- Testcontainers.Ollama (>= 4.5.0)
- xunit.v3.assert (>= 2.0.3)
- xunit.v3.extensibility.core (>= 2.0.3)
NuGet packages
This package is not used by any NuGet packages.
GitHub repositories
This package is not used by any popular GitHub repositories.