HillPhelmuth.SemanticKernel.LlmAsJudgeEvals 1.1.0

.NET 8.0

.NET CLI:

dotnet add package HillPhelmuth.SemanticKernel.LlmAsJudgeEvals --version 1.1.0

Package Manager:

NuGet\Install-Package HillPhelmuth.SemanticKernel.LlmAsJudgeEvals -Version 1.1.0

This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

PackageReference:

<PackageReference Include="HillPhelmuth.SemanticKernel.LlmAsJudgeEvals" Version="1.1.0" />

For projects that support PackageReference, copy this XML node into the project file to reference the package.

Central Package Management:

For projects that support Central Package Management (CPM), copy the PackageVersion node into the solution Directory.Packages.props file to version the package, and reference the package from the project file without a version.

Directory.Packages.props:

<PackageVersion Include="HillPhelmuth.SemanticKernel.LlmAsJudgeEvals" Version="1.1.0" />

Project file:

<PackageReference Include="HillPhelmuth.SemanticKernel.LlmAsJudgeEvals" />

Paket CLI:

paket add HillPhelmuth.SemanticKernel.LlmAsJudgeEvals --version 1.1.0

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

Script & Interactive:

#r "nuget: HillPhelmuth.SemanticKernel.LlmAsJudgeEvals, 1.1.0"

#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.

File-based apps:

#:package HillPhelmuth.SemanticKernel.LlmAsJudgeEvals@1.1.0

#:package directive can be used in C# file-based apps starting in .NET 10 preview 4. Copy this into a .cs file before any lines of code to reference the package.

Cake:

Install as a Cake Addin:

#addin nuget:?package=HillPhelmuth.SemanticKernel.LlmAsJudgeEvals&version=1.1.0

Install as a Cake Tool:

#tool nuget:?package=HillPhelmuth.SemanticKernel.LlmAsJudgeEvals&version=1.1.0

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

LlmAsJudgeEvals

This library provides a service for evaluating responses from Large Language Models (LLMs) using the LLM itself as a judge. It leverages Semantic Kernel to define and execute evaluation functions based on prompt templates.

For a more precise evaluation score, the library uses logprobs (token log probabilities) and calculates a probability-weighted score for each evaluation criterion.
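As a rough illustration of the idea, the sketch below computes a probability-weighted score from the log probabilities of candidate score tokens. It is not the package's implementation: the `ProbScore` class, the `WeightedScore` method, the 1 to 5 score range, and the hard-coded logprobs are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical top logprobs for the single score token, keyed by token text.
// In a real run these would come from the model response, not be hard-coded.
var topLogProbs = new Dictionary<string, double>
{
    ["4"] = Math.Log(0.80),
    ["3"] = Math.Log(0.14),
    ["5"] = Math.Log(0.06),
};

double weighted = ProbScore.WeightedScore(topLogProbs);
Console.WriteLine($"Weighted score: {weighted:F2}");

// Illustrative helper, not a type from the package.
static class ProbScore
{
    // Converts each logprob to a probability, keeps only tokens that parse as a
    // score from 1 to 5 (an assumed range), renormalizes over those tokens, and
    // returns the expected (probability-weighted) score.
    public static double WeightedScore(IReadOnlyDictionary<string, double> logProbs)
    {
        double totalProb = 0, weightedSum = 0;
        foreach (var (token, logProb) in logProbs)
        {
            if (!int.TryParse(token.Trim(), out var score) || score < 1 || score > 5)
                continue; // ignore tokens that are not valid scores

            var prob = Math.Exp(logProb);
            totalProb += prob;
            weightedSum += score * prob;
        }

        if (totalProb == 0)
            throw new InvalidOperationException("No score tokens found in logprobs.");

        return weightedSum / totalProb;
    }
}
```

With these made-up probabilities the weighted score is 0.80 × 4 + 0.14 × 3 + 0.06 × 5 = 3.92, which is finer-grained than the single most likely token (4).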

Installation

Install the package via NuGet:

powershell:

Install-Package HillPhelmuth.SemanticKernel.LlmAsJudgeEvals

dotnet cli:

dotnet add package HillPhelmuth.SemanticKernel.LlmAsJudgeEvals

Usage

Built-in Evaluation Functions

The package includes a comprehensive set of built-in evaluation functions, each with an accompanying "Explain" version that provides detailed reasoning:

Groundedness: Evaluates factual accuracy and support in context
Groundedness2: Alternative groundedness evaluation that checks whether the answer logically follows from the context
Similarity: Measures response similarity to reference text
Relevance: Assesses response relevance to prompt/question
Coherence: Evaluates logical flow and consistency
Perceived Intelligence: Rates apparent knowledge and reasoning (with/without RAG)
Fluency: Measures natural language quality
Empathy: Assesses emotional understanding
Helpfulness: Evaluates practical value of response
Retrieval: Evaluates the retrieved content based on the query
Role Adherence: Measures how well the response maintains the persona, style, and constraints specified in the instructions or assigned role
Excessive Agency: Evaluates whether the response exhibits behaviors that go beyond the intended scope, permissions, or safeguards of the LLM (e.g., excessive autonomy, permissions, or functionality)

Each function has an "Explain" variant (e.g., GroundednessExplain, CoherenceExplain) that provides:

Numerical score
Detailed reasoning
Chain-of-thought analysis
Probability-weighted score

// Initialize the Semantic Kernel
var kernel = Kernel.CreateBuilder().AddOpenAIChatCompletion("openai-model-name", "openai-apiKey").Build();

// Create an instance of the EvalService
var evalService = new EvalService(kernel);

// Create an input model for the built-in evaluation function
var coherenceInput = InputModel.CoherenceModel("This is the answer to evaluate.", "This is the question or prompt that generated the answer");

// Execute the evaluation
var result = await evalService.ExecuteEval(coherenceInput);

Console.WriteLine($"Evaluation score: {result.Score}");

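// Create an input model for the "Explain" variant and execute the evaluation with a detailed explanation
var coherenceExplainInput = InputModel.CoherenceExplainModel("This is the answer to evaluate.", "This is the question or prompt that generated the answer");
var resultWithExplanation = await evalService.ExecuteScorePlusEval(coherenceExplainInput);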

Console.WriteLine($"Score: {resultWithExplanation.Score}");
Console.WriteLine($"Reasoning: {resultWithExplanation.Reasoning}");
Console.WriteLine($"Chain of Thought: {resultWithExplanation.ChainOfThought}");

Factory Methods for Easy Access

// answer, question, and context are strings you supply
var coherenceInput = InputModel.CoherenceModel(answer, question);
var groundednessInput = InputModel.GroundednessModel(answer, question, context);
var coherenceWithExplanationInput = InputModel.CoherenceExplainModel(answer, question);

Example Output (Score Plus Explanation)

{
    "EvalName": "CoherenceExplain",
    "Score": 4,
    "Reasoning": "The answer is mostly coherent with good flow and clear organization. It addresses the question directly and maintains logical connections between ideas.",
    "ChainOfThought": "1. First, I examined how the sentences connect\n2. Checked if ideas flow naturally\n3. Verified if the response stays focused on the question\n4. Assessed overall clarity and organization\n5. Considered natural language use",
    "ProbScore": 3.92
}
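If you log or store output in this shape, it can be read back with System.Text.Json. The sketch below is a minimal example under that assumption: `EvalOutput` is a hypothetical record written for this illustration, not a type from the package, and the JSON is a shortened copy of the example above.

```csharp
using System;
using System.Text.Json;

// JSON shaped like the example output, shortened for brevity.
const string json = """
{
    "EvalName": "CoherenceExplain",
    "Score": 4,
    "Reasoning": "The answer is mostly coherent with good flow and clear organization.",
    "ChainOfThought": "1. First, I examined how the sentences connect",
    "ProbScore": 3.92
}
""";

var output = JsonSerializer.Deserialize<EvalOutput>(json)
    ?? throw new InvalidOperationException("Could not parse evaluation output.");

Console.WriteLine($"{output.EvalName}: score={output.Score}, probScore={output.ProbScore}");

// Hypothetical record mirroring the fields of the example output (an assumption,
// not the package's own result type).
public record EvalOutput(string EvalName, int Score, string Reasoning, string ChainOfThought, double ProbScore);
```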

Custom Evaluation Functions

// Initialize the Semantic Kernel
var kernel = Kernel.CreateBuilder().AddOpenAIChatCompletion("openai-model-name", "openai-apiKey").Build();

// Create an instance of the EvalService
var evalService = new EvalService(kernel);

// Add an evaluation function (optional)
evalService.AddEvalFunction("MyEvalFunction", "This is the prompt for my evaluation function.", new PromptExecutionSettings());

// Create an input model for the evaluation function
var inputModel = new InputModel
{
    FunctionName = "MyEvalFunction", // Replace with the name of your evaluation function
    RequiredInputs = new Dictionary<string, string>
    {
        { "input", "This is the text to evaluate." }
    }
};

// Execute the evaluation
var result = await evalService.ExecuteEval(inputModel);

Console.WriteLine($"Evaluation score: {result.Score}");

Using KernelPlugin Directly (Alternative to EvalService for Built-in Evals)

You can use the evaluation functions directly by importing the plugin with ImportEvalPlugin and invoking functions via the kernel. This is an alternative to using EvalService.

// Initialize the Semantic Kernel
var kernel = Kernel.CreateBuilder().AddOpenAIChatCompletion("openai-model-name", "openai-apiKey").Build();

// Import the evaluation plugin (loads all built-in eval functions)
var evalPlugin = kernel.ImportEvalPlugin();

// Prepare input arguments for the function
var arguments = new KernelArguments
{
    ["input"] = "This is the answer to evaluate.",
    ["question"] = "This is the question or prompt that generated the answer."
};

// Get the 'Coherence' evaluation function from the plugin
var coherenceFunction = evalPlugin["Coherence"];

// Invoke the 'Coherence' evaluation function directly
var result = await kernel.InvokeAsync(coherenceFunction, arguments);

Console.WriteLine($"Coherence score: {result.GetValue<int>()}");

You can replace "Coherence" with any other built-in evaluation function name. The plugin name defaults to EvalPlugin unless specified otherwise.

Features

Define evaluation functions using prompt templates: Evaluation functions are defined as prompt templates written in YAML.
Execute evaluations: The EvalService provides methods for executing evaluations on input data.
Score Plus Explanation: Get detailed explanations and chain-of-thought reasoning along with scores.
Aggregate results: The EvalService can aggregate evaluation scores across multiple inputs.
Built-in evaluation functions: Pre-defined functions for common evaluation metrics.
Logprobs-based scoring: Leverages logprobs for a more granular and precise evaluation score.
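To make the aggregation feature concrete, here is a small sketch that averages scores per evaluation name using plain LINQ. It does not use the EvalService aggregation API, whose method names are not shown here. The `Aggregation` class, the `AverageByEval` method, and the sample scores are hypothetical and only show the kind of summary you might compute over many results.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical (eval name, score) pairs standing in for results collected
// from several evaluation runs.
var results = new List<(string EvalName, double Score)>
{
    ("Coherence", 4), ("Coherence", 5),
    ("Relevance", 3), ("Relevance", 4),
};

var averages = Aggregation.AverageByEval(results);
foreach (var (name, average) in averages.OrderBy(kv => kv.Key))
    Console.WriteLine($"{name}: {average:F2}");

// Illustrative helper, not part of the package.
static class Aggregation
{
    // Groups results by evaluation name and returns the mean score per group.
    public static Dictionary<string, double> AverageByEval(
        IEnumerable<(string EvalName, double Score)> results) =>
        results
            .GroupBy(r => r.EvalName)
            .ToDictionary(g => g.Key, g => g.Average(r => r.Score));
}
```

Averaging per evaluation name keeps metrics with different meanings (for example, coherence and relevance) from being mixed into a single number.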

Product	Compatible and additional computed target framework versions.
.NET	net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed.


net8.0
- Microsoft.SemanticKernel (>= 1.50.0)
- Microsoft.SemanticKernel.Yaml (>= 1.50.0)

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version	Downloads	Last Updated
1.1.0	118	5/18/2025
1.0.8	227	5/16/2025
1.0.7	216	5/16/2025
1.0.6	163	4/28/2025
1.0.5	155	4/28/2025
1.0.4	146	1/8/2025
1.0.3	118	1/3/2025
1.0.2	120	12/22/2024
1.0.1	109	12/10/2024
1.0.0-preview	123	10/22/2024
0.1.0-beta	103	10/22/2024
0.0.3-beta	112	10/21/2024
0.0.2-beta	107	9/7/2024
0.0.1-beta	97	9/7/2024