It could be clearer how to use evaluators that require a `context` in addition to `input` and `output` in an `Eval` run, such as `Faithfulness` and `ContextRelevancy`.
Right now I'm passing the contexts through `metadata`. I only figured this out after a few hours of poking around, since the behavior is undocumented.
Here's an annotated version of my code that worked:
import { Eval } from "braintrust";
import { Faithfulness, AnswerRelevancy, ContextRelevancy } from "autoevals";
import "dotenv/config";
import { strict as assert } from "assert";
assert(process.env.OPENAI_API_KEY, "OPENAI_API_KEY must be set");
const openAiApiKey = process.env.OPENAI_API_KEY;
const model = "gpt-4o-mini";
const evaluatorLlmConf = {
openAiApiKey,
model,
};
/**
Evaluate whether the output is faithful to the model input.
*/
const makeAnswerFaithfulness = function (args: {
input: string;
output: string;
// passing context in metadata
metadata: { context: string[] };
}) {
return Faithfulness({
input: args.input,
output: args.output,
context: args.metadata.context,
...evaluatorLlmConf,
});
};
/**
Evaluate whether answer is relevant to the input.
*/
const makeAnswerRelevance = function (args: {
input: string;
output: string;
metadata: { context: string[] };
}) {
return AnswerRelevancy({
input: args.input,
output: args.output,
context: args.metadata.context,
...evaluatorLlmConf,
});
};
/**
Evaluate whether context is relevant to the input.
*/
const makeContextRelevance = function (args: {
input: string;
output: string;
metadata: { context: string[] };
}) {
return ContextRelevancy({
input: args.input,
output: args.output,
context: args.metadata.context,
...evaluatorLlmConf,
});
};
const dataset = [
{
input: "What is the capital of France",
tags: ["paris"],
metadata: {
// including context in metadata here as well
context: [
"The capital of France is Paris.",
"Berlin is the capital of Germany.",
],
},
output: "Paris is the capital of France.",
},
{
input: "Who wrote Harry Potter",
tags: ["harry-potter"],
metadata: {
context: [
"Harry Potter was written by J.K. Rowling.",
"The Lord of the Rings was written by J.R.R. Tolkien.",
],
},
output: "J.R.R. Tolkien wrote Harry Potter.",
},
{
input: "What is the largest planet in our solar system",
tags: ["jupiter"],
metadata: {
context: [
"Jupiter is the largest planet in our solar system.",
"Saturn has the largest rings in our solar system.",
],
},
output: "Saturn is the largest planet in our solar system.",
},
];
function makeGeneratedAnswerReturner(outputs: string[]) {
// closure over a counter, so each task call replays the next canned output
let counter = 0;
return async (_input: string) => {
counter++;
return outputs[counter - 1];
};
}
Eval("mdb-test", {
experimentName: "rag-metrics",
metadata: {
testing: true,
},
data: () => {
return dataset;
},
task: makeGeneratedAnswerReturner(dataset.map((d) => d.output)),
scores: [makeAnswerFaithfulness, makeAnswerRelevance, makeContextRelevance],
});
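To make the undocumented convention explicit, here is a minimal, dependency-free sketch of the calling shape the scorers above rely on: each score function appears to receive the row's `input`, `output`, and `metadata`, so anything stashed in `metadata` (like `context`) is available to the scorer. The `ScorerArgs` type and `containsContext` scorer are hypothetical names for illustration, not part of Braintrust's API.

```typescript
// Hypothetical sketch (not Braintrust's actual types): the argument shape
// each score function receives, based on the behavior observed above.
type ScorerArgs = {
  input: string;
  output: string;
  metadata: { context: string[] };
};

// A toy scorer: score 1 if the output matches any context passage, else 0.
function containsContext(args: ScorerArgs): { name: string; score: number } {
  const match = args.metadata.context.some((c) =>
    c.toLowerCase().includes(args.output.toLowerCase())
  );
  return { name: "containsContext", score: match ? 1 : 0 };
}

const result = containsContext({
  input: "What is the capital of France",
  output: "The capital of France is Paris.",
  metadata: {
    context: [
      "The capital of France is Paris.",
      "Berlin is the capital of Germany.",
    ],
  },
});
console.log(result.score); // 1
```

Any function with this signature can sit alongside the autoevals scorers in the `scores` array, which is why wrapping `Faithfulness` and friends to pull `context` out of `metadata` works.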