Agent Online Evaluation Notebooks

This directory contains comprehensive sample notebooks for evaluating AI agents using the Azure AI Projects SDK with online evaluation capabilities. These notebooks demonstrate how to assess various aspects of agent performance and response quality using modern batch evaluation patterns.

Overview

The Agent Online Evaluation notebooks provide complete coverage of Azure AI evaluation capabilities designed specifically for agent scenarios. Each notebook uses the modern Azure AI Projects SDK with a standardized SourceFileContentContent(item={}) input format for consistent and maintainable evaluation workflows.
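
For orientation, the sketch below shows roughly how a test case is prepared in this format before being submitted to an evaluation run. It is a minimal sketch, not the notebooks' exact code: the project endpoint is a placeholder, the item field names (query, response, context) are illustrative, and the exact AIProjectClient constructor form depends on the azure-ai-projects version you have installed.

```python
# Minimal sketch: connect to an Azure AI Foundry project and shape one test
# case in the standardized per-row item format used by the notebooks.
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

# Placeholder endpoint; substitute your own project endpoint.
project_client = AIProjectClient(
    endpoint="https://<your-resource>.services.ai.azure.com/api/projects/<project-name>",
    credential=DefaultAzureCredential(),
)

# The notebooks wrap each row as SourceFileContentContent(item={...}); the
# field names below (query, response, context) are illustrative only and
# vary by evaluator.
test_cases = [
    {
        "item": {
            "query": "What is the capital of France?",
            "response": "The capital of France is Paris.",
            "context": "Paris is the capital and largest city of France.",
        }
    },
]
```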

Evaluators Included (14 Total)

Agent-Specific Evaluators

Quality Evaluators

  • Coherence - Measures natural flow, readability, and logical progression of responses
  • Fluency - Assesses linguistic quality, grammar, and syntax
  • Groundedness - Validates responses are grounded in provided context
  • Relevance - Evaluates relevance of responses to user queries
  • Response Completeness - Compares responses against ground truth expectations

Key Features

  • Modern SDK Integration: Uses azure-ai-projects with AIProjectClient and the OpenAI evals API
  • Standardized Format: Consistent SourceFileContentContent(item={}) format across all notebooks
  • Complete Type Coverage: All Union type variants demonstrated (str vs List[dict], dict vs List[dict])
  • Batch Evaluation: Efficient single evaluation run for multiple test cases using a run_evaluator helper (a sketch of such a helper appears after this list)
  • Practical Examples: Real-world scenarios covering various quality levels and edge cases
  • Flexible Data Sources: Support for both string and array inputs with anyOf schema patterns
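
The sketch below illustrates the general shape a run_evaluator-style batch helper could take, assuming the project client can hand back an OpenAI-compatible client and that the evals API's custom data_source_config and inline file_content data source are used. The function body, the schema fields, and the get_openai_client accessor are assumptions for illustration; the notebooks define the actual helper and the grader configuration passed as testing_criteria.

```python
# Hypothetical run_evaluator-style helper (a sketch, not the notebooks' exact
# implementation): define the eval once, then submit all test cases in a
# single batch run.
def run_evaluator(project_client, eval_name, testing_criteria, test_cases):
    # Assumption: newer azure-ai-projects versions expose an OpenAI-compatible
    # client this way; older versions use a different accessor.
    openai_client = project_client.get_openai_client()

    # Create the eval definition. The item_schema here is illustrative and
    # would mirror the fields each evaluator expects.
    evaluation = openai_client.evals.create(
        name=eval_name,
        data_source_config={
            "type": "custom",
            "item_schema": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"},
                    "response": {"type": "string"},
                },
                "required": ["query", "response"],
            },
        },
        testing_criteria=testing_criteria,
    )

    # Submit every test case in one run using inline JSONL file content.
    run = openai_client.evals.runs.create(
        eval_id=evaluation.id,
        data_source={
            "type": "jsonl",
            "source": {"type": "file_content", "content": test_cases},
        },
    )
    return run
```

In practice, the testing_criteria argument would carry the grader configuration for the specific evaluator being exercised (for example, Coherence or Groundedness), which each notebook sets up in its prerequisite section.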

Each notebook includes detailed documentation, prerequisite setup, an explanation of the scoring system, comprehensive samples demonstrating all input type variants, and batch evaluation examples.

Checklist

  • I have read the contribution guidelines.
  • I have coordinated with the docs team (mldocs@microsoft.com) if this PR deletes files or changes any file names or file extensions.
  • This notebook or file is added to the CODEOWNERS file, pointing to the author or the author's team.
