
Conversation

@kevinagyeman kevinagyeman commented Dec 16, 2025

Description

Integrates the RAGAS (Retrieval Augmented Generation Assessment) framework for comprehensive QA pair evaluation. This PR adds a new RagasBatchMetrics block that enables automated quality assessment of generated QA pairs using industry-standard metrics.

Key Features:

  • 4 RAGAS metrics: answer_relevancy, context_precision, context_recall, faithfulness
  • Multi-provider support: Gemini, OpenAI, Ollama, Anthropic
  • Quality flagging with configurable thresholds
  • Batch processing of all QA pairs in a single block

Implementation:

  • Created RagasBatchMetrics block with validation and normalization helpers
  • Added langchain dependencies for multi-provider embeddings
  • Added documentation (docs/ragas-evaluation.md) and example pipeline
  • All code quality checks passing (format, lint, typecheck, test)
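The quality-flagging behavior described above can be pictured with a minimal sketch. The function and threshold names here are hypothetical illustrations, not the actual `RagasBatchMetrics` API:

```python
# Hypothetical sketch of threshold-based quality flagging; the real
# RagasBatchMetrics block's configuration and naming may differ.
DEFAULT_THRESHOLDS = {
    "answer_relevancy": 0.7,
    "context_precision": 0.7,
    "context_recall": 0.7,
    "faithfulness": 0.7,
}


def flag_low_quality(scores: dict[str, float],
                     thresholds: dict[str, float] = DEFAULT_THRESHOLDS) -> list[str]:
    """Return the metrics whose score falls below the configured threshold."""
    return [m for m, s in scores.items() if s < thresholds.get(m, 0.0)]
```

A QA pair scoring `{"faithfulness": 0.4, "answer_relevancy": 0.9}` would be flagged on `faithfulness` only.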

Related Issue

Checklist

  • Code follows project style guidelines
  • Comments explain "why" not "what"
  • Documentation updated (if needed)
  • No debug code or console statements
  • make format passes
  • make pre-merge passes
  • PR updated from develop branch
  • Copilot review run and addressed


Copilot AI left a comment


Pull request overview

This PR integrates the RAGAS (Retrieval Augmented Generation Assessment) framework to provide automated quality evaluation of generated QA pairs. The implementation adds a new RagasBatchMetrics block that calculates four industry-standard metrics (answer_relevancy, faithfulness, context_precision, context_recall) with support for multiple LLM providers.

Key changes:

  • New RagasBatchMetrics block for batch QA pair evaluation with configurable metrics and quality thresholds
  • Enhanced block configuration system to support array-type enums for multi-select UI components
  • New JSONFieldExtractorBlock utility for extracting nested JSON fields
  • Frontend improvements for multi-select enums and null-safe text formatting
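The array-type enum support mentioned above can be illustrated as validating a multi-select field against a set of allowed choices. This is a sketch with made-up names, not the project's actual config schema:

```python
# Illustrative only: validating a multi-select enum field such as the
# RAGAS metric list. Names are hypothetical, not lib/blocks/config.py code.
ALLOWED_METRICS = {"answer_relevancy", "context_precision",
                   "context_recall", "faithfulness"}


def validate_metric_selection(selected: list[str]) -> list[str]:
    """Reject values outside the enum; keep user order, drop duplicates."""
    seen: set[str] = set()
    valid: list[str] = []
    for metric in selected:
        if metric not in ALLOWED_METRICS:
            raise ValueError(f"unknown metric: {metric}")
        if metric not in seen:
            seen.add(metric)
            valid.append(metric)
    return valid
```

The frontend checkbox UI would then only ever submit values that pass this kind of check.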

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 11 comments.

Summary per file:

  • lib/blocks/builtin/ragas_batch_metrics.py: Core implementation of the RAGAS evaluation block with multi-provider LLM/embedding support
  • lib/blocks/builtin/json_field_extractor.py: Utility block for extracting and flattening nested JSON structures
  • lib/blocks/config.py: Enhanced to support enum constraints on array item types for multi-select UIs
  • lib/workflow.py: Added wildcard output validation support for blocks with dynamic outputs
  • tests/blocks/test_ragas_batch_metrics.py: Comprehensive test suite for RagasBatchMetrics validation and normalization
  • pyproject.toml: Added ragas and langchain dependencies for the evaluation framework
  • frontend/src/components/pipeline-editor/BlockConfigPanel.tsx: Multi-select checkbox UI for array-type enum fields
  • frontend/src/utils/format.ts: Added null/undefined handling in the text truncation utility
  • frontend/src/pages/Generator.tsx: Added validation state tracking and improved code formatting
  • examples/ragas/ragas-qa-evaluation-pipeline.json: Example pipeline configuration demonstrating RAGAS integration
  • examples/ragas/ragas_metric_integration.md: Documentation for using the example pipeline
  • docs/ragas-evaluation.md: Comprehensive guide to RAGAS metrics, configuration, and best practices
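The nested-field extraction that JSONFieldExtractorBlock provides can be sketched as dotted-path flattening. This is an illustration of the technique, not the block's real implementation:

```python
# Sketch of nested-JSON flattening into dotted-path keys; the actual
# JSONFieldExtractorBlock may use a different key scheme or API.
def flatten_json(obj: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into {'a.b.c': value} dotted-path keys."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_json(value, path))
        else:
            flat[path] = value
    return flat
```

For example, `flatten_json({"qa": {"question": "q1", "meta": {"score": 0.9}}})` yields `{"qa.question": "q1", "qa.meta.score": 0.9}`.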



Copilot AI left a comment


Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.



nicofretti assigned kevinagyeman and Copilot and unassigned Copilot Dec 21, 2025
nicofretti commented Dec 21, 2025

I added the ragas block to the Q&A pipeline and it raises this error:

[ERROR] lib.blocks.builtin.ragas_batch_metrics: ragas metric calculation failed for answer_relevancy in qa_pair 1: 'str' object has no attribute 'content'

Can we avoid using langchain and use litellm instead?
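The traceback suggests ragas received a plain string where it expected a message-like object exposing a `.content` attribute (the shape langchain chat models return). A minimal illustration of the mismatch and one possible normalization, using a hypothetical wrapper rather than actual ragas or litellm code:

```python
# Hypothetical sketch: ragas-side code does `result.content`, so a bare
# string from the provider triggers "'str' object has no attribute 'content'".
# One fix is normalizing provider output before handing it to ragas.
from dataclasses import dataclass


@dataclass
class ChatMessage:
    """Minimal stand-in for the message object the consumer expects."""
    content: str


def normalize_llm_output(result):
    """Wrap bare strings so downstream code can safely read .content."""
    if hasattr(result, "content"):
        return result
    return ChatMessage(content=str(result))
```

A litellm-backed provider typically yields text via `response.choices[0].message.content`, so a normalization step like this would sit between the provider call and the metric calculation.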

