
Conversation

@kevinagyeman kevinagyeman commented Dec 16, 2025

Description

Integrates the RAGAS (Retrieval Augmented Generation Assessment) framework for comprehensive QA pair evaluation. This PR adds a new RagasBatchMetrics block that enables automated quality assessment of generated QA pairs using industry-standard metrics.

Key Features:

  • 4 RAGAS metrics: answer_relevancy, context_precision, context_recall, faithfulness
  • Multi-provider support: Gemini, OpenAI, Ollama, Anthropic
  • Quality flagging with configurable thresholds
  • Batch processing of all QA pairs in a single block

Implementation:

  • Created RagasBatchMetrics block with validation and normalization helpers
  • Added langchain dependencies for multi-provider embeddings
  • Added documentation (docs/ragas-evaluation.md) and example pipeline
  • All code quality checks passing (format, lint, typecheck, test)
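The quality-flagging behavior described above can be pictured with a minimal sketch. The function and threshold names here are hypothetical illustrations, not the actual `RagasBatchMetrics` API:

```python
# Hypothetical sketch of threshold-based quality flagging; the real
# RagasBatchMetrics block's configuration and naming may differ.
DEFAULT_THRESHOLDS = {
    "answer_relevancy": 0.7,
    "context_precision": 0.7,
    "context_recall": 0.7,
    "faithfulness": 0.7,
}


def flag_low_quality(scores: dict[str, float],
                     thresholds: dict[str, float] = DEFAULT_THRESHOLDS) -> list[str]:
    """Return the metrics whose score falls below the configured threshold."""
    return [m for m, s in scores.items() if s < thresholds.get(m, 0.0)]
```

A QA pair scoring `{"faithfulness": 0.4, "answer_relevancy": 0.9}` would be flagged on `faithfulness` only.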

Related Issue

Checklist

  • Code follows project style guidelines
  • Comments explain "why" not "what"
  • Documentation updated (if needed)
  • No debug code or console statements
  • make format passes
  • make pre-merge passes
  • PR updated from develop branch
  • Copilot review run and addressed


Copilot AI left a comment


Pull request overview

This PR integrates the RAGAS (Retrieval Augmented Generation Assessment) framework to provide automated quality evaluation of generated QA pairs. The implementation adds a new RagasBatchMetrics block that calculates four industry-standard metrics (answer_relevancy, faithfulness, context_precision, context_recall) with support for multiple LLM providers.

Key changes:

  • New RagasBatchMetrics block for batch QA pair evaluation with configurable metrics and quality thresholds
  • Enhanced block configuration system to support array-type enums for multi-select UI components
  • New JSONFieldExtractorBlock utility for extracting nested JSON fields
  • Frontend improvements for multi-select enums and null-safe text formatting
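The array-type enum support mentioned above can be illustrated as validating a multi-select field against a set of allowed choices. This is a sketch with made-up names, not the project's actual config schema:

```python
# Illustrative only: validating a multi-select enum field such as the
# RAGAS metric list. Names are hypothetical, not lib/blocks/config.py code.
ALLOWED_METRICS = {"answer_relevancy", "context_precision",
                   "context_recall", "faithfulness"}


def validate_metric_selection(selected: list[str]) -> list[str]:
    """Reject values outside the enum; keep user order, drop duplicates."""
    seen: set[str] = set()
    valid: list[str] = []
    for metric in selected:
        if metric not in ALLOWED_METRICS:
            raise ValueError(f"unknown metric: {metric}")
        if metric not in seen:
            seen.add(metric)
            valid.append(metric)
    return valid
```

The frontend checkbox UI would then only ever submit values that pass this kind of check.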

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 11 comments.

Summary per file:

  • lib/blocks/builtin/ragas_batch_metrics.py: Core implementation of the RAGAS evaluation block with multi-provider LLM/embedding support
  • lib/blocks/builtin/json_field_extractor.py: Utility block for extracting and flattening nested JSON structures
  • lib/blocks/config.py: Enhanced to support enum constraints on array item types for multi-select UIs
  • lib/workflow.py: Added wildcard output validation support for blocks with dynamic outputs
  • tests/blocks/test_ragas_batch_metrics.py: Comprehensive test suite for RagasBatchMetrics validation and normalization
  • pyproject.toml: Added ragas and langchain dependencies for the evaluation framework
  • frontend/src/components/pipeline-editor/BlockConfigPanel.tsx: Multi-select checkbox UI for array-type enum fields
  • frontend/src/utils/format.ts: Added null/undefined handling in the text truncation utility
  • frontend/src/pages/Generator.tsx: Added validation state tracking and improved code formatting
  • examples/ragas/ragas-qa-evaluation-pipeline.json: Example pipeline configuration demonstrating RAGAS integration
  • examples/ragas/ragas_metric_integration.md: Documentation for using the example pipeline
  • docs/ragas-evaluation.md: Comprehensive guide to RAGAS metrics, configuration, and best practices
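The nested-field extraction that JSONFieldExtractorBlock provides can be sketched as dotted-path flattening. This is an illustration of the technique, not the block's real implementation:

```python
# Sketch of nested-JSON flattening into dotted-path keys; the actual
# JSONFieldExtractorBlock may use a different key scheme or API.
def flatten_json(obj: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into {'a.b.c': value} dotted-path keys."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_json(value, path))
        else:
            flat[path] = value
    return flat
```

For example, `flatten_json({"qa": {"question": "q1", "meta": {"score": 0.9}}})` yields `{"qa.question": "q1", "qa.meta.score": 0.9}`.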



Copilot AI left a comment


Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.



nicofretti assigned kevinagyeman and Copilot and unassigned Copilot Dec 21, 2025
nicofretti commented Dec 21, 2025

I added the ragas block to the Q&A pipeline and it raises this error:

[ERROR] lib.blocks.builtin.ragas_batch_metrics: ragas metric calculation failed for answer_relevancy in qa_pair 1: 'str' object has no attribute 'content'

Can we avoid using langchain and use litellm instead?
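The traceback suggests ragas received a plain string where it expected a message-like object exposing a `.content` attribute (the shape langchain chat models return). A minimal illustration of the mismatch and one possible normalization, using a hypothetical wrapper rather than actual ragas or litellm code:

```python
# Hypothetical sketch: ragas-side code does `result.content`, so a bare
# string from the provider triggers "'str' object has no attribute 'content'".
# One fix is normalizing provider output before handing it to ragas.
from dataclasses import dataclass


@dataclass
class ChatMessage:
    """Minimal stand-in for the message object the consumer expects."""
    content: str


def normalize_llm_output(result):
    """Wrap bare strings so downstream code can safely read .content."""
    if hasattr(result, "content"):
        return result
    return ChatMessage(content=str(result))
```

A litellm-backed provider typically yields text via `response.choices[0].message.content`, so a normalization step like this would sit between the provider call and the metric calculation.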

