🚀 feat: integrate RAGAS evaluation framework #47
base: develop
Conversation
Pull request overview
This PR integrates the RAGAS (Retrieval Augmented Generation Assessment) framework to provide automated quality evaluation of generated QA pairs. The implementation adds a new RagasBatchMetrics block that calculates four industry-standard metrics (answer_relevancy, faithfulness, context_precision, context_recall) with support for multiple LLM providers.
Key changes:
- New `RagasBatchMetrics` block for batch QA pair evaluation with configurable metrics and quality thresholds (see the sketch after this list)
- Enhanced block configuration system to support array-type enums for multi-select UI components
- New `JSONFieldExtractorBlock` utility for extracting nested JSON fields
- Frontend improvements for multi-select enums and null-safe text formatting
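
For context on what the new block automates, here is a minimal sketch of a batch RAGAS evaluation using the classic `ragas` `evaluate` API. The sample rows, column names, and the default OpenAI-backed judge models are illustrative assumptions, not code from this PR:

```python
# Minimal sketch of a batch RAGAS evaluation (classic ragas>=0.1 API).
# Requires OPENAI_API_KEY for the default judge LLM and embeddings.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_precision,
    context_recall,
)

# Column names follow ragas conventions; the content is made up.
qa_pairs = Dataset.from_dict({
    "question": ["What does the RagasBatchMetrics block do?"],
    "answer": ["It scores generated QA pairs with RAGAS metrics."],
    "contexts": [["RagasBatchMetrics evaluates QA pairs in batches."]],
    "ground_truth": ["It evaluates generated QA pairs with RAGAS metrics."],
})

result = evaluate(
    qa_pairs,
    metrics=[answer_relevancy, faithfulness, context_precision, context_recall],
)
print(result)  # per-metric aggregate scores, e.g. {'faithfulness': 0.95, ...}
```

Presumably the block layers provider selection, per-pair score collection, and threshold filtering on top of a flow like this.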
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| lib/blocks/builtin/ragas_batch_metrics.py | Core implementation of RAGAS evaluation block with multi-provider LLM/embedding support |
| lib/blocks/builtin/json_field_extractor.py | Utility block for extracting and flattening nested JSON structures (see the sketch after this table) |
| lib/blocks/config.py | Enhanced to support enum constraints on array item types for multi-select UIs |
| lib/workflow.py | Added wildcard output validation support for blocks with dynamic outputs |
| tests/blocks/test_ragas_batch_metrics.py | Comprehensive test suite for RagasBatchMetrics validation and normalization |
| pyproject.toml | Added ragas and langchain dependencies for evaluation framework |
| frontend/src/components/pipeline-editor/BlockConfigPanel.tsx | Implemented multi-select checkbox UI for array-type enum fields |
| frontend/src/utils/format.ts | Added null/undefined handling in text truncation utility |
| frontend/src/pages/Generator.tsx | Added validation state tracking and improved code formatting |
| examples/ragas/ragas-qa-evaluation-pipeline.json | Example pipeline configuration demonstrating RAGAS integration |
| examples/ragas/ragas_metric_integration.md | Documentation for using the example pipeline |
| docs/ragas-evaluation.md | Comprehensive guide for RAGAS metrics, configuration, and best practices |
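
As a rough illustration of the nested-field extraction described for lib/blocks/builtin/json_field_extractor.py, a dot-path helper in that spirit might look like this; the function name and path syntax are hypothetical, not the PR's actual implementation:

```python
# Hypothetical dot-path extraction over nested JSON-like data.
# extract_field and its "a.b.0.c" syntax are illustrative only.
from typing import Any


def extract_field(data: Any, path: str, default: Any = None) -> Any:
    """Follow a dot-separated path (e.g. "qa.pairs.0.question") through
    nested dicts and lists, returning `default` on any miss."""
    current = data
    for part in path.split("."):
        if isinstance(current, dict) and part in current:
            current = current[part]
        elif isinstance(current, list) and part.isdigit() and int(part) < len(current):
            current = current[int(part)]
        else:
            return default
    return current


doc = {"qa": {"pairs": [{"question": "Q1", "scores": {"faithfulness": 0.9}}]}}
print(extract_field(doc, "qa.pairs.0.scores.faithfulness"))  # 0.9
print(extract_field(doc, "qa.pairs.1.question", default="missing"))  # missing
```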
In a second review pass, Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.
I have added the ragas block to the Q&A pipeline, and it raises this error:

`[ERROR] lib.blocks.builtin.ragas_batch_metrics: ragas metric calculation failed for answer_relevancy in qa_pair 1: 'str' object has no attribute 'content'`

Can we avoid using …
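
For what it's worth, `'str' object has no attribute 'content'` in RAGAS usually means a metric received a plain string (or an unwrapped completion-style LLM) where a chat message object was expected. A minimal sketch of the usual remedy, wrapping LangChain models for ragas; the provider and model name here are illustrative assumptions:

```python
# Sketch: wrap LangChain chat/embedding models so ragas receives message
# objects exposing `.content` instead of raw strings. The model choice
# (gpt-4o-mini / OpenAI) is an illustrative assumption.
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper

evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini"))
evaluator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

# Then pass these to evaluate(..., llm=evaluator_llm,
# embeddings=evaluator_embeddings) so metric prompts get chat messages.
```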
Description
Integrates the RAGAS (Retrieval Augmented Generation Assessment) framework for comprehensive QA pair evaluation. This PR adds a new `RagasBatchMetrics` block that enables automated quality assessment of generated QA pairs using industry-standard metrics.

Key Features:

- Four industry-standard metrics: answer_relevancy, faithfulness, context_precision, context_recall
- Support for multiple LLM providers
- Configurable metrics and quality thresholds

Implementation:

- `RagasBatchMetrics` block with validation and normalization helpers (see the sketch after this list)
- Documentation (`docs/ragas-evaluation.md`) and example pipeline
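
As a sketch of what validation and normalization helpers of this kind typically do (all names, signatures, and threshold values below are hypothetical, not taken from this PR):

```python
# Hypothetical normalization/threshold helpers for RAGAS scores.
from math import isnan


def normalize_score(value: float | None) -> float | None:
    """Clamp a raw metric score into [0.0, 1.0]; map NaN/None to None."""
    if value is None or isnan(value):
        return None
    return max(0.0, min(1.0, value))


def passes_thresholds(scores: dict[str, float | None],
                      thresholds: dict[str, float]) -> bool:
    """A QA pair passes only if every thresholded metric meets its minimum."""
    return all(
        (s := scores.get(metric)) is not None and s >= minimum
        for metric, minimum in thresholds.items()
    )


scores = {"faithfulness": 0.92, "answer_relevancy": 0.81}
print(passes_thresholds(scores, {"faithfulness": 0.9, "answer_relevancy": 0.7}))  # True
```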
Related Issue

Checklist

- `make format` passes
- `make pre-merge` passes