Closed
Labels: enhancement (New feature or request)
Description
Integrate the Ragas evaluation framework to automatically assess the quality of generated QA datasets. This provides objective metrics to validate that generated questions and answers meet quality standards before they are used in production.
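For concreteness, the unit of evaluation would be a QA pair bundled with the context it was generated from. A hedged example of that record shape, using the field names Ragas conventionally expects (ragas 0.1.x); the values are purely illustrative:

```python
# Illustrative QA record; field names follow ragas 0.1.x conventions,
# the values are made-up examples, not from this project.
qa_pair = {
    "question": "What retention period applies to audit logs?",
    "answer": "Audit logs are retained for 90 days.",
    "contexts": [  # source passages the answer should be grounded in
        "Audit logs are kept for 90 days before automatic deletion.",
    ],
    "ground_truth": "90 days.",  # reference answer, needed for context_recall
}
```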
User Benefits
- Automatically evaluate QA dataset quality without manual review
- Get quantitative scores for answer accuracy and relevance
- Identify low-quality generated pairs that need regeneration
- Benchmark different LLMs and prompt strategies
- Ensure consistent dataset quality across pipeline runs
Requirements
- Add Ragas library dependency to project
- Implement an evaluation block that accepts QA pairs with context (see the sketch after this list)
- Support key Ragas metrics: faithfulness, answer relevancy, context precision, context recall
- Enable batch evaluation of multiple QA pairs efficiently
- Generate structured evaluation report with per-item and aggregate scores
- Add configuration options for selecting which metrics to compute
- Support filtering/flagging items that score below a configurable threshold
- Handle evaluation failures gracefully with clear error messages
- Update documentation with metric explanations and usage examples
- Add sample pipeline showing evaluation block in workflow
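A minimal sketch of what the evaluation block could look like, assuming the ragas 0.1.x API (`evaluate` plus metric objects from `ragas.metrics`) and a HuggingFace `datasets.Dataset` as the input carrier; the function name `evaluate_qa_pairs`, its arguments, and the 0.7 default threshold are illustrative, not decided by this issue:

```python
# A minimal sketch of the proposed evaluation block, assuming the
# ragas 0.1.x API; the function name, arguments, and the 0.7 default
# threshold are illustrative, not part of this issue.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    context_precision,
    context_recall,
    faithfulness,
)

# Registry so the pipeline config can select metrics by name.
METRICS = {
    "faithfulness": faithfulness,
    "answer_relevancy": answer_relevancy,
    "context_precision": context_precision,
    "context_recall": context_recall,
}

def evaluate_qa_pairs(qa_pairs, metric_names=None, threshold=0.7):
    """Score a batch of QA pairs with Ragas and flag low scorers.

    Each item must carry `question`, `answer`, and `contexts` (a list
    of strings); `ground_truth` is needed for context_recall.
    Returns (per_item_df, report).
    """
    metrics = [METRICS[name] for name in (metric_names or METRICS)]
    dataset = Dataset.from_list(list(qa_pairs))
    try:
        # ragas batches its judge-LLM calls across the whole dataset.
        result = evaluate(dataset, metrics=metrics)
    except Exception as exc:
        # Surface a clear, actionable message instead of a raw trace.
        raise RuntimeError(f"Ragas evaluation failed: {exc}") from exc

    per_item = result.to_pandas()            # one row per QA pair
    score_cols = [m.name for m in metrics]   # e.g. "faithfulness"
    # Flag any item scoring below the threshold on any selected metric.
    per_item["flagged"] = (per_item[score_cols] < threshold).any(axis=1)
    report = {
        "aggregate": {c: float(per_item[c].mean()) for c in score_cols},
        "flagged_count": int(per_item["flagged"].sum()),
        "total": len(per_item),
    }
    return per_item, report
```

Note that Ragas delegates scoring to a judge LLM (OpenAI by default, configured via `OPENAI_API_KEY`), and its expected column names have shifted between releases (e.g. `ground_truths` vs `ground_truth`), so the dependency added in the first requirement should be pinned to whichever API version the block targets.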
TODO
Define a subset of functions to integrate