
🚀 Feat: init RAGAS integration #33

@nicofretti

Description

Integrate the Ragas evaluation framework to automatically assess the quality of generated QA datasets. This provides objective metrics to validate that generated questions and answers meet quality standards before they are used in production.

User Benefits

  • Automatically evaluate QA dataset quality without manual review
  • Get quantitative scores for answer accuracy and relevance
  • Identify low-quality generated pairs that need regeneration
  • Benchmark different LLM models and prompt strategies
  • Ensure consistent dataset quality across pipeline runs

Requirements

  • Add Ragas library dependency to project
  • Implement an evaluation block that accepts QA pairs with context (see the sketch after this list)
  • Support key Ragas metrics: faithfulness, answer relevancy, context precision, context recall
  • Enable batch evaluation of multiple QA pairs efficiently
  • Generate structured evaluation report with per-item and aggregate scores
  • Add configuration options for selecting which metrics to compute
  • Support filtering/flagging low-scoring items below threshold
  • Handle evaluation failures gracefully with clear error messages
  • Update documentation with metric explanations and usage examples
  • Add sample pipeline showing evaluation block in workflow
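A minimal sketch of what the evaluation block could look like, assuming the Ragas v0.1-style `evaluate()` API together with HuggingFace `datasets`. The sample QA pair, threshold value, and variable names are illustrative only; exact column names and metric imports may differ between Ragas versions.

```python
# Sketch only: batch evaluation of generated QA pairs with Ragas.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_precision,
    context_recall,
)

# QA pairs produced by the generation pipeline (hypothetical structure).
qa_pairs = [
    {
        "question": "What does the pipeline generate?",
        "answer": "It generates question-answer pairs from source documents.",
        "contexts": ["The pipeline extracts QA pairs from input documents."],
        "ground_truth": "QA pairs extracted from the input documents.",
    },
]

dataset = Dataset.from_list(qa_pairs)

# Batch-evaluate all pairs with the selected metrics.
result = evaluate(
    dataset,
    metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
)

# Aggregate scores are available on the result; per-item scores as a DataFrame.
print(result)
scores = result.to_pandas()

# Flag low-scoring items below a configurable threshold (illustrative value).
THRESHOLD = 0.7
flagged = scores[scores["faithfulness"] < THRESHOLD]
```

The same per-item DataFrame could feed the structured evaluation report and the regeneration queue for flagged pairs.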

TODO

Define the subset of Ragas functions and metrics to integrate first

Metadata

Labels

enhancement (New feature or request)
