# DeepEvalEx

An LLM evaluation framework for Elixir: an idiomatic yet compatible Elixir port of DeepEval.
Attribution: This project is a derivative work of DeepEval by Confident AI, licensed under Apache 2.0. The core evaluation algorithms, metrics, and prompt templates are derived from the original Python implementation.
## Installation

Add `deep_eval_ex` to your list of dependencies in `mix.exs`:

```elixir
def deps do
  [
    {:deep_eval_ex, "~> 0.1.0"}
  ]
end
```

## Quick Start

```elixir
# Create a test case
test_case = DeepEvalEx.TestCase.new!(
  input: "What is the capital of France?",
  actual_output: "The capital of France is Paris.",
  expected_output: "Paris"
)

# Evaluate with the ExactMatch metric
{:ok, result} = DeepEvalEx.Metrics.ExactMatch.measure(test_case)

# Check the result
result.score   # => 0.0 (not an exact match)
result.success # => false
result.reason  # => "The actual and expected outputs are different."
```

## Configuration

Configure your LLM provider in `config/config.exs`:

```elixir
config :deep_eval_ex,
  default_model: {:openai, "gpt-4o-mini"},
  openai_api_key: System.get_env("OPENAI_API_KEY"),
  default_threshold: 0.5
```

## Metrics

| Metric | Purpose |
|---|---|
| ExactMatch | Simple string comparison |
| GEval | Flexible criteria-based evaluation using LLM-as-judge |
| Faithfulness | RAG: claims supported by retrieval context |
| Hallucination | Detects unsupported statements |
| AnswerRelevancy | Response relevance to input question |
| ContextualPrecision | RAG retrieval ranking quality |
| ContextualRecall | RAG coverage of ground truth |
See the Metrics Overview for detailed documentation on each metric.
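As a rough sketch of how a criteria-based metric such as GEval might be invoked, the call below assumes `Metrics.GEval.measure/2` mirrors the `ExactMatch` flow shown above and accepts a `:criteria` option; neither the arity nor the option name is confirmed by this README, so treat it as illustrative only:

```elixir
# Hypothetical GEval usage; the :criteria option and measure/2 shape are assumptions,
# not documented here. See the Metrics Overview for the actual API.
test_case = DeepEvalEx.TestCase.new!(
  input: "Summarize the article in one sentence.",
  actual_output: "The article argues that renewable energy adoption is accelerating worldwide."
)

{:ok, result} =
  DeepEvalEx.Metrics.GEval.measure(test_case,
    criteria: "The summary must be a single sentence that captures the article's main argument."
  )

result.score # score assigned by the LLM judge, compared against the configured threshold
```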
## Guides

| Guide | Description |
|---|---|
| Quick Start | Get up and running in 5 minutes |
| Configuration | LLM provider setup and options |
| Metrics Overview | All available metrics explained |
| ExUnit Integration | Test assertions for CI/CD |
| Custom Metrics | Build your own evaluation metrics |
| Telemetry | Observability and monitoring |
## API Reference

- TestCase - Test case structure
- Result - Evaluation results
- Evaluator - Batch evaluation
- LLM Adapters - Provider adapters
- Architecture Decision Records - Design decisions and rationale
## LLM Providers

DeepEvalEx supports multiple LLM providers:
- OpenAI - GPT-4o, GPT-4o-mini, GPT-3.5-turbo
- Anthropic - Claude 3 family (planned)
- Ollama - Local models (planned)
See LLM Adapters and Custom LLM Adapters for details.
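Switching providers presumably comes down to the `default_model` tuple shown in the configuration above; the non-OpenAI tuples below are guesses at what the planned adapters might accept, not a documented API:

```elixir
# config/config.exs -- provider selection sketch
config :deep_eval_ex,
  default_model: {:openai, "gpt-4o"}
  # Planned adapters (tuple shapes assumed, not yet available):
  # default_model: {:anthropic, "claude-3-haiku-20240307"}
  # default_model: {:ollama, "llama3"}
```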
## ExUnit Integration

```elixir
defmodule MyApp.LLMTest do
  use ExUnit.Case

  alias DeepEvalEx.{TestCase, Metrics}

  test "LLM generates accurate responses" do
    # get_llm_response/1 stands in for your application's own function that calls the LLM
    test_case = TestCase.new!(
      input: "What is 2 + 2?",
      actual_output: get_llm_response("What is 2 + 2?"),
      expected_output: "4"
    )

    {:ok, result} = Metrics.ExactMatch.measure(test_case)
    assert result.success, result.reason
  end
end
```

## Batch Evaluation

Evaluate multiple test cases concurrently:

```elixir
test_cases = [
  TestCase.new!(input: "Q1", actual_output: "A1", expected_output: "A1"),
  TestCase.new!(input: "Q2", actual_output: "A2", expected_output: "A2")
]

results = DeepEvalEx.evaluate_batch(test_cases, [Metrics.ExactMatch],
  concurrency: 20
)
```
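A common follow-up is to summarize the batch. The sketch below assumes `evaluate_batch/3` returns a flat list of result structs carrying the same `success` field seen on single-metric results, which this README does not spell out:

```elixir
# Assumes results is a flat list of result structs with a :success field.
passed = Enum.count(results, & &1.success)
IO.puts("#{passed}/#{length(results)} evaluations passed")
```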
## Telemetry

DeepEvalEx emits telemetry events for observability:

```elixir
:telemetry.attach(
  "my-handler",
  [:deep_eval_ex, :metric, :stop],
  fn _event, measurements, metadata, _config ->
    IO.puts("Metric #{metadata.metric} completed with score #{measurements.score}")
  end,
  nil
)
```

See the Telemetry Guide for all events and integration patterns.
## License

Apache 2.0 - See LICENSE and NOTICE for details.
This project is a derivative work of DeepEval by Confident AI, also licensed under Apache 2.0.
