EvoLoop is a framework-agnostic Python library that brings self-evolving capabilities to any AI agent or LLM workflow.
Unlike other frameworks that focus on building agents (like LangChain, CrewAI, or Agno), EvoLoop focuses exclusively on evaluating and optimizing them. It acts as a "gym" for your agents, providing tools to capture interactions, evaluate performance, and learn from mistakes.
- Framework Agnostic: Works with LangChain, LangGraph, AutoGen, raw OpenAI API, or any other stack
- Zero Configuration: Just add a decorator and start capturing traces
- Async Support: Works seamlessly with both sync and async functions
- Robust Serialization: Handles Pydantic models, dataclasses, and LangChain messages automatically (see the sketch after this list)
- Fail-Safe: Tracing errors are logged but never crash your application
- Lightweight: No heavy dependencies, SQLite storage by default
- Multiple Integration Modes: Decorator, wrapper, or manual logging
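For example, per the serialization bullet above, a monitored function can return structured output directly. A minimal sketch, assuming the `@monitor` decorator from the Quick Start below; the `Answer` model is illustrative, not part of EvoLoop:

```python
from pydantic import BaseModel

from evoloop import monitor

# Illustrative model: any Pydantic model (or dataclass, or LangChain
# message) returned by a monitored function is serialized into the trace.
class Answer(BaseModel):
    text: str
    confidence: float

@monitor
def structured_agent(question: str) -> Answer:
    return Answer(text="Paris", confidence=0.98)
```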
```bash
pip install evoloop
```

Or install from source:

```bash
git clone https://github.com/tostechbr/evoloop.git
cd evoloop
pip install -e .
```

Just add a decorator and start capturing traces:

```python
from evoloop import monitor

@monitor
def my_agent(question: str) -> str:
    # Your agent logic here
    return "Agent response"

# Use as normal - traces are captured automatically
response = my_agent("What is the capital of France?")
```
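Because EvoLoop is framework agnostic, the same decorator works around a raw OpenAI call. A sketch, assuming the `openai` SDK is installed, `OPENAI_API_KEY` is set, and an illustrative model name:

```python
from openai import OpenAI

from evoloop import monitor

client = OpenAI()  # reads OPENAI_API_KEY from the environment

@monitor(name="openai_agent")
def openai_agent(question: str) -> str:
    # The question and the returned text become the trace's input/output
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content
```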
The decorator also works with async functions:

```python
from evoloop import monitor
import asyncio

@monitor(name="async_agent")
async def my_async_agent(question: str) -> str:
    # Your async agent logic
    await asyncio.sleep(0.1)
    return "Async response"

# Works seamlessly with async
response = await my_async_agent("What is 2+2?")
```
Existing agents, such as a LangGraph agent, can be wrapped instead of decorated:

```python
from evoloop import wrap
from langgraph.prebuilt import create_react_agent

agent = create_react_agent(model, tools)
monitored_agent = wrap(agent, name="my_agent")

# Use as normal
result = monitored_agent.invoke({"messages": [...]})
```
You can also log a trace manually:

```python
from evoloop import log

# After your agent runs
trace = log(
    input_data=user_question,
    output_data=agent_response,
    metadata={"user_id": "123"},
)
```
Inspect captured traces through the storage API:

```python
from evoloop import get_storage

storage = get_storage()

# Get recent traces
traces = storage.list_traces(limit=10)
for trace in traces:
    print(f"[{trace.status}] {trace.input[:50]}...")

# Count by status
print(f"Total: {storage.count()}")
print(f"Errors: {storage.count(status='error')}")Attach context data for evaluation against business rules:
Attach context data for evaluation against business rules:

```python
from evoloop import monitor
from evoloop.tracker import set_context
from evoloop.types import TraceContext
@monitor
def debt_agent(user_message: str, customer_data: dict) -> str:
    # Attach API data as context
    set_context(TraceContext(
        data=customer_data,
        source="customer_api"
    ))
    # Agent logic here...
    response = "Agent reply"
    return response
```

Roadmap:

- Phase 1: Tracker Module (capture traces) ✅ v0.2.0 - Production Ready
  - Sync and async function support
  - Robust serialization (Pydantic, dataclasses, LangChain)
  - Fail-safe storage (errors logged, never raised)
- Phase 2: Judge Module (binary evaluation)
- Phase 3: Reporter Module (error taxonomy)
- Phase 4: CLI (`evoloop eval`, `evoloop report`)
- Phase 5: Self-Evolution (prompt optimization)
EvoLoop is inspired by the principles in "LLM Evals: Everything You Need to Know" by Hamel Husain:
- Binary evaluations (Pass/Fail) over Likert scales (1-5), illustrated in the sketch after this list
- Error analysis as the core of improvement
- Domain-specific criteria over generic metrics
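To make the first principle concrete, a minimal illustration of a binary judge. This is not EvoLoop's Judge module (that is Phase 2 of the roadmap and not yet implemented), and the `must_contain` criterion is invented for the example:

```python
# Illustrative only: a domain-specific Pass/Fail check, not EvoLoop's API.
def judge(output: str, must_contain: str) -> bool:
    # Binary verdict: pass iff the domain-specific criterion holds
    return must_contain.lower() in output.lower()

assert judge("The capital of France is Paris.", "Paris")
assert not judge("I don't know.", "Paris")
```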
To set up a development environment:

```bash
# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Type checking
mypy src/evoloop

# Linting
ruff check src/
```

MIT License - see LICENSE for details.