Improve telemetry #27
base: main
- Create centralized tracing module with span categories (llm, tool, graph_node, agent, knowledge, redis)
- Add Redis instrumentation hooks to tag spans with command type and infrastructure flag
- Add Grafana dashboard for agent traces with TraceQL filter examples
- Update observability docs with TraceQL queries to filter out Redis noise
- Add comprehensive unit tests for tracing module
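
For readers skimming the description, the span categories and the Redis tagging idea above can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the code in `redis_sre_agent/observability/tracing.py`: the category values come from the list above, while the attribute keys, the `INFRASTRUCTURE_COMMANDS` set, and the `redis_span_attributes` helper are hypothetical names chosen for the example.

```python
from enum import Enum


class SpanCategory(str, Enum):
    """Span categories listed in the PR description."""

    LLM = "llm"
    TOOL = "tool"
    GRAPH_NODE = "graph_node"
    AGENT = "agent"
    KNOWLEDGE = "knowledge"
    REDIS = "redis"


# Assumption: which commands count as infrastructure noise is illustrative only.
INFRASTRUCTURE_COMMANDS = frozenset({"PING", "INFO", "CLIENT", "HELLO"})


def redis_span_attributes(statement: str) -> dict:
    """Build the attributes a Redis hook could attach to a span.

    The attribute keys are hypothetical; the real module may use other names.
    """
    parts = statement.split()
    command = parts[0].upper() if parts else "UNKNOWN"
    return {
        "span.category": SpanCategory.REDIS.value,
        "redis.command": command,
        "redis.is_infrastructure": command in INFRASTRUCTURE_COMMANDS,
    }


attrs = redis_span_attributes("ping")
```

With attributes of this kind on every Redis span, a trace query can exclude spans whose infrastructure flag is true and keep only application-level commands.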
Pull request overview
This PR improves the observability infrastructure of the Redis SRE Agent by centralizing OpenTelemetry tracing configuration and adding server-side filtering for instance queries.
Key Changes:
- Centralized OpenTelemetry tracing setup with custom Redis hooks to enable span filtering and reduce noise from infrastructure commands
- Added server-side instance querying with pagination and filtering by environment, usage, status, instance_type, user_id, and name search
- Updated all API endpoints and frontend components to use the new paginated instance list response format
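
The filtering and pagination behaviour described in the key changes can be illustrated with a small in-memory stand-in. This is only a sketch under stated assumptions: the names `query_instances` and `InstanceQueryResult` and the filter fields appear in this PR, but the `Instance` dataclass, the `search`, `limit`, and `offset` parameter names, the result fields, and the list-based filtering are hypothetical simplifications of whatever the real function does against its storage backend.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    """Hypothetical instance record reduced to the filterable fields."""

    name: str
    environment: str
    usage: str
    status: str
    instance_type: str
    user_id: Optional[str] = None


@dataclass
class InstanceQueryResult:
    """One page of matches plus pagination metadata (field names assumed)."""

    instances: List[Instance]
    total: int
    limit: int
    offset: int


def query_instances(
    all_instances: List[Instance],
    *,
    environment: Optional[str] = None,
    usage: Optional[str] = None,
    status: Optional[str] = None,
    instance_type: Optional[str] = None,
    user_id: Optional[str] = None,
    search: Optional[str] = None,
    limit: int = 100,
    offset: int = 0,
) -> InstanceQueryResult:
    """Filter by exact field match and name substring, then slice one page."""
    exact = {
        "environment": environment,
        "usage": usage,
        "status": status,
        "instance_type": instance_type,
        "user_id": user_id,
    }
    matched = [
        inst
        for inst in all_instances
        # A filter left as None does not constrain the result.
        if all(value is None or getattr(inst, field) == value
               for field, value in exact.items())
        and (search is None or search.lower() in inst.name.lower())
    ]
    page = matched[offset:offset + limit]
    return InstanceQueryResult(page, total=len(matched), limit=limit, offset=offset)


fleet = [
    Instance("cache-prod-1", "production", "cache", "healthy", "oss_single"),
    Instance("cache-prod-2", "production", "cache", "healthy", "oss_single"),
    Instance("queue-dev-1", "development", "queue", "unknown", "oss_single"),
]
result = query_instances(fleet, environment="production", limit=1, offset=1)
```

Returning the total match count alongside the page is what lets the API and UI render pagination controls without fetching every instance.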
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| redis_sre_agent/observability/tracing.py | New centralized tracing module with SpanCategory enum, Redis hooks for filtering, and reusable decorators for consistent span attributes |
| redis_sre_agent/core/instances.py | Added query_instances function with server-side filtering and pagination; optimized get_instance_by_id/name to use direct lookups |
| redis_sre_agent/api/instances.py | Updated list_instances endpoint to accept filter parameters and return paginated InstanceListResponse |
| redis_sre_agent/api/threads.py | Updated to read messages from Thread.messages (primary storage) and added latest_message preview |
| redis_sre_agent/api/app.py | Simplified tracing setup by delegating to centralized module |
| redis_sre_agent/cli/worker.py | Simplified worker tracing setup by delegating to centralized module |
| redis_sre_agent/mcp_server/server.py | Updated redis_sre_list_instances tool to support filtering parameters |
| ui/src/services/sreAgentApi.ts | Added ListInstancesParams and InstanceListResponse interfaces; updated listInstances to accept optional parameters |
| ui/src/pages/*.tsx | Updated all UI pages to handle new InstanceListResponse format with instances array and pagination metadata |
| tests/unit/observability/test_tracing.py | Comprehensive test coverage for new tracing module (384 lines) |
| tests/unit/api/test_instances_api.py | Updated tests to use query_instances and new response format; added filter parameter tests |
| tests/unit/mcp_server/test_mcp_server.py | Updated tests to use query_instances and InstanceQueryResult; added filter parameter tests |
| tests/unit/api/test_threads_list_message_count.py | Updated to use Thread.messages instead of context["messages"] |
| monitoring/grafana/provisioning/dashboards/json/agent-traces.json | New Grafana dashboard with trace filtering by span category and LLM metrics |
| docs/operations/observability.md | Added documentation for span categories, Redis attributes, TraceQL queries, and new dashboard |
Various improvements to OpenTelemetry usage.