@abrookins
Collaborator

Various improvements to OpenTelemetry usage.

- Create centralized tracing module with span categories (llm, tool, graph_node, agent, knowledge, redis)
- Add Redis instrumentation hooks to tag spans with command type and infrastructure flag
- Add Grafana dashboard for agent traces with TraceQL filter examples
- Update observability docs with TraceQL queries to filter out Redis noise
- Add comprehensive unit tests for tracing module
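
As a rough illustration of the span categories and the Redis tagging described above, here is a minimal, standard-library-only sketch. The category values come from the list in this description; the attribute keys (`sre_agent.category`, `redis.command`, `redis.is_infrastructure`), the `INFRASTRUCTURE_COMMANDS` set, and the `redis_span_attributes` helper are assumptions made for illustration and may not match what `tracing.py` actually uses.

```python
from enum import Enum


class SpanCategory(str, Enum):
    """Span categories named in the PR description."""

    LLM = "llm"
    TOOL = "tool"
    GRAPH_NODE = "graph_node"
    AGENT = "agent"
    KNOWLEDGE = "knowledge"
    REDIS = "redis"


# Hypothetical: commands treated as infrastructure noise rather than agent work.
INFRASTRUCTURE_COMMANDS = frozenset({"PING", "INFO", "CLIENT", "SELECT"})


def redis_span_attributes(statement: str) -> dict:
    """Build the attributes a Redis request hook could set on a span.

    The attribute keys are illustrative assumptions, not the PR's real names.
    """
    parts = statement.strip().split()
    command = parts[0].upper() if parts else "UNKNOWN"
    return {
        "sre_agent.category": SpanCategory.REDIS.value,
        "redis.command": command,
        "redis.is_infrastructure": command in INFRASTRUCTURE_COMMANDS,
    }
```

With a boolean flag like this on every Redis span, a trace query can exclude infrastructure spans while keeping the LLM, tool, and agent spans visible.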

Copilot AI left a comment

Pull request overview

This PR improves the observability infrastructure of the Redis SRE Agent by centralizing OpenTelemetry tracing configuration and adding server-side filtering for instance queries.

Key Changes:

  • Centralized OpenTelemetry tracing setup with custom Redis hooks to enable span filtering and reduce noise from infrastructure commands
  • Added server-side instance querying with pagination and filtering by environment, usage, status, instance_type, user_id, and name search
  • Updated all API endpoints and frontend components to use the new paginated instance list response format
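
The filtering and pagination behaviour summarized above can be sketched roughly as follows. The names `query_instances` and `InstanceQueryResult` appear in this PR, but the signature, the fields, and the in-memory filtering shown here are assumptions for illustration; the real function presumably pushes the filters down to the data store instead of scanning a Python list, and supports more filters than the three shown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    # Hypothetical subset of the instance fields mentioned in the review.
    name: str
    environment: str
    status: str


@dataclass
class InstanceQueryResult:
    # Hypothetical shape: one page of results plus pagination metadata.
    instances: List[Instance]
    total: int
    limit: int
    offset: int


def query_instances(
    instances: List[Instance],
    environment: Optional[str] = None,
    status: Optional[str] = None,
    search: Optional[str] = None,
    limit: int = 100,
    offset: int = 0,
) -> InstanceQueryResult:
    """Filter, then paginate. A filter set to None is not applied."""
    matched = [
        inst
        for inst in instances
        if (environment is None or inst.environment == environment)
        and (status is None or inst.status == status)
        and (search is None or search.lower() in inst.name.lower())
    ]
    # total counts all matches, not just the returned page.
    page = matched[offset : offset + limit]
    return InstanceQueryResult(
        instances=page, total=len(matched), limit=limit, offset=offset
    )
```

Returning `total` alongside the page is what lets a client render pagination controls without fetching every instance.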

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `redis_sre_agent/observability/tracing.py` | New centralized tracing module with SpanCategory enum, Redis hooks for filtering, and reusable decorators for consistent span attributes |
| `redis_sre_agent/core/instances.py` | Added query_instances function with server-side filtering and pagination; optimized get_instance_by_id/name to use direct lookups |
| `redis_sre_agent/api/instances.py` | Updated list_instances endpoint to accept filter parameters and return paginated InstanceListResponse |
| `redis_sre_agent/api/threads.py` | Updated to read messages from Thread.messages (primary storage) and added latest_message preview |
| `redis_sre_agent/api/app.py` | Simplified tracing setup by delegating to centralized module |
| `redis_sre_agent/cli/worker.py` | Simplified worker tracing setup by delegating to centralized module |
| `redis_sre_agent/mcp_server/server.py` | Updated redis_sre_list_instances tool to support filtering parameters |
| `ui/src/services/sreAgentApi.ts` | Added ListInstancesParams and InstanceListResponse interfaces; updated listInstances to accept optional parameters |
| `ui/src/pages/*.tsx` | Updated all UI pages to handle new InstanceListResponse format with instances array and pagination metadata |
| `tests/unit/observability/test_tracing.py` | Comprehensive test coverage for new tracing module (384 lines) |
| `tests/unit/api/test_instances_api.py` | Updated tests to use query_instances and new response format; added filter parameter tests |
| `tests/unit/mcp_server/test_mcp_server.py` | Updated tests to use query_instances and InstanceQueryResult; added filter parameter tests |
| `tests/unit/api/test_threads_list_message_count.py` | Updated to use Thread.messages instead of `context["messages"]` |
| `monitoring/grafana/provisioning/dashboards/json/agent-traces.json` | New Grafana dashboard with trace filtering by span category and LLM metrics |
| `docs/operations/observability.md` | Added documentation for span categories, Redis attributes, TraceQL queries, and new dashboard |
