feat: add health monitor provider for tracking provider status #1234
Add centralized health monitoring for OM1 providers with:
- Provider registration with metadata
- Heartbeat tracking with configurable timeout
- Error reporting with threshold-based degradation
- System health summary and unhealthy provider detection
- Thread-safe singleton implementation
- 16 unit tests with full coverage
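The heartbeat timeout and error threshold described above can be sketched roughly as follows. This is an illustrative stand-in, not the PR's `HealthMonitorProvider`: the class name `HealthMonitorSketch`, the methods `heartbeat()`, `report_error()`, `status()` and `unhealthy_providers()`, the default values, and the injectable `clock` are all assumptions made for the example. It also assumes that a heartbeat does not clear the error count, which the PR may handle differently.

```python
import threading
import time
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    UNHEALTHY = "unhealthy"


@dataclass
class ProviderRecord:
    metadata: dict
    last_heartbeat: float
    error_count: int = 0


class HealthMonitorSketch:
    """Hypothetical monitor; names and defaults are assumptions."""

    def __init__(self, heartbeat_timeout=30.0, error_threshold=3,
                 clock=time.monotonic):
        self._lock = threading.Lock()  # guards _records across threads
        self._records = {}
        self._timeout = heartbeat_timeout
        self._threshold = error_threshold
        self._clock = clock  # injectable so tests can control time

    def register(self, name, metadata=None):
        with self._lock:
            self._records[name] = ProviderRecord(metadata or {}, self._clock())

    def heartbeat(self, name):
        with self._lock:
            self._records[name].last_heartbeat = self._clock()

    def report_error(self, name):
        with self._lock:
            self._records[name].error_count += 1

    def status(self, name):
        with self._lock:
            rec = self._records[name]
            # A missed heartbeat takes priority over accumulated errors.
            if self._clock() - rec.last_heartbeat > self._timeout:
                return Status.UNHEALTHY
            if rec.error_count >= self._threshold:
                return Status.DEGRADED
            return Status.HEALTHY

    def unhealthy_providers(self):
        with self._lock:
            names = list(self._records)
        return [n for n in names if self.status(n) is not Status.HEALTHY]
```

Passing a fake `clock` makes the timeout path testable without sleeping.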
- Register all inputs with health monitor on orchestrator init
- Send heartbeat after each successful input event
- Report errors to health monitor when inputs fail
- Update tests to verify health monitoring integration
Add health monitoring to:
- ASRProvider (speech recognition)
- VLMOpenAIProvider, VLMGeminiProvider, VLMVilaProvider (vision)
- ElevenLabsTTSProvider, RivaTTSProvider, UbTtsProvider (text-to-speech)
- GpsProvider (sensor)

Each provider now registers with the health monitor, sends heartbeats on successful operations, and reports errors when failures occur.
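The register / heartbeat / report-error pattern that each provider follows might look like the sketch below. Both classes are hypothetical: `RecordingMonitor` is a stand-in that only records calls, `ExampleProvider` is not one of the PR's providers, and the method names `heartbeat()` and `report_error()` are assumed rather than taken from the diff.

```python
class RecordingMonitor:
    """Stand-in for the health monitor; it only records what it is told."""

    def __init__(self):
        self.events = []

    def register(self, name, metadata=None):
        self.events.append(("register", name))

    def heartbeat(self, name):
        self.events.append(("heartbeat", name))

    def report_error(self, name, error):
        self.events.append(("error", name, str(error)))


class ExampleProvider:
    """Hypothetical provider wrapping a single read operation."""

    def __init__(self, monitor, read_fn, name="example"):
        self._monitor = monitor
        self._read_fn = read_fn
        self._name = name
        # Register once, at construction time.
        self._monitor.register(self._name, {"type": "sensor"})

    def poll(self):
        try:
            value = self._read_fn()
        except Exception as exc:
            # A failed operation is reported instead of sending a heartbeat.
            self._monitor.report_error(self._name, exc)
            return None
        # Only a successful operation counts as a sign of life.
        self._monitor.heartbeat(self._name)
        return value
```

Keeping the heartbeat on the success path only means a provider that is running but failing every call will still time out.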
- Add start_monitoring() and stop_monitoring() methods to HealthMonitorProvider
- Background thread periodically checks provider health status
- Only logs when issues detected (no spam when healthy)
- Logs ERROR for unhealthy providers (heartbeat timeout)
- Logs WARNING for degraded providers (error threshold exceeded)
- Integrate health monitoring into CortexRuntime (single-mode)
- Integrate health monitoring into ModeCortexRuntime (multi-mode)
- Add 4 new tests for monitoring functionality
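A background checker along these lines could be sketched as below. Only the method names `start_monitoring()` and `stop_monitoring()` come from the commit message; the `BackgroundChecker` class, the `get_statuses` callable, the status strings, the interval default, and the log wording are assumptions for illustration and are not the PR's actual output.

```python
import logging
import threading

logger = logging.getLogger("health_monitor_sketch")


class BackgroundChecker:
    """Hypothetical periodic checker that logs only when something is wrong."""

    def __init__(self, get_statuses, interval=5.0):
        # get_statuses returns {provider_name: "healthy"|"degraded"|"unhealthy"}
        self._get_statuses = get_statuses
        self._interval = interval
        self._stop = threading.Event()
        self._thread = None

    def check_once(self):
        """Log problems and return how many providers had an issue."""
        issues = 0
        for name, status in self._get_statuses().items():
            if status == "unhealthy":
                logger.error("Provider %s is unhealthy (heartbeat timeout)", name)
                issues += 1
            elif status == "degraded":
                logger.warning("Provider %s is degraded (error threshold exceeded)", name)
                issues += 1
        # Healthy providers produce no log lines at all.
        return issues

    def _run(self):
        # Event.wait returns True once stop is requested, ending the loop.
        while not self._stop.wait(self._interval):
            self.check_once()

    def start_monitoring(self):
        if self._thread is not None and self._thread.is_alive():
            return  # already running
        self._stop.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop_monitoring(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
            self._thread = None
```

Waiting on a `threading.Event` rather than calling `time.sleep` lets `stop_monitoring()` interrupt the wait immediately instead of blocking for a full interval.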
Add 20 integration tests covering:
- Runtime lifecycle (start/stop monitoring)
- Provider lifecycle (registration, heartbeat, errors)
- InputOrchestrator integration (sensor registration, events)
- Realistic failure scenarios (timeout, degradation, recovery)
- Background monitoring log verification
- Singleton behavior across components
- Add recovery_callback parameter to register() method
- Automatic recovery attempts when provider becomes unhealthy
- Configurable max attempts (default: 3) and cooldown (default: 60s)
- RECOVERING status during recovery attempts
- Recovery state resets on successful heartbeat
- Add _recover() method to ASRProvider, GpsProvider, ElevenLabsTTSProvider
- Add 12 new tests for auto-recovery functionality
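The attempt limit and cooldown could be tracked per provider roughly as in the sketch below. The defaults (3 attempts, 60 s cooldown) and the reset on heartbeat follow the commit message; the `RecoveryState` class, its method names, the status strings, and the choice to count a raising callback as a used attempt are assumptions, not the PR's implementation.

```python
import time


class RecoveryState:
    """Hypothetical per-provider recovery bookkeeping."""

    def __init__(self, recovery_callback, max_attempts=3, cooldown=60.0,
                 clock=time.monotonic):
        self._callback = recovery_callback
        self._max_attempts = max_attempts
        self._cooldown = cooldown
        self._clock = clock  # injectable so tests can control time
        self._last_attempt = None
        self.attempts = 0
        self.status = "healthy"

    def on_unhealthy(self):
        """Run one recovery attempt if allowed; return True if it ran."""
        now = self._clock()
        if self.attempts >= self._max_attempts:
            self.status = "unhealthy"  # attempts exhausted, give up
            return False
        if (self._last_attempt is not None
                and now - self._last_attempt < self._cooldown):
            return False  # still cooling down from the previous attempt
        self.attempts += 1
        self._last_attempt = now
        self.status = "recovering"
        try:
            self._callback()
        except Exception:
            # Assumption: a failing callback still consumes the attempt.
            pass
        return True

    def on_heartbeat(self):
        """A successful heartbeat resets all recovery state."""
        self.attempts = 0
        self._last_attempt = None
        self.status = "healthy"
```

In this sketch a provider stays in the recovering state until either a heartbeat arrives or the attempt budget runs out.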
- Add _recover() method to VLMOpenAI, VLMGemini, VLMVila, RivaTTS, UbTts providers
- Add recovery tests to all provider test files
- Create new test_ub_tts_provider.py with full coverage
- Add TestAutoRecoveryIntegration class with 7 integration tests
- Total: 89 tests for recovery functionality
Add health monitoring with auto-recovery to:
- ASRRTSPProvider: RTSP-based speech recognition
- UbtechASRProvider: Ubtech robot ASR with error reporting
- UbtechVLMProvider: Ubtech robot vision
- VLMOpenAIRTSPProvider: OpenAI VLM with RTSP input
- VLMVilaRTSPProvider: Vila VLM with RTSP input
- VLMVilaZenohProvider: Vila VLM with Zenoh input
- RtkProvider: RTK GPS sensor
- D435Provider: Intel RealSense D435 depth camera

All providers include:
- Health monitor registration with recovery callbacks
- Heartbeat on successful operations
- Error reporting where applicable
- Comprehensive test coverage (65 new tests)

Total providers with health monitoring: 16
Total new tests: 65
Hi @0xbyt4, the idea is good, but why don't we build this with Prometheus?
Hi @openminddev, I will definitely give it a try. Separately from the main topic, I would appreciate your feedback on a technical analysis I did by reverse engineering a mobile app's grid and telemetry system. Thank you, I know you are very busy. (X article: https://x.com/eyeofquantum/status/2009412169289384366?s=20)
Summary
Add a centralized health monitoring system with automatic recovery that tracks provider status and restores failed providers across the OM1 system.
Features
Health Monitoring
Auto-Recovery
- RECOVERING status during recovery attempts
Integrated Providers (16 total)
Core Providers
Alternative/Extended Providers
Runtime Integration
How It Works
Example Output
Test Coverage
Configuration
Test plan