Description
Metric queries are executed in two distinct modes:
Request-scoped queries (user-facing API requests)
Batch jobs (long-running, possibly triggered via API)
Hard timeouts and cancellation must apply only to request-scoped queries. Batch jobs must be allowed to run to completion regardless of whether they are triggered internally or via an API endpoint.
Problem:
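tokio::time::timeout only bounds awaiting: it can drop a future at an .await point, but it cannot interrupt an internal loop that never yields
Infinite or long-running loops in request context can keep holding resources after the request has timed out or the client has disconnected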
Some API endpoints intentionally start batch jobs and must not be timed out
Transport layer (Axum) is not a reliable indicator of execution intent
Required behavior:
Request-scoped metric queries:
Enforced hard timeout
Cancellation-aware internal loops
Stop execution on timeout or client disconnect
Surface Error::Timeout
Batch jobs:
No hard timeout by default
Allowed long execution
Explicitly opt into batch mode
Must not inherit request cancellation
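One way to read the "must not inherit request cancellation" requirement is sketched below using only the standard library. `CancelFlag`, `request_scoped_sum`, and `spawn_batch_sum` are hypothetical names for illustration, not existing code in this project, and a plain thread stands in for whatever task runtime the batch jobs actually use.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Hypothetical request-scoped flag, set on timeout or client disconnect.
#[derive(Clone, Default)]
pub struct CancelFlag(Arc<AtomicBool>);

impl CancelFlag {
    pub fn cancel(&self) {
        self.0.store(true, Ordering::SeqCst);
    }

    pub fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// Request-scoped work: polls the flag on every iteration and stops early.
/// Returns None when the request was cancelled before the loop finished.
pub fn request_scoped_sum(cancel: &CancelFlag, n: u64) -> Option<u64> {
    let mut sum = 0;
    for i in 0..n {
        if cancel.is_cancelled() {
            return None;
        }
        sum += i;
    }
    Some(sum)
}

/// Batch work: takes no CancelFlag at all, so it cannot observe the
/// cancellation of the request that triggered it.
pub fn spawn_batch_sum(n: u64) -> JoinHandle<u64> {
    thread::spawn(move || (0..n).sum())
}
```

The design choice here is that the batch entry point has no parameter through which a request's cancellation could reach it, so inheriting it by accident is ruled out by the function signature rather than by convention.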
Proposed approach:
Introduce explicit execution mode:
enum QueryMode {
    Request { timeout: Duration },
    Batch,
}
All metric query entry points require a QueryMode
Axum handlers:
Use QueryMode::Request for synchronous queries
Use QueryMode::Batch for batch-job endpoints
Internal loops:
Cancellation-aware only in Request mode
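The proposed approach could look roughly like the following std-only sketch. `QueryMode` and `Error::Timeout` come from this issue; `Deadline` and `run_metric_query` are hypothetical stand-ins, and the summing loop is a placeholder for a real query's internal loop. A client-disconnect flag could be checked at the same point as the deadline.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum QueryMode {
    Request { timeout: Duration },
    Batch,
}

#[derive(Debug, PartialEq)]
pub enum Error {
    Timeout,
}

/// Hypothetical guard created once per query from its execution mode.
/// Holds a deadline in Request mode and nothing in Batch mode.
pub struct Deadline(Option<Instant>);

impl Deadline {
    pub fn start(mode: QueryMode) -> Self {
        match mode {
            QueryMode::Request { timeout } => Deadline(Some(Instant::now() + timeout)),
            QueryMode::Batch => Deadline(None),
        }
    }

    /// Returns Err(Error::Timeout) once a Request deadline has passed.
    /// Always Ok in Batch mode.
    pub fn check(&self) -> Result<(), Error> {
        match self.0 {
            Some(at) if Instant::now() >= at => Err(Error::Timeout),
            _ => Ok(()),
        }
    }
}

/// Placeholder metric query: every entry point takes a QueryMode, and the
/// internal loop checks the deadline on each iteration.
pub fn run_metric_query(mode: QueryMode, rows: u64) -> Result<u64, Error> {
    let deadline = Deadline::start(mode);
    let mut sum = 0u64;
    for i in 0..rows {
        deadline.check()?;
        sum = sum.wrapping_add(i);
    }
    Ok(sum)
}
```

Because the check lives inside the loop, the query stops itself even if the loop never reaches an await point, which an outer timeout wrapper alone cannot guarantee. A real loop might check every N iterations instead of every one to keep the overhead low.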
Acceptance criteria:
Timeouts are applied based on execution mode, not API transport
Request-scoped queries cannot outlive their timeout
Batch jobs run without request timeouts, even when API-triggered
Infinite loops terminate promptly when a request-mode query times out or is cancelled
Timeout duration is centrally configurable
Timeout errors surface as Error::Timeout
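For the "centrally configurable" criterion, one possible shape is a single config value that hands out the request mode, so handlers never hard-code a duration. `QueryConfig`, `from_secs_str`, `request_mode`, and the 30-second default are all assumptions made for illustration; the issue does not specify a config source or a default value.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum QueryMode {
    Request { timeout: Duration },
    Batch,
}

/// Hypothetical central configuration for request-scoped queries.
#[derive(Debug, Clone, Copy)]
pub struct QueryConfig {
    pub request_timeout: Duration,
}

impl QueryConfig {
    /// Assumed default; the issue does not name a value.
    pub const DEFAULT_TIMEOUT: Duration = Duration::from_secs(30);

    /// Builds the config from a raw setting given in whole seconds, for
    /// example the value of an environment variable. Missing, unparsable,
    /// or zero values fall back to the default.
    pub fn from_secs_str(raw: Option<&str>) -> Self {
        let request_timeout = raw
            .and_then(|s| s.trim().parse::<u64>().ok())
            .filter(|&secs| secs > 0)
            .map(Duration::from_secs)
            .unwrap_or(Self::DEFAULT_TIMEOUT);
        QueryConfig { request_timeout }
    }

    /// The only place a Request mode is constructed, so the timeout is
    /// defined in one spot.
    pub fn request_mode(&self) -> QueryMode {
        QueryMode::Request { timeout: self.request_timeout }
    }
}
```

Handlers for synchronous queries would call `request_mode()`, while batch-job endpoints would pass `QueryMode::Batch` directly and never touch the timeout setting.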
Priority: High
Type: Reliability / Performance
Labels: bug, reliability, performance