
Add hard timeout to all metric queries #20

@songk1992

Description


Metric queries are executed in two distinct modes:

Request-scoped queries (user-facing API requests)

Batch jobs (long-running, possibly triggered via API)

Hard timeouts and cancellation must apply only to request-scoped queries. Batch jobs must be allowed to run to completion regardless of whether they are triggered internally or via an API endpoint.

Problem:

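tokio::time::timeout only bounds time spent awaiting the wrapped future; it can fire only when that future yields, so it cannot interrupt internal loops that run without reaching an .await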

Infinite or long-running loops in a request context can keep running and holding resources after the client has given up

Some API endpoints intentionally start batch jobs and must not be timed out

The transport layer (Axum) is not a reliable indicator of execution intent, since both request-scoped queries and batch jobs can be started from an HTTP handler
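A std-only analogy of the first two problems (threads and a channel instead of tokio, so it runs without extra crates): `recv_timeout` bounds how long the caller waits, but nothing stops the worker, which keeps consuming CPU after the caller has given up. In tokio the situation is similar or worse: `timeout` can only fire when the wrapped future yields, so a loop that never reaches an `.await` is not interrupted at all, and work handed to `spawn_blocking` keeps running after the timeout fires. `wait_with_timeout` and its `stop` flag are hypothetical names used only for this illustration.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::{mpsc, Arc};
use std::thread;
use std::time::Duration;

/// Hypothetical helper: starts a busy worker, then waits at most `wait` for its result.
/// Returns whether the wait timed out, plus handles for observing and stopping the worker.
pub fn wait_with_timeout(
    wait: Duration,
) -> (bool, Arc<AtomicU64>, Arc<AtomicBool>, thread::JoinHandle<()>) {
    let iterations = Arc::new(AtomicU64::new(0));
    let stop = Arc::new(AtomicBool::new(false));
    let (tx, rx) = mpsc::channel::<u64>();

    let (it, st) = (Arc::clone(&iterations), Arc::clone(&stop));
    let worker = thread::spawn(move || {
        // Stand-in for a runaway metric loop: it has no deadline of its own.
        while !st.load(Ordering::Relaxed) {
            it.fetch_add(1, Ordering::Relaxed);
        }
        // The receiver may already be dropped; a failed send is ignored.
        let _ = tx.send(it.load(Ordering::Relaxed));
    });

    // Bounding the wait gives the caller an answer on time...
    let timed_out = matches!(rx.recv_timeout(wait), Err(mpsc::RecvTimeoutError::Timeout));
    // ...but the worker is still spinning when this function returns.
    (timed_out, iterations, stop, worker)
}
```

After `wait_with_timeout` reports a timeout, the iteration counter keeps growing until something outside the wait (here, the `stop` flag) tells the worker to exit, which is the gap the cancellation-aware loops below are meant to close.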

Required behavior:

Request-scoped metric queries:

Enforced hard timeout

Cancellation-aware internal loops

Stop execution on timeout or client disconnect

Surface timeouts as Error::Timeout

Batch jobs:

No hard timeout by default

Long-running execution is allowed

Must explicitly opt into batch mode

Must not inherit request cancellation

Proposed approach:

Introduce an explicit execution mode:

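```rust
use std::time::Duration;

enum QueryMode {
    Request { timeout: Duration }, // hard timeout, cancelled on client disconnect
    Batch,                         // runs to completion, ignores request cancellation
}
```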
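One possible shape for the enum above, sketched under the assumption that the request timeout lives in a single config value (matching the "centrally configurable" acceptance criterion) and is turned into an absolute deadline when a query starts. `QueryConfig`, `QueryMode::request`, and `deadline` are hypothetical names, not existing code.

```rust
use std::time::{Duration, Instant};

/// Hypothetical central configuration, e.g. loaded once at startup and shared with handlers.
#[derive(Debug, Clone)]
pub struct QueryConfig {
    pub request_timeout: Duration,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QueryMode {
    Request { timeout: Duration },
    Batch,
}

impl QueryMode {
    /// Request mode built from the central config, so handlers never hard-code a timeout.
    pub fn request(config: &QueryConfig) -> Self {
        QueryMode::Request { timeout: config.request_timeout }
    }

    /// Absolute deadline for a query that started at `started`; batch jobs have none.
    pub fn deadline(&self, started: Instant) -> Option<Instant> {
        match self {
            QueryMode::Request { timeout } => Some(started + *timeout),
            QueryMode::Batch => None,
        }
    }
}
```

With this shape, a handler for a synchronous query would build its mode with something like `QueryMode::request(&config)` from shared state, while a batch-job endpoint passes `QueryMode::Batch` explicitly; how the config reaches the handler (for example through Axum state) is an assumption here.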

All metric query entry points require a QueryMode argument

Axum handlers:

Use QueryMode::Request for synchronous queries

Use QueryMode::Batch for batch-job endpoints

Internal loops:

Cancellation-aware only in Request mode; in Batch mode they run to completion
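A minimal, std-only sketch of a cancellation-aware internal loop, assuming a hypothetical `QueryGuard` that combines the Request-mode deadline with a client-disconnect flag. The `AtomicBool` stands in for whatever disconnect signal the Axum layer provides (in async code, `tokio_util::sync::CancellationToken` is one option). `Error::Cancelled`, `sum_points`, and `CHECK_EVERY` are also illustrative assumptions. In Batch mode the guard holds neither a deadline nor the request's disconnect flag, so the loop runs to completion.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy)]
pub enum QueryMode {
    Request { timeout: Duration },
    Batch,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Error {
    Timeout,
    Cancelled, // hypothetical variant: client disconnected before the deadline
}

/// Hypothetical per-query guard, checked periodically from inside long loops.
pub struct QueryGuard {
    deadline: Option<Instant>,
    disconnected: Option<Arc<AtomicBool>>,
}

impl QueryGuard {
    pub fn new(mode: QueryMode, disconnected: Arc<AtomicBool>) -> Self {
        match mode {
            QueryMode::Request { timeout } => QueryGuard {
                deadline: Some(Instant::now() + timeout),
                disconnected: Some(disconnected),
            },
            // Batch mode ignores both the deadline and the request's disconnect signal.
            QueryMode::Batch => QueryGuard { deadline: None, disconnected: None },
        }
    }

    /// Returns an error once the request was cancelled or its deadline has passed.
    pub fn check(&self) -> Result<(), Error> {
        if let Some(flag) = &self.disconnected {
            if flag.load(Ordering::Relaxed) {
                return Err(Error::Cancelled);
            }
        }
        match self.deadline {
            Some(d) if Instant::now() >= d => Err(Error::Timeout),
            _ => Ok(()),
        }
    }
}

/// Example internal loop: sums points, checking the guard every `CHECK_EVERY` iterations.
pub fn sum_points(points: impl Iterator<Item = u64>, guard: &QueryGuard) -> Result<u64, Error> {
    const CHECK_EVERY: usize = 1024;
    let mut total: u64 = 0;
    for (i, p) in points.enumerate() {
        if i % CHECK_EVERY == 0 {
            guard.check()?;
        }
        total = total.wrapping_add(p);
    }
    Ok(total)
}
```

Checking only every `CHECK_EVERY` iterations keeps `Instant::now()` and the atomic load out of the hot path while still bounding how long an infinite input can run past the deadline. An async version of the loop would also need a periodic `.await` (for example `tokio::task::yield_now()`) so it does not starve other tasks on the same worker thread.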

Acceptance criteria:

Timeouts are applied based on execution mode, not API transport

Request-scoped queries cannot outlive their timeout

Batch jobs run without request timeouts, even when API-triggered

Infinite loops terminate promptly when a Request-mode query is cancelled

Timeout duration is centrally configurable

Timeout errors surface as Error::Timeout

Priority: High
Type: Reliability / Performance
Labels: bug, reliability, performance
