Skip to content

Conversation

@nikhilsinhaparseable
Copy link
Contributor

@nikhilsinhaparseable nikhilsinhaparseable commented Dec 30, 2025

add below metrics -

  1. total logs collected by date
  2. total size of logs collected by date
  3. total metrics collected by date
  4. total size of metrics collected by date
  5. total traces collected by date
  6. total size of traces collected by date

add all of these to pbilling

Summary by CodeRabbit

  • New Features

    • Added daily, per-type metrics for logs, metrics, and traces (counts and sizes) to improve usage and billing visibility
    • Events now include telemetry type metadata for clearer classification
  • Refactor

    • Propagated telemetry type through ingestion and processing so telemetry context is preserved end-to-end
  • Bug Fixes / Improvements

    • Emit new per-date billing metrics during aggregation and processing paths for more accurate reporting

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 30, 2025

Walkthrough

TelemetryType was threaded through ingestion and event construction: Event gained a public telemetry_type field, EventFormat::into_event signature was extended, ingestion/OTEL handlers now pass TelemetryType (Logs/Metrics/Traces), and metrics gained per-type per-date counters and incrementers.

Changes

Cohort / File(s) Summary
Event Structure & Format
src/event/mod.rs, src/event/format/mod.rs, src/event/format/json.rs
Added pub telemetry_type: TelemetryType to Event; updated EventFormat::into_event to accept telemetry_type: TelemetryType and propagate it into constructed Event.
Ingestion Flow & Utils
src/handlers/http/ingest.rs, src/handlers/http/modal/utils/ingest_utils.rs
Threaded telemetry_type through OTEL processing and ingestion entry points: process_otel_content, flatten_and_push_logs, and push_logs signatures now take telemetry_type; OTEL handlers call with Logs/Metrics/Traces; unchecked/internal pushes set TelemetryType::Logs.
Metrics Infrastructure
src/metrics/mod.rs
Added TelemetryType import; introduced per-team+date metrics (TOTAL_*_BY_DATE and *_SIZE_BY_DATE); changed increment_events_ingested_size_by_date to accept telemetry_type and route updates to type-specific metrics; added/modified helpers increment_metrics_collected_by_date(count, date), increment_logs_collected_by_date, increment_traces_collected_by_date.
OTEL Processing
src/otel/logs.rs, src/otel/metrics.rs, src/otel/traces.rs
Emit per-date collection metrics: logs and traces now call per-date incrementers with counts and date; metrics moved batching to call increment_metrics_collected_by_date(count, date) at resource/batch level.
Billing & Cluster Aggregation
src/handlers/http/cluster/mod.rs
BillingMetricsCollector extended with new per-date maps for metrics/logs/traces counts and sizes; into_events emits corresponding per-date billing metrics when maps non-empty; parsing/classification updated to populate new maps.
Storage & Pipelines
src/storage/field_stats.rs, src/connectors/kafka/processor.rs
Event construction calls updated to pass explicit TelemetryType (e.g., TelemetryType::Logs), replacing previous defaults; field_stats uses TelemetryType::Logs for dataset-stat events.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Ingest as Ingestion Handlers
    participant OTEL as OTEL Router/Parser
    participant Proc as Flatten/Push Processor
    participant Storage as Event Storage
    participant Metrics as Metrics System

    Client->>Ingest: POST /ingest (payload)
    activate Ingest

    Ingest->>OTEL: process_otel_content(body, telemetry_type)
    activate OTEL
    note right of OTEL: Determine TelemetryType (Logs/Metrics/Traces)
    OTEL-->>Proc: flatten_and_push_logs(..., telemetry_type)
    deactivate OTEL

    activate Proc
    note right of Proc: Build Event(s) with telemetry_type
    Proc->>Storage: push events (include telemetry_type)
    note right of Proc: increment_[logs|metrics|traces]_collected_by_date(count, date)
    deactivate Proc

    activate Storage
    Storage->>Metrics: increment_events_ingested_size_by_date(size, date, telemetry_type)
    deactivate Storage

    activate Metrics
    note right of Metrics: Route increments to type-specific\nTOTAL_{LOGS|METRICS|TRACES}_COLLECTED[_SIZE]_BY_DATE
    deactivate Metrics

    deactivate Ingest
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • PR #1392: Introduced the TelemetryType enum and initial threading of telemetry_type across ingestion/event construction.
  • PR #1228: Modified Event construction API (EventFormat::into_event) and related call sites; closely related to the signature changes here.
  • PR #1177: Refactored OTEL ingestion paths (flatten_and_push_logs / push_logs); related to the telemetry_type propagation in ingestion utils.

Suggested labels

for next release

Suggested reviewers

  • de-sh
  • parmesant

Poem

🐇 I hopped through code and stitched a type,
Logs, metrics, traces—each gets a stripe.
Events now wear labels neat,
Counts and sizes keep the beat,
A bunny's cheer for telemetry ripe! 🎉

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 67.74% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Description check ❓ Inconclusive The description lists the six new metrics being added but lacks detail on implementation approach, testing status, and documentation of the feature changes. Complete the description template by including implementation rationale, confirmation of testing completion, and documentation updates.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main objective: adding separate metrics collection for logs, traces, and metrics by date.
✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/metrics/mod.rs (1)

379-449: Clarify the purpose of the "team" label with hardcoded "all" value.

All new metrics use ["team", "date"] labels, but the helper functions always pass "all" as the team value (e.g., line 738, 744, 750). This pattern suggests either:

  1. The "team" dimension is intended for future multi-tenancy but currently unused
  2. The label name might be more appropriately named (e.g., "scope" or "aggregate")

If "team" is a placeholder for future functionality, consider adding a brief comment. If it's for aggregation purposes, the label name could be misleading. Based on learnings, similar global metrics use ["format", "date"] labels for aggregation.

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f3f4f67 and b86629a.

📒 Files selected for processing (11)
  • src/event/format/json.rs
  • src/event/format/mod.rs
  • src/event/mod.rs
  • src/handlers/http/cluster/mod.rs
  • src/handlers/http/ingest.rs
  • src/handlers/http/modal/utils/ingest_utils.rs
  • src/metrics/mod.rs
  • src/otel/logs.rs
  • src/otel/metrics.rs
  • src/otel/traces.rs
  • src/storage/field_stats.rs
🧰 Additional context used
🧠 Learnings (17)
📓 Common learnings
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metrics/mod.rs:163-173
Timestamp: 2025-08-25T01:32:25.980Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metrics/mod.rs are intentionally designed to track global totals across all streams for a given date, using labels ["format", "date"] rather than per-stream labels. This is the correct design for global aggregation purposes.
📚 Learning: 2025-08-18T19:10:11.941Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1405
File: src/handlers/http/ingest.rs:163-164
Timestamp: 2025-08-18T19:10:11.941Z
Learning: Field statistics calculation in src/storage/field_stats.rs uses None for the time_partition parameter when calling flatten_and_push_logs(), as field stats generation does not require time partition functionality.

Applied to files:

  • src/storage/field_stats.rs
  • src/otel/logs.rs
📚 Learning: 2025-08-25T01:31:41.786Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.

Applied to files:

  • src/storage/field_stats.rs
  • src/otel/logs.rs
  • src/handlers/http/modal/utils/ingest_utils.rs
  • src/otel/metrics.rs
  • src/event/mod.rs
  • src/handlers/http/cluster/mod.rs
  • src/otel/traces.rs
  • src/metrics/mod.rs
📚 Learning: 2025-08-25T01:32:25.980Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metrics/mod.rs:163-173
Timestamp: 2025-08-25T01:32:25.980Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metrics/mod.rs are intentionally designed to track global totals across all streams for a given date, using labels ["format", "date"] rather than per-stream labels. This is the correct design for global aggregation purposes.

Applied to files:

  • src/storage/field_stats.rs
  • src/otel/logs.rs
  • src/otel/metrics.rs
  • src/event/mod.rs
  • src/handlers/http/cluster/mod.rs
  • src/otel/traces.rs
  • src/metrics/mod.rs
📚 Learning: 2025-09-18T09:59:20.177Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metrics/mod.rs:700-756
Timestamp: 2025-09-18T09:59:20.177Z
Learning: In src/event/mod.rs, the parsed_timestamp used in increment_events_ingested_by_date() is correctly UTC-normalized: for dynamic streams it remains Utc::now(), and for streams with time partition enabled it uses the time partition value. Both cases result in proper UTC date strings for metrics labeling, preventing double-counting issues.

Applied to files:

  • src/storage/field_stats.rs
  • src/otel/logs.rs
  • src/otel/metrics.rs
  • src/event/mod.rs
  • src/handlers/http/cluster/mod.rs
  • src/otel/traces.rs
  • src/handlers/http/ingest.rs
  • src/metrics/mod.rs
📚 Learning: 2025-10-20T17:48:53.444Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1448
File: src/handlers/http/cluster/mod.rs:1370-1400
Timestamp: 2025-10-20T17:48:53.444Z
Learning: In src/handlers/http/cluster/mod.rs, the billing metrics processing logic should NOT accumulate counter values from multiple Prometheus samples with the same labels. The intended behavior is to convert each received counter from nodes into individual events for ingestion, using `.insert()` to store the counter value directly.

Applied to files:

  • src/otel/logs.rs
  • src/otel/metrics.rs
  • src/event/mod.rs
  • src/handlers/http/cluster/mod.rs
  • src/metrics/mod.rs
📚 Learning: 2025-08-18T17:59:31.642Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1405
File: src/handlers/http/modal/utils/ingest_utils.rs:149-156
Timestamp: 2025-08-18T17:59:31.642Z
Learning: The time_partition parameter in push_logs() function in src/handlers/http/modal/utils/ingest_utils.rs is determined by the caller (flatten_and_push_logs). OSS callers pass None, enterprise callers pass the appropriate value (None or Some<>), and OTEL callers always pass None. The push_logs() function should not add additional time partition logic since it's already handled at the caller level.

Applied to files:

  • src/handlers/http/modal/utils/ingest_utils.rs
  • src/handlers/http/ingest.rs
📚 Learning: 2025-03-26T06:44:53.362Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1263
File: src/handlers/http/ingest.rs:300-310
Timestamp: 2025-03-26T06:44:53.362Z
Learning: In Parseable, every stream is always associated with a log_source - no stream can exist without a log_source. For otel-traces and otel-metrics, strict restrictions are implemented where ingestion is rejected if a stream already has a different log_source format. However, regular logs from multiple log_sources can coexist in a single stream.

Applied to files:

  • src/handlers/http/modal/utils/ingest_utils.rs
📚 Learning: 2025-08-18T12:37:47.732Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1405
File: src/parseable/mod.rs:528-533
Timestamp: 2025-08-18T12:37:47.732Z
Learning: In Parseable, the validate_time_partition function in src/utils/json/flatten.rs already provides a default time partition limit of 30 days using `map_or(30, |days| days.get() as i64)` when time_partition_limit is None, so no additional defaulting is needed in the stream creation logic in src/parseable/mod.rs.

Applied to files:

  • src/handlers/http/modal/utils/ingest_utils.rs
📚 Learning: 2025-08-18T14:48:46.324Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1405
File: src/handlers/http/ingest.rs:256-258
Timestamp: 2025-08-18T14:48:46.324Z
Learning: OTEL streams (OtelLogs, OtelMetrics, OtelTraces) in Parseable are intentionally designed to not support time partition functionality. The time_partition parameter should always be None for OTEL ingestion paths in flatten_and_push_logs calls.

Applied to files:

  • src/event/format/json.rs
📚 Learning: 2025-09-18T09:52:07.554Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/storage/object_storage.rs:173-177
Timestamp: 2025-09-18T09:52:07.554Z
Learning: In Parseable's upload system (src/storage/object_storage.rs), the update_storage_metrics function can safely use path.metadata().map_err() to fail on local file metadata read failures because parquet validation (validate_uploaded_parquet_file) ensures file integrity before this step, and the system guarantees local staging files remain accessible throughout the upload flow.

Applied to files:

  • src/event/mod.rs
📚 Learning: 2025-09-25T07:12:33.401Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1441
File: src/storage/azure_blob.rs:727-735
Timestamp: 2025-09-25T07:12:33.401Z
Learning: In Parseable's object storage implementations, metrics should only be captured when operations succeed, not when they are attempted. The increment_object_store_calls_by_date() calls should be placed after the await? to ensure they only execute on successful operations.

Applied to files:

  • src/handlers/http/cluster/mod.rs
📚 Learning: 2025-09-25T07:12:40.189Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1441
File: src/storage/gcs.rs:625-633
Timestamp: 2025-09-25T07:12:40.189Z
Learning: In Parseable's object storage metrics system, metrics should only be captured when API calls succeed, not when they error out. The current pattern of calling increment_object_store_calls_by_date and related metrics functions after the `?` operator is the correct approach.

Applied to files:

  • src/handlers/http/cluster/mod.rs
📚 Learning: 2025-09-25T07:12:27.407Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1441
File: src/storage/azure_blob.rs:693-700
Timestamp: 2025-09-25T07:12:27.407Z
Learning: In the Parseable codebase, object store metrics (increment_object_store_calls_by_date, increment_files_scanned_in_object_store_calls_by_date, etc.) should only be recorded for successful operations, not for failed attempts. The metric calls should be placed after error handling operators like `?` to ensure they only execute when operations succeed.

Applied to files:

  • src/handlers/http/cluster/mod.rs
📚 Learning: 2025-08-18T12:33:16.085Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1405
File: src/storage/object_storage.rs:145-166
Timestamp: 2025-08-18T12:33:16.085Z
Learning: In Parseable's staging and upload process, parquet file names are guaranteed to always contain the date part in the expected format (date=YYYY-MM-DD). The system ensures no deviation from this naming convention, making defensive parsing unnecessary for date extraction from filenames during storage metrics updates.

Applied to files:

  • src/handlers/http/cluster/mod.rs
📚 Learning: 2025-08-20T17:01:25.791Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1409
File: src/storage/field_stats.rs:429-456
Timestamp: 2025-08-20T17:01:25.791Z
Learning: In Parseable's field stats calculation (src/storage/field_stats.rs), the extract_datetime_from_parquet_path_regex function correctly works with filename-only parsing because Parseable's server-side filename generation guarantees the dot-separated format date=YYYY-MM-DD.hour=HH.minute=MM pattern in parquet filenames.

Applied to files:

  • src/handlers/http/cluster/mod.rs
📚 Learning: 2025-08-18T18:01:22.834Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1405
File: src/handlers/http/modal/utils/ingest_utils.rs:271-292
Timestamp: 2025-08-18T18:01:22.834Z
Learning: In Parseable's ingestion validation, validate_stream_for_ingestion is designed to prevent regular log ingestion endpoints (ingest() and post_event()) from ingesting into streams that exclusively contain OTEL traces or metrics. The function allows mixed streams (regular logs + OTEL) but blocks ingestion into OTEL-only streams, maintaining architectural separation between regular log and OTEL ingestion pathways.

Applied to files:

  • src/handlers/http/ingest.rs
🧬 Code graph analysis (7)
src/otel/logs.rs (1)
src/metrics/mod.rs (1)
  • increment_logs_collected_by_date (742-746)
src/otel/metrics.rs (1)
src/metrics/mod.rs (1)
  • increment_metrics_collected_by_date (736-740)
src/event/mod.rs (1)
src/metrics/mod.rs (1)
  • increment_events_ingested_size_by_date (634-659)
src/handlers/http/cluster/mod.rs (2)
src/alerts/alerts_utils.rs (2)
  • value (472-472)
  • value (474-474)
src/alerts/mod.rs (5)
  • value (414-414)
  • value (422-422)
  • value (430-430)
  • value (443-443)
  • value (454-454)
src/otel/traces.rs (1)
src/metrics/mod.rs (1)
  • increment_traces_collected_by_date (748-752)
src/handlers/http/ingest.rs (1)
src/handlers/http/modal/utils/ingest_utils.rs (1)
  • flatten_and_push_logs (51-137)
src/metrics/mod.rs (1)
src/handlers/http/mod.rs (1)
  • metrics_path (64-66)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: coverage
🔇 Additional comments (21)
src/event/format/mod.rs (1)

34-34: LGTM! Trait signature correctly extended with TelemetryType parameter.

The addition of telemetry_type: TelemetryType to the into_event trait method enables telemetry-aware metrics tracking throughout the event pipeline.

Also applies to: 224-225

src/storage/field_stats.rs (1)

134-134: LGTM! Internal stats correctly tagged with TelemetryType::Logs.

The field statistics stream is appropriately classified as TelemetryType::Logs since it represents internal system logging about dataset statistics.

Also applies to: 164-164

src/otel/traces.rs (1)

78-80: Metrics incremented during flattening - verify this is the intended behavior.

The trace count metric is incremented during the flattening phase using Utc::now() for the date. This counts spans as they're being processed, regardless of whether downstream ingestion succeeds. This is consistent with the pattern used in src/otel/logs.rs and src/otel/metrics.rs.

Confirm this aligns with your billing requirements - should traces be counted at processing time or only after successful ingestion?

src/otel/logs.rs (1)

150-152: LGTM! Log count metric implementation is consistent with the traces pattern.

The metric increment correctly counts log records per scope and uses Utc::now() for date labeling, which is consistent with the UTC-normalized approach mentioned in the learnings.

src/event/mod.rs (1)

56-56: LGTM! Event struct and size metric dispatch correctly updated.

The telemetry_type field is properly threaded through the Event struct, and increment_events_ingested_size_by_date now routes size metrics to the appropriate per-telemetry-type counter. Based on the relevant snippet from src/metrics/mod.rs, this correctly dispatches to TOTAL_LOGS_COLLECTED_SIZE_BY_DATE, TOTAL_METRICS_COLLECTED_SIZE_BY_DATE, or TOTAL_TRACES_COLLECTED_SIZE_BY_DATE depending on the type.

Also applies to: 97-97

src/otel/metrics.rs (1)

545-547: LGTM! Metrics count implementation follows the established pattern.

The metric increment is correctly placed within the scope metrics processing loop, counting metrics.len() for each scope. This is consistent with the logs and traces implementations.

src/handlers/http/cluster/mod.rs (4)

103-108: LGTM! BillingMetricsCollector struct properly extended with new metric fields.

The new fields for size and count metrics (logs, metrics, traces) are correctly added with HashMap<String, u64> type, matching the existing pattern for date-based metrics. The Default derive handles initialization.


210-244: LGTM! Billing event emission correctly extended for new metrics.

The add_simple_metric calls properly emit events for all six new metric types following the established pattern.


1316-1320: LGTM! Metric name classification properly extended.

All six new parseable metric names are correctly added to is_simple_metric for proper routing during sample processing.


1392-1416: LGTM! Sample processing correctly extended for new metrics.

The match arms properly populate the corresponding collector fields for each new metric type, following the existing pattern.

src/event/format/json.rs (1)

34-36: LGTM! EventFormat trait implementation correctly updated.

The into_event method properly accepts and passes through the telemetry_type parameter to the constructed Event struct, implementing the updated trait signature.

Also applies to: 152-152, 185-185

src/handlers/http/modal/utils/ingest_utils.rs (2)

51-137: LGTM! TelemetryType threading is consistent across all log source paths.

The telemetry_type parameter is correctly propagated through all branches (Kinesis, OtelLogs, OtelTraces, OtelMetrics, and default) to push_logs, ensuring consistent telemetry tracking regardless of the log source.


139-184: LGTM!

The push_logs function correctly accepts and forwards telemetry_type to the event construction, enabling downstream metrics tracking.

src/handlers/http/ingest.rs (5)

76-80: Verify intended behavior: ingest() accepts telemetry type from header but post_event() defaults to Logs.

In ingest(), telemetry_type is parsed from the TELEMETRY_TYPE_KEY header, allowing clients to specify the type. However, post_event() (lines 417-425) hardcodes TelemetryType::Logs. This inconsistency could lead to incorrect metrics if clients expect to use the header-based approach for both endpoints.

Is this intentional? If post_event should also support telemetry type from headers for non-OTEL ingestion paths, consider extracting it similarly.


291-353: LGTM! OTEL handlers correctly map to their respective TelemetryTypes.

Each OTEL handler consistently passes the appropriate TelemetryType to both setup_otel_stream and process_otel_content:

  • Logs → TelemetryType::Logs
  • Metrics → TelemetryType::Metrics
  • Traces → TelemetryType::Traces

242-286: LGTM!

The process_otel_content function correctly accepts and propagates telemetry_type to the downstream flatten_and_push_logs call.


141-165: LGTM!

Internal streams correctly use TelemetryType::Logs since they represent Parseable's own internal logging data.


430-449: LGTM!

The push_logs_unchecked function correctly initializes the event with TelemetryType::Logs as the default for unchecked batch ingestion.

src/metrics/mod.rs (3)

634-658: LGTM! Telemetry-aware size metric routing is correctly implemented.

The function properly routes size metrics to the appropriate counter based on TelemetryType. The pattern matching is exhaustive, and treating TelemetryType::Events the same as TelemetryType::Logs is semantically appropriate.


553-567: LGTM!

All five new metrics are properly registered with the Prometheus registry.


736-752: All three new count increment functions are already properly invoked from the OTEL processing paths. increment_logs_collected_by_date is called in src/otel/logs.rs:151, increment_metrics_collected_by_date is called in src/otel/metrics.rs:546, and increment_traces_collected_by_date is called in src/otel/traces.rs:79.

Likely an incorrect or invalid review comment.

coderabbitai[bot]
coderabbitai bot previously approved these changes Dec 30, 2025
add below metrics -
1. total logs collected by date
2. total size of logs collected by date
3. total metrics collected by date
4. total size of metrics collected by date
5. total traces collected by date
6. total size of traces collected by date

add all of these to pbilling
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/connectors/kafka/processor.rs (1)

54-62: Consider using TelemetryType::Logs explicitly for stream creation.

There's an inconsistency: create_stream_if_not_exists uses TelemetryType::default() (line 60) while into_event uses TelemetryType::Logs explicitly (line 94). For consistency and clarity, consider using TelemetryType::Logs in both places since Kafka ingestion is processing log data.

🔎 Suggested change
         PARSEABLE
             .create_stream_if_not_exists(
                 stream_name,
                 StreamType::UserDefined,
                 None,
                 vec![log_source_entry],
-                TelemetryType::default(),
+                TelemetryType::Logs,
             )
             .await?;
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b86629a and 18e455e.

📒 Files selected for processing (12)
  • src/connectors/kafka/processor.rs
  • src/event/format/json.rs
  • src/event/format/mod.rs
  • src/event/mod.rs
  • src/handlers/http/cluster/mod.rs
  • src/handlers/http/ingest.rs
  • src/handlers/http/modal/utils/ingest_utils.rs
  • src/metrics/mod.rs
  • src/otel/logs.rs
  • src/otel/metrics.rs
  • src/otel/traces.rs
  • src/storage/field_stats.rs
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/otel/logs.rs
  • src/otel/traces.rs
  • src/event/format/mod.rs
🧰 Additional context used
🧠 Learnings (17)
📓 Common learnings
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1441
File: src/storage/s3.rs:837-844
Timestamp: 2025-09-25T07:13:58.909Z
Learning: In the Parseable codebase, the existing pattern of placing metrics calls after the `?` operator in S3 list_hours method is correct and should not be changed. The metrics are properly recorded only on successful operations.
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metrics/mod.rs:163-173
Timestamp: 2025-08-25T01:32:25.980Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metrics/mod.rs are intentionally designed to track global totals across all streams for a given date, using labels ["format", "date"] rather than per-stream labels. This is the correct design for global aggregation purposes.
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metrics/mod.rs:700-756
Timestamp: 2025-09-18T09:59:20.177Z
Learning: In src/event/mod.rs, the parsed_timestamp used in increment_events_ingested_by_date() is correctly UTC-normalized: for dynamic streams it remains Utc::now(), and for streams with time partition enabled it uses the time partition value. Both cases result in proper UTC date strings for metrics labeling, preventing double-counting issues.
📚 Learning: 2025-08-18T14:48:46.324Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1405
File: src/handlers/http/ingest.rs:256-258
Timestamp: 2025-08-18T14:48:46.324Z
Learning: OTEL streams (OtelLogs, OtelMetrics, OtelTraces) in Parseable are intentionally designed to not support time partition functionality. The time_partition parameter should always be None for OTEL ingestion paths in flatten_and_push_logs calls.

Applied to files:

  • src/event/format/json.rs
📚 Learning: 2025-08-18T19:10:11.941Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1405
File: src/handlers/http/ingest.rs:163-164
Timestamp: 2025-08-18T19:10:11.941Z
Learning: Field statistics calculation in src/storage/field_stats.rs uses None for the time_partition parameter when calling flatten_and_push_logs(), as field stats generation does not require time partition functionality.

Applied to files:

  • src/storage/field_stats.rs
📚 Learning: 2025-08-25T01:31:41.786Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.

Applied to files:

  • src/storage/field_stats.rs
  • src/otel/metrics.rs
  • src/event/mod.rs
  • src/handlers/http/modal/utils/ingest_utils.rs
  • src/metrics/mod.rs
  • src/handlers/http/cluster/mod.rs
📚 Learning: 2025-08-25T01:32:25.980Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metrics/mod.rs:163-173
Timestamp: 2025-08-25T01:32:25.980Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metrics/mod.rs are intentionally designed to track global totals across all streams for a given date, using labels ["format", "date"] rather than per-stream labels. This is the correct design for global aggregation purposes.

Applied to files:

  • src/storage/field_stats.rs
  • src/otel/metrics.rs
  • src/event/mod.rs
  • src/metrics/mod.rs
  • src/handlers/http/cluster/mod.rs
📚 Learning: 2025-09-18T09:59:20.177Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metrics/mod.rs:700-756
Timestamp: 2025-09-18T09:59:20.177Z
Learning: In src/event/mod.rs, the parsed_timestamp used in increment_events_ingested_by_date() is correctly UTC-normalized: for dynamic streams it remains Utc::now(), and for streams with time partition enabled it uses the time partition value. Both cases result in proper UTC date strings for metrics labeling, preventing double-counting issues.

Applied to files:

  • src/otel/metrics.rs
  • src/event/mod.rs
  • src/connectors/kafka/processor.rs
  • src/metrics/mod.rs
  • src/handlers/http/cluster/mod.rs
  • src/handlers/http/ingest.rs
📚 Learning: 2025-10-20T17:48:53.444Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1448
File: src/handlers/http/cluster/mod.rs:1370-1400
Timestamp: 2025-10-20T17:48:53.444Z
Learning: In src/handlers/http/cluster/mod.rs, the billing metrics processing logic should NOT accumulate counter values from multiple Prometheus samples with the same labels. The intended behavior is to convert each received counter from nodes into individual events for ingestion, using `.insert()` to store the counter value directly.

Applied to files:

  • src/otel/metrics.rs
  • src/event/mod.rs
  • src/metrics/mod.rs
  • src/handlers/http/cluster/mod.rs
📚 Learning: 2025-09-18T09:52:07.554Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/storage/object_storage.rs:173-177
Timestamp: 2025-09-18T09:52:07.554Z
Learning: In Parseable's upload system (src/storage/object_storage.rs), the update_storage_metrics function can safely use path.metadata().map_err() to fail on local file metadata read failures because parquet validation (validate_uploaded_parquet_file) ensures file integrity before this step, and the system guarantees local staging files remain accessible throughout the upload flow.

Applied to files:

  • src/event/mod.rs
📚 Learning: 2025-08-18T17:59:31.642Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1405
File: src/handlers/http/modal/utils/ingest_utils.rs:149-156
Timestamp: 2025-08-18T17:59:31.642Z
Learning: The time_partition parameter in push_logs() function in src/handlers/http/modal/utils/ingest_utils.rs is determined by the caller (flatten_and_push_logs). OSS callers pass None, enterprise callers pass the appropriate value (None or Some<>), and OTEL callers always pass None. The push_logs() function should not add additional time partition logic since it's already handled at the caller level.

Applied to files:

  • src/handlers/http/modal/utils/ingest_utils.rs
  • src/handlers/http/ingest.rs
📚 Learning: 2025-03-26T06:44:53.362Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1263
File: src/handlers/http/ingest.rs:300-310
Timestamp: 2025-03-26T06:44:53.362Z
Learning: In Parseable, every stream is always associated with a log_source - no stream can exist without a log_source. For otel-traces and otel-metrics, strict restrictions are implemented where ingestion is rejected if a stream already has a different log_source format. However, regular logs from multiple log_sources can coexist in a single stream.

Applied to files:

  • src/handlers/http/modal/utils/ingest_utils.rs
📚 Learning: 2025-08-18T12:37:47.732Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1405
File: src/parseable/mod.rs:528-533
Timestamp: 2025-08-18T12:37:47.732Z
Learning: In Parseable, the validate_time_partition function in src/utils/json/flatten.rs already provides a default time partition limit of 30 days using `map_or(30, |days| days.get() as i64)` when time_partition_limit is None, so no additional defaulting is needed in the stream creation logic in src/parseable/mod.rs.

Applied to files:

  • src/handlers/http/modal/utils/ingest_utils.rs
📚 Learning: 2025-09-25T07:12:33.401Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1441
File: src/storage/azure_blob.rs:727-735
Timestamp: 2025-09-25T07:12:33.401Z
Learning: In Parseable's object storage implementations, metrics should only be captured when operations succeed, not when they are attempted. The increment_object_store_calls_by_date() calls should be placed after the await? to ensure they only execute on successful operations.

Applied to files:

  • src/handlers/http/cluster/mod.rs
📚 Learning: 2025-09-25T07:12:40.189Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1441
File: src/storage/gcs.rs:625-633
Timestamp: 2025-09-25T07:12:40.189Z
Learning: In Parseable's object storage metrics system, metrics should only be captured when API calls succeed, not when they error out. The current pattern of calling increment_object_store_calls_by_date and related metrics functions after the `?` operator is the correct approach.

Applied to files:

  • src/handlers/http/cluster/mod.rs
📚 Learning: 2025-09-25T07:12:27.407Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1441
File: src/storage/azure_blob.rs:693-700
Timestamp: 2025-09-25T07:12:27.407Z
Learning: In the Parseable codebase, object store metrics (increment_object_store_calls_by_date, increment_files_scanned_in_object_store_calls_by_date, etc.) should only be recorded for successful operations, not for failed attempts. The metric calls should be placed after error handling operators like `?` to ensure they only execute when operations succeed.

Applied to files:

  • src/handlers/http/cluster/mod.rs
📚 Learning: 2025-08-18T12:33:16.085Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1405
File: src/storage/object_storage.rs:145-166
Timestamp: 2025-08-18T12:33:16.085Z
Learning: In Parseable's staging and upload process, parquet file names are guaranteed to always contain the date part in the expected format (date=YYYY-MM-DD). The system ensures no deviation from this naming convention, making defensive parsing unnecessary for date extraction from filenames during storage metrics updates.

Applied to files:

  • src/handlers/http/cluster/mod.rs
📚 Learning: 2025-08-20T17:01:25.791Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1409
File: src/storage/field_stats.rs:429-456
Timestamp: 2025-08-20T17:01:25.791Z
Learning: In Parseable's field stats calculation (src/storage/field_stats.rs), the extract_datetime_from_parquet_path_regex function correctly works with filename-only parsing because Parseable's server-side filename generation guarantees the dot-separated format date=YYYY-MM-DD.hour=HH.minute=MM pattern in parquet filenames.

Applied to files:

  • src/handlers/http/cluster/mod.rs
📚 Learning: 2025-08-18T18:01:22.834Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1405
File: src/handlers/http/modal/utils/ingest_utils.rs:271-292
Timestamp: 2025-08-18T18:01:22.834Z
Learning: In Parseable's ingestion validation, validate_stream_for_ingestion is designed to prevent regular log ingestion endpoints (ingest() and post_event()) from ingesting into streams that exclusively contain OTEL traces or metrics. The function allows mixed streams (regular logs + OTEL) but blocks ingestion into OTEL-only streams, maintaining architectural separation between regular log and OTEL ingestion pathways.

Applied to files:

  • src/handlers/http/ingest.rs
🧬 Code graph analysis (4)
src/otel/metrics.rs (1)
src/metrics/mod.rs (1)
  • increment_metrics_collected_by_date (736-740)
src/event/mod.rs (1)
src/metrics/mod.rs (1)
  • increment_events_ingested_size_by_date (634-659)
src/metrics/mod.rs (1)
src/handlers/http/mod.rs (1)
  • metrics_path (64-66)
src/handlers/http/ingest.rs (1)
src/handlers/http/modal/utils/ingest_utils.rs (1)
  • flatten_and_push_logs (51-137)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: coverage
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
🔇 Additional comments (29)
src/storage/field_stats.rs (1)

164-164: LGTM!

The addition of TelemetryType::Logs for internal dataset statistics events is consistent with the broader telemetry propagation pattern. This correctly categorizes the internal stats events as log-type telemetry.

src/connectors/kafka/processor.rs (1)

94-94: LGTM!

The explicit TelemetryType::Logs for Kafka event processing is correct and aligns with the telemetry type propagation pattern.

src/event/mod.rs (2)

56-56: LGTM!

The telemetry_type field addition to the Event struct is well-designed. It correctly propagates telemetry context through the event processing pipeline.


97-97: LGTM!

The telemetry type is correctly passed to increment_events_ingested_size_by_date, enabling per-type (Logs/Metrics/Traces) size tracking. Based on retrieved learnings, the parsed_timestamp used here is correctly UTC-normalized.

src/handlers/http/cluster/mod.rs (4)

103-108: LGTM!

The new per-date billing metrics fields are well-structured and follow the established pattern for telemetry-type aware metrics collection. The Default derive ensures proper initialization.


210-244: LGTM!

The emission logic for the new metrics (logs/traces/metrics collected and their sizes) correctly follows the established pattern. Each metric is only emitted when its map is non-empty, which is consistent with the existing billing metrics handling.


1316-1320: LGTM!

The is_simple_metric function correctly extended to recognize the new per-date telemetry metrics.


1392-1416: LGTM!

The process_simple_metric function correctly processes the new per-date telemetry metrics using .insert() to store counter values directly. Based on retrieved learnings, this is the intended behavior for billing metrics processing.

src/otel/metrics.rs (1)

545-546: LGTM!

The batch-level metrics counting is correctly implemented. The date is computed once per scope metric iteration, and increment_metrics_collected_by_date is called with the count of metrics in that scope. This approach is more efficient than per-metric incrementing.

src/event/format/json.rs (2)

152-152: LGTM!

The into_event signature correctly extended to accept TelemetryType, enabling telemetry context propagation through the event construction pipeline.


185-185: LGTM!

The telemetry_type field is correctly set in the Event construction, completing the propagation chain.

src/handlers/http/modal/utils/ingest_utils.rs (3)

51-57: LGTM!

The flatten_and_push_logs function signature correctly extended with telemetry_type parameter, enabling telemetry type propagation through all ingestion paths.


139-145: LGTM!

The push_logs function signature correctly extended to accept and forward telemetry_type to the event construction.


179-179: LGTM!

The telemetry_type is correctly passed to into_event, completing the propagation chain from ingestion entry points to event construction.

src/handlers/http/ingest.rs (9)

76-80: LGTM!

The telemetry type is correctly extracted from the request header with a sensible default fallback. This allows clients to specify the telemetry type via the TELEMETRY_TYPE_KEY header.


128-136: LGTM!

The flatten_and_push_logs call correctly passes through the user-specified telemetry_type from the request header.


160-160: LGTM!

Internal stream events are correctly tagged with TelemetryType::Logs.


247-247: LGTM!

The process_otel_content function signature correctly extended to accept telemetry_type, enabling per-OTEL-type metrics collection.


303-303: LGTM!

OTEL logs ingestion correctly uses TelemetryType::Logs.


323-330: LGTM!

OTEL metrics ingestion correctly uses TelemetryType::Metrics.


350-350: LGTM!

OTEL traces ingestion correctly uses TelemetryType::Traces.


417-425: LGTM!

The post_event endpoint correctly defaults to TelemetryType::Logs for log ingestion via the logstream endpoint.


444-444: LGTM!

The push_logs_unchecked function correctly sets TelemetryType::Logs for unchecked event processing.

src/metrics/mod.rs (6)

20-23: LGTM!

The imports are correctly added to support the new telemetry-type-aware metrics functionality.


391-449: LGTM!

The new metrics are well-structured with clear naming, appropriate counter types for billing purposes, and consistent label structure across all telemetry types.


553-567: LGTM!

All new metrics are properly registered following the existing pattern.


634-659: Verify TelemetryType::Events handling.

In the match statement, TelemetryType::Events is treated identically to TelemetryType::Logs, both incrementing TOTAL_LOGS_COLLECTED_SIZE_BY_DATE. Please confirm this is intentional. If Events and Logs should be tracked separately for billing purposes, they need distinct metrics.

Additionally, verify that using "all" as the team label value for global aggregation aligns with your billing requirements.


736-752: LGTM!

The helper functions are well-structured with consistent signatures and correctly use the ["all", date] label pattern for global aggregation metrics.


379-389: The metric label change from ["date"] to ["team", "date"] is confirmed, but verify external consumer impact.

The TOTAL_METRICS_COLLECTED_BY_DATE metric was introduced in the previous commit with labels ["date"] and modified in this commit to ["team", "date"]. This label structure change will break any external Prometheus queries, dashboards, or alerts using the old format.

However, since this metric was added only 1-2 days ago and no Prometheus/Grafana configurations are present in the repo, the immediate impact appears limited. All internal code usages have been consistently updated to provide both ["all", date] values. Before merging, confirm that no external monitoring systems are querying this metric with the old label schema.

@nitisht nitisht merged commit 95f9e21 into parseablehq:main Dec 30, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants