Added LLM metrics tracking #217

anasdorbani · 2025-12-02T17:02:04Z

Adds metrics that track LLM usage statistics across all API calls.

Tracked Metrics

Token usage (input, output, total)
API call count
API duration & execution time (total and average)
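A minimal sketch of how these counters could be kept for one function, and how the averages fall out of the stored totals. `FunctionMetrics`, `RecordCall`, and the field names are hypothetical, not the PR's actual `ThreadMetrics` layout, and the sketch assumes one recorded entry per API call.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-function counters; names are illustrative, not the PR's.
struct FunctionMetrics {
    std::int64_t input_tokens = 0;
    std::int64_t output_tokens = 0;
    std::int64_t api_calls = 0;
    double api_duration_ms = 0.0;    // time spent waiting on the provider
    double execution_time_ms = 0.0;  // wall time of the surrounding function work

    // Records one API call; only sums are stored.
    void RecordCall(std::int64_t in, std::int64_t out, double api_ms, double exec_ms) {
        assert(in >= 0 && out >= 0);
        input_tokens += in;
        output_tokens += out;
        api_calls += 1;
        api_duration_ms += api_ms;
        execution_time_ms += exec_ms;
    }

    std::int64_t TotalTokens() const { return input_tokens + output_tokens; }

    // Averages are derived at read time; zero calls yields 0 instead of NaN.
    double AvgApiDurationMs() const {
        return api_calls == 0 ? 0.0 : api_duration_ms / static_cast<double>(api_calls);
    }
    double AvgExecutionTimeMs() const {
        return api_calls == 0 ? 0.0 : execution_time_ms / static_cast<double>(api_calls);
    }

    // Analogous to what flock_reset_metrics() does, applied to one function's counters.
    void Reset() { *this = FunctionMetrics{}; }
};
```

Storing only sums keeps updates cheap; averages are computed once, when the metrics are read.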

Provider Support

Each provider implements its own token extraction:

OpenAI/Azure: usage.prompt_tokens, usage.completion_tokens
Ollama: prompt_eval_count, eval_count
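A sketch of per-provider extraction as virtual dispatch. To stay self-contained it assumes the response has already been flattened into a map from dotted paths to integers; the PR's handlers work on parsed JSON. `ProviderHandler`, `TokenUsage`, `FlatResponse`, and `GetOrZero` are illustrative names; only the response field names come from the list above.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Stand-in for a parsed JSON response (assumption for this sketch).
using FlatResponse = std::map<std::string, std::int64_t>;

struct TokenUsage {
    std::int64_t input_tokens = 0;
    std::int64_t output_tokens = 0;
};

// Returns the value at `key`, or 0 when the provider omitted the field.
inline std::int64_t GetOrZero(const FlatResponse& r, const std::string& key) {
    auto it = r.find(key);
    return it == r.end() ? 0 : it->second;
}

class ProviderHandler {
public:
    virtual ~ProviderHandler() = default;
    virtual TokenUsage ExtractTokenUsage(const FlatResponse& response) const = 0;
};

// OpenAI and Azure share the usage.* field names.
class OpenAIHandler : public ProviderHandler {
public:
    TokenUsage ExtractTokenUsage(const FlatResponse& r) const override {
        return {GetOrZero(r, "usage.prompt_tokens"), GetOrZero(r, "usage.completion_tokens")};
    }
};

// Ollama reports prompt and generated token counts as *eval_count fields.
class OllamaHandler : public ProviderHandler {
public:
    TokenUsage ExtractTokenUsage(const FlatResponse& r) const override {
        return {GetOrZero(r, "prompt_eval_count"), GetOrZero(r, "eval_count")};
    }
};
```

Callers hold a `const ProviderHandler&` and never need to know which field names a given provider uses.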

Usage

SELECT flock_get_metrics();  -- Returns JSON with all metrics
SELECT flock_get_debug_metrics(); -- Returns JSON with more detailed metrics for debugging
SELECT flock_reset_metrics(); -- Resets counters

queryproc

Let's discuss this later today so I understand it better. I don't like the singleton design pattern, and at first glance I don't see any handling of concurrency for multiple concurrently executing read queries.
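One way the concurrency concern could be addressed, sketched under assumptions: the registry is an ordinary object (for example, one per database instance) rather than a singleton, and a single mutex guards all counters so concurrent queries cannot lose updates and `Get()` returns a consistent snapshot. `MetricsRegistry` and `SimulateConcurrentQueries` are hypothetical and not part of the PR.

```cpp
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical registry owned by the caller instead of a process-wide singleton.
class MetricsRegistry {
public:
    struct Snapshot {
        std::int64_t api_calls;
        std::int64_t input_tokens;
        std::int64_t output_tokens;
    };

    void AddCall(std::int64_t input_tokens, std::int64_t output_tokens) {
        std::lock_guard<std::mutex> lock(mutex_);
        api_calls_ += 1;
        input_tokens_ += input_tokens;
        output_tokens_ += output_tokens;
    }

    // All counters are read under the same lock, so the snapshot is consistent.
    Snapshot Get() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return Snapshot{api_calls_, input_tokens_, output_tokens_};
    }

    void Reset() {
        std::lock_guard<std::mutex> lock(mutex_);
        api_calls_ = 0;
        input_tokens_ = 0;
        output_tokens_ = 0;
    }

private:
    mutable std::mutex mutex_;
    std::int64_t api_calls_ = 0;
    std::int64_t input_tokens_ = 0;
    std::int64_t output_tokens_ = 0;
};

// Stand-in for several read queries invoking LLM functions at the same time.
inline void SimulateConcurrentQueries(MetricsRegistry& registry, int threads, int calls_per_thread) {
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&registry, calls_per_thread] {
            for (int i = 0; i < calls_per_thread; ++i) {
                registry.AddCall(10, 2);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
}
```

A mutex keeps the snapshot consistent across fields; independent `std::atomic` counters would avoid the lock but could let a reader see a call count that does not yet include that call's tokens.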


Copilot

Pull request overview

This PR adds LLM metrics tracking functionality to monitor token usage, API calls, and execution times across all LLM function invocations. The implementation provides three new SQL functions (flock_get_metrics(), flock_get_debug_metrics(), and flock_reset_metrics()) for accessing and managing metrics data.

Key Changes:

Implemented comprehensive metrics tracking system with per-function-call granularity
Added provider-specific token extraction methods (OpenAI/Azure use usage.* fields, Ollama uses *eval_count fields)
Updated DuckDB version from v1.4.0 to v1.4.2

Reviewed changes

Copilot reviewed 30 out of 31 changed files in this pull request and generated 5 comments.


File	Description
src/include/flock/metrics/*.hpp	New metrics infrastructure headers defining types, data structures, managers, and context
src/metrics/*.cpp	Implementation of metrics SQL functions and manager initialization
src/include/flock/model_manager/providers/handlers/*.hpp	Added `ExtractTokenUsage()` method to all provider handlers for token tracking
src/functions/scalar/*/implementation.cpp	Added metrics tracking with timing to scalar functions (llm_complete, llm_filter, llm_embedding)
src/functions/aggregate/*/implementation.cpp	Added metrics tracking to aggregate functions with state-based model info storage
test/unit/functions/scalar/metrics_test.cpp	Comprehensive unit tests for metrics functionality
test/integration/src/integration/tests/metrics/test_metrics.py	Integration tests covering scalar, aggregate, and mixed function scenarios
.github/workflows/MainDistributionPipeline.yml	Updated DuckDB version references

Comments suppressed due to low confidence (1)

test/unit/functions/scalar/metrics_test.cpp:1

[nitpick] Multiple tests use the same pattern of iterating through metrics to find specific keys with string prefix matching. Consider extracting this into a helper function to reduce code duplication and improve test readability.
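A sketch of the suggested helper, assuming metrics are exposed as an ordered `std::map` from key to value (the PR's tests iterate over JSON instead). `FindByPrefix` and example keys such as `llm_complete_1` are hypothetical.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: collect every metric whose key starts with `prefix`.
template <typename Value>
std::vector<std::pair<std::string, Value>> FindByPrefix(
        const std::map<std::string, Value>& metrics, const std::string& prefix) {
    std::vector<std::pair<std::string, Value>> matches;
    // In an ordered map, keys sharing a prefix are contiguous and begin at
    // the first key that is not less than the prefix itself.
    for (auto it = metrics.lower_bound(prefix); it != metrics.end(); ++it) {
        if (it->first.compare(0, prefix.size(), prefix) != 0) {
            break;  // first key past the prefix range
        }
        matches.emplace_back(it->first, it->second);
    }
    return matches;
}
```

Each test then reduces to one call plus assertions on the returned pairs.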


Copilot · 2025-12-06T23:07:42Z

src/include/flock/metrics/base_manager.hpp

+                for (size_t i = 0; i < ThreadMetrics::NUM_FUNCTION_TYPES - 1; ++i) {
+                    const auto function_type = static_cast<FunctionType>(i);


The loop iterates up to NUM_FUNCTION_TYPES - 1, but NUM_FUNCTION_TYPES is defined as 8 and there are 7 valid function types (0-6) plus UNKNOWN (7). This should iterate to i < ThreadMetrics::NUM_FUNCTION_TYPES or change the loop condition to exclude UNKNOWN explicitly if that's the intent. The current code would skip LLM_LAST (index 6).

Suggested change

-for (size_t i = 0; i < ThreadMetrics::NUM_FUNCTION_TYPES - 1; ++i) {
-    const auto function_type = static_cast<FunctionType>(i);
+for (size_t i = 0; i < ThreadMetrics::NUM_FUNCTION_TYPES; ++i) {
+    const auto function_type = static_cast<FunctionType>(i);
+    if (function_type == FunctionType::UNKNOWN) {
+        continue;
+    }
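A small check of the two loop forms under the layout the comment describes (seven valid types at indices 0-6, `UNKNOWN` last at index 7, `NUM_FUNCTION_TYPES == 8`). Only `LLM_LAST` and `UNKNOWN` are names from the comment; `TYPE_0` through `TYPE_5` are placeholders. With that layout, `i < NUM_FUNCTION_TYPES - 1` visits indices 0 through 6, so `LLM_LAST` is included and only `UNKNOWN` is left out. The suggested form visits the same set; its benefit is that it no longer depends on `UNKNOWN` being the last enumerator.

```cpp
#include <cstddef>
#include <vector>

// Placeholder enum mirroring the layout stated in the review comment.
enum class FunctionType { TYPE_0, TYPE_1, TYPE_2, TYPE_3, TYPE_4, TYPE_5, LLM_LAST, UNKNOWN };
constexpr std::size_t NUM_FUNCTION_TYPES = 8;

// The original bound is only correct while UNKNOWN is the last enumerator.
static_assert(static_cast<std::size_t>(FunctionType::UNKNOWN) == NUM_FUNCTION_TYPES - 1,
              "original loop bound assumes UNKNOWN is last");

// Original loop: stop one short of the end.
std::vector<FunctionType> VisitedByOriginalLoop() {
    std::vector<FunctionType> out;
    for (std::size_t i = 0; i < NUM_FUNCTION_TYPES - 1; ++i) {
        out.push_back(static_cast<FunctionType>(i));
    }
    return out;
}

// Suggested loop: visit every index and skip UNKNOWN explicitly.
std::vector<FunctionType> VisitedBySuggestedLoop() {
    std::vector<FunctionType> out;
    for (std::size_t i = 0; i < NUM_FUNCTION_TYPES; ++i) {
        const auto function_type = static_cast<FunctionType>(i);
        if (function_type == FunctionType::UNKNOWN) {
            continue;
        }
        out.push_back(function_type);
    }
    return out;
}
```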

Copilot · 2025-12-06T23:07:42Z

src/include/flock/metrics/base_manager.hpp

+
+                nlohmann::json state_data;
+
+                for (size_t i = 0; i < ThreadMetrics::NUM_FUNCTION_TYPES - 1; ++i) {


Same issue as in GetMetrics(): the loop condition should be i < ThreadMetrics::NUM_FUNCTION_TYPES to include all function types, or explicitly document why LLM_LAST is being excluded.

Suggested change

-for (size_t i = 0; i < ThreadMetrics::NUM_FUNCTION_TYPES - 1; ++i) {
+for (size_t i = 0; i < ThreadMetrics::NUM_FUNCTION_TYPES; ++i) {

Copilot · 2025-12-06T23:07:42Z

src/include/flock/metrics/context.hpp

+// Thread-local storage for metrics context (legacy, not used in function code)
+class MetricsContext {
+public:
+    static void SetWithDatabase(duckdb::DatabaseInstance* db, const void* state_id, FunctionType type) noexcept {
+        current_database_ = db;
+        current_state_id_ = state_id;
+        current_function_ = type;
+    }
+
+    static void Clear() noexcept {
+        current_database_ = nullptr;
+        current_state_id_ = nullptr;
+        current_function_ = FunctionType::UNKNOWN;
+    }
+
+    static duckdb::DatabaseInstance* GetDatabase() noexcept {
+        return current_database_;
+    }
+
+    static const void* GetStateId() noexcept {
+        return current_state_id_;
+    }
+
+    static FunctionType GetFunctionType() noexcept {
+        return current_function_;
+    }
+
+    static bool IsActive() noexcept {
+        return current_database_ != nullptr && current_state_id_ != nullptr && current_function_ != FunctionType::UNKNOWN;
+    }
+
+private:
+    static thread_local duckdb::DatabaseInstance* current_database_;
+    static thread_local const void* current_state_id_;
+    static thread_local FunctionType current_function_;
+};
+


This class is marked as 'legacy, not used in function code' but is still included in the codebase. Consider removing unused legacy code to reduce maintenance burden and confusion.

Suggested change

-// Thread-local storage for metrics context (legacy, not used in function code)
-class MetricsContext {
-public:
-    static void SetWithDatabase(duckdb::DatabaseInstance* db, const void* state_id, FunctionType type) noexcept {
-        current_database_ = db;
-        current_state_id_ = state_id;
-        current_function_ = type;
-    }
-    static void Clear() noexcept {
-        current_database_ = nullptr;
-        current_state_id_ = nullptr;
-        current_function_ = FunctionType::UNKNOWN;
-    }
-    static duckdb::DatabaseInstance* GetDatabase() noexcept {
-        return current_database_;
-    }
-    static const void* GetStateId() noexcept {
-        return current_state_id_;
-    }
-    static FunctionType GetFunctionType() noexcept {
-        return current_function_;
-    }
-    static bool IsActive() noexcept {
-        return current_database_ != nullptr && current_state_id_ != nullptr && current_function_ != FunctionType::UNKNOWN;
-    }
-private:
-    static thread_local duckdb::DatabaseInstance* current_database_;
-    static thread_local const void* current_state_id_;
-    static thread_local FunctionType current_function_;
-};

Copilot · 2025-12-06T23:07:42Z

src/functions/aggregate/aggregate_state.cpp


+    // Copy model_details and user_query from source if not already set
+    if (model_details.empty() && !source.model_details.empty()) {
+        model_details = source.model_details;


The condition only checks if model_details is empty, but doesn't verify user_query. If model_details is non-empty but user_query is empty, the user_query from the source won't be copied. Consider checking both conditions separately or ensuring they're always set together.

Suggested change

-        model_details = source.model_details;
+        model_details = source.model_details;
+    }
+    if (user_query.empty() && !source.user_query.empty()) {
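A sketch of the merge with each field checked independently, as the comment recommends. `AggregateStateInfo` and `CombineFrom` are hypothetical, and both fields are modeled as `std::string` for simplicity; the real state's types may differ.

```cpp
#include <string>

// Hypothetical subset of the aggregate state: only the two fields discussed.
struct AggregateStateInfo {
    std::string model_details;
    std::string user_query;

    // Each field is filled from `source` on its own, so a state that already
    // has model_details still picks up a missing user_query.
    void CombineFrom(const AggregateStateInfo& source) {
        if (model_details.empty() && !source.model_details.empty()) {
            model_details = source.model_details;
        }
        if (user_query.empty() && !source.user_query.empty()) {
            user_query = source.user_query;
        }
    }
};
```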

Copilot · 2025-12-06T23:07:43Z

src/include/flock/model_manager/providers/handlers/base_handler.hpp

+        for (size_t i = 0; i < jsons.size(); ++i) {
+            MetricsManager::IncrementApiCalls();
+        }


[nitpick] This loop increments API calls once per request in the batch. Consider simplifying by calling MetricsManager::IncrementApiCalls() with a count parameter or moving the loop body comment to clarify the per-request increment is intentional.

Suggested change

-for (size_t i = 0; i < jsons.size(); ++i) {
-    MetricsManager::IncrementApiCalls();
-}
+MetricsManager::IncrementApiCalls(jsons.size());
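A sketch of an `IncrementApiCalls` overload with a count parameter. `ApiCallCounter` is a hypothetical stand-in modeled as an instance rather than the PR's static `MetricsManager`; the default argument is an assumption about how existing call sites would stay unchanged.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for MetricsManager's API-call tracking.
class ApiCallCounter {
public:
    // count defaults to 1, so single-call sites compile unchanged, while a
    // batch can record all of its requests in one atomic update.
    void IncrementApiCalls(std::size_t count = 1) {
        calls_.fetch_add(static_cast<std::uint64_t>(count), std::memory_order_relaxed);
    }

    std::uint64_t ApiCalls() const { return calls_.load(std::memory_order_relaxed); }

private:
    std::atomic<std::uint64_t> calls_{0};
};
```

With this signature the batch loop collapses to a single `counter.IncrementApiCalls(jsons.size());`.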

anasdorbani added 2 commits December 2, 2025 11:40

added flock metrics to all the providers

6bbb82c

upgrade gh action to DuckDB 1.4.2

fa553a1

anasdorbani changed the title ~~Add global LLM metrics tracking~~ Add LLM metrics tracking Dec 2, 2025

anasdorbani added 2 commits December 2, 2025 13:02

registered the metrics scalar functions

e0d4c1f

added unit tests for the metrics feature

c79dfcd

anasdorbani changed the title ~~Add LLM metrics tracking~~ Added LLM metrics tracking Dec 2, 2025

queryproc reviewed Dec 2, 2025


anasdorbani added 12 commits December 6, 2025 15:50

added integration tests for the metrics feature

5abddab

Removed old metrics wrapper

b90eac7

Updated metrics registry

2b80454

Updated handlers to use MetricsManager

2965d81

Merged scalar and aggregate metrics tests

19ea07b

Updated metrics CMakeLists

12c2605

Added merged metrics integration tests

7934bad

Fixed code formatting

4de6e21

Fixed include in llm_complete

03f58a4

Replaced old FlockMetrics API call

d86c4c2

Add missing metrics tracking to llm_complete function

5b7d511

Update test prompts to ensure 1-2 word responses for faster, more predictable tests

12675b2

Copilot AI review requested due to automatic review settings December 6, 2025 23:05

anasdorbani force-pushed the feat/flock-metrics branch from bccc171 to 12675b2 December 6, 2025 23:05

Copilot AI reviewed Dec 6, 2025


anasdorbani added 4 commits December 6, 2025 18:48

Remove legacy MetricsContext class (replaced by MetricsManager)

c66bb3a

Centralized shared standard library includes in common.hpp

7aab760

Add metrics merging for aggregate functions

d0325e8

Add tests for metrics merging

4ad6d1c
