Skip to content

Conversation

@anasdorbani
Copy link
Member

@anasdorbani anasdorbani commented Dec 2, 2025

Adds support to get metrics to track LLM usage statistics across all API calls.

Tracked Metrics

  • Token usage (input, output, total)
  • API call count
  • API duration & execution time (total and average)

Provider Support

Each provider implements its own token extraction:

  • OpenAI/Azure: usage.prompt_tokens, usage.completion_tokens
  • Ollama: prompt_eval_count, eval_count

Usage

SELECT flock_get_metrics();  -- Returns JSON with all metrics
SELECT flock_get_debug_metrics(); -- Returns JSON with more detailed metrics for debugging
SELECT flock_reset_metrics(); -- Resets counters

@anasdorbani anasdorbani changed the title Add global LLM metrics tracking Add LLM metrics tracking Dec 2, 2025
@anasdorbani anasdorbani changed the title Add LLM metrics tracking Added LLM metrics tracking Dec 2, 2025
Copy link
Member

@queryproc queryproc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss this later today so I understand it better. I don't like the singleton design pattern and at first glance, I don't see any handling of concurrency for multiple concurrently executing read queries.

Copilot AI review requested due to automatic review settings December 6, 2025 23:05
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds LLM metrics tracking functionality to monitor token usage, API calls, and execution times across all LLM function invocations. The implementation provides three new SQL functions (flock_get_metrics(), flock_get_debug_metrics(), and flock_reset_metrics()) for accessing and managing metrics data.

Key Changes:

  • Implemented comprehensive metrics tracking system with per-function-call granularity
  • Added provider-specific token extraction methods (OpenAI/Azure use usage.* fields, Ollama uses *eval_count fields)
  • Updated DuckDB version from v1.4.0 to v1.4.2

Reviewed changes

Copilot reviewed 30 out of 31 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/include/flock/metrics/*.hpp New metrics infrastructure headers defining types, data structures, managers, and context
src/metrics/*.cpp Implementation of metrics SQL functions and manager initialization
src/include/flock/model_manager/providers/handlers/*.hpp Added ExtractTokenUsage() method to all provider handlers for token tracking
src/functions/scalar/*/implementation.cpp Added metrics tracking with timing to scalar functions (llm_complete, llm_filter, llm_embedding)
src/functions/aggregate/*/implementation.cpp Added metrics tracking to aggregate functions with state-based model info storage
test/unit/functions/scalar/metrics_test.cpp Comprehensive unit tests for metrics functionality
test/integration/src/integration/tests/metrics/test_metrics.py Integration tests covering scalar, aggregate, and mixed function scenarios
.github/workflows/MainDistributionPipeline.yml Updated DuckDB version references
Comments suppressed due to low confidence (1)

test/unit/functions/scalar/metrics_test.cpp:1

  • [nitpick] Multiple tests use the same pattern of iterating through metrics to find specific keys with string prefix matching. Consider extracting this into a helper function to reduce code duplication and improve test readability.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +118 to +119
for (size_t i = 0; i < ThreadMetrics::NUM_FUNCTION_TYPES - 1; ++i) {
const auto function_type = static_cast<FunctionType>(i);
Copy link

Copilot AI Dec 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loop iterates up to NUM_FUNCTION_TYPES - 1, but NUM_FUNCTION_TYPES is defined as 8 and there are 7 valid function types (0-6) plus UNKNOWN (7). This should iterate to i < ThreadMetrics::NUM_FUNCTION_TYPES or change the loop condition to exclude UNKNOWN explicitly if that's the intent. The current code would skip LLM_LAST (index 6).

Suggested change
for (size_t i = 0; i < ThreadMetrics::NUM_FUNCTION_TYPES - 1; ++i) {
const auto function_type = static_cast<FunctionType>(i);
for (size_t i = 0; i < ThreadMetrics::NUM_FUNCTION_TYPES; ++i) {
const auto function_type = static_cast<FunctionType>(i);
if (function_type == FunctionType::UNKNOWN) {
continue;
}

Copilot uses AI. Check for mistakes.

nlohmann::json state_data;

for (size_t i = 0; i < ThreadMetrics::NUM_FUNCTION_TYPES - 1; ++i) {
Copy link

Copilot AI Dec 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as in GetMetrics(): the loop condition should be i < ThreadMetrics::NUM_FUNCTION_TYPES to include all function types, or explicitly document why LLM_LAST is being excluded.

Suggested change
for (size_t i = 0; i < ThreadMetrics::NUM_FUNCTION_TYPES - 1; ++i) {
for (size_t i = 0; i < ThreadMetrics::NUM_FUNCTION_TYPES; ++i) {

Copilot uses AI. Check for mistakes.
Comment on lines 8 to 44
// Thread-local storage for metrics context (legacy, not used in function code)
class MetricsContext {
public:
static void SetWithDatabase(duckdb::DatabaseInstance* db, const void* state_id, FunctionType type) noexcept {
current_database_ = db;
current_state_id_ = state_id;
current_function_ = type;
}

static void Clear() noexcept {
current_database_ = nullptr;
current_state_id_ = nullptr;
current_function_ = FunctionType::UNKNOWN;
}

static duckdb::DatabaseInstance* GetDatabase() noexcept {
return current_database_;
}

static const void* GetStateId() noexcept {
return current_state_id_;
}

static FunctionType GetFunctionType() noexcept {
return current_function_;
}

static bool IsActive() noexcept {
return current_database_ != nullptr && current_state_id_ != nullptr && current_function_ != FunctionType::UNKNOWN;
}

private:
static thread_local duckdb::DatabaseInstance* current_database_;
static thread_local const void* current_state_id_;
static thread_local FunctionType current_function_;
};

Copy link

Copilot AI Dec 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class is marked as 'legacy, not used in function code' but is still included in the codebase. Consider removing unused legacy code to reduce maintenance burden and confusion.

Suggested change
// Thread-local storage for metrics context (legacy, not used in function code)
class MetricsContext {
public:
static void SetWithDatabase(duckdb::DatabaseInstance* db, const void* state_id, FunctionType type) noexcept {
current_database_ = db;
current_state_id_ = state_id;
current_function_ = type;
}
static void Clear() noexcept {
current_database_ = nullptr;
current_state_id_ = nullptr;
current_function_ = FunctionType::UNKNOWN;
}
static duckdb::DatabaseInstance* GetDatabase() noexcept {
return current_database_;
}
static const void* GetStateId() noexcept {
return current_state_id_;
}
static FunctionType GetFunctionType() noexcept {
return current_function_;
}
static bool IsActive() noexcept {
return current_database_ != nullptr && current_state_id_ != nullptr && current_function_ != FunctionType::UNKNOWN;
}
private:
static thread_local duckdb::DatabaseInstance* current_database_;
static thread_local const void* current_state_id_;
static thread_local FunctionType current_function_;
};

Copilot uses AI. Check for mistakes.

// Copy model_details and user_query from source if not already set
if (model_details.empty() && !source.model_details.empty()) {
model_details = source.model_details;
Copy link

Copilot AI Dec 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition only checks if model_details is empty, but doesn't verify user_query. If model_details is non-empty but user_query is empty, the user_query from the source won't be copied. Consider checking both conditions separately or ensuring they're always set together.

Suggested change
model_details = source.model_details;
model_details = source.model_details;
}
if (user_query.empty() && !source.user_query.empty()) {

Copilot uses AI. Check for mistakes.
Comment on lines +113 to +115
for (size_t i = 0; i < jsons.size(); ++i) {
MetricsManager::IncrementApiCalls();
}
Copy link

Copilot AI Dec 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] This loop increments API calls once per request in the batch. Consider simplifying by calling MetricsManager::IncrementApiCalls() with a count parameter or moving the loop body comment to clarify the per-request increment is intentional.

Suggested change
for (size_t i = 0; i < jsons.size(); ++i) {
MetricsManager::IncrementApiCalls();
}
MetricsManager::IncrementApiCalls(jsons.size());

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants