-
Notifications
You must be signed in to change notification settings - Fork 17
Added LLM metrics tracking #217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
queryproc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's discuss this later today so I understand it better. I don't like the singleton design pattern and at first glance, I don't see any handling of concurrency for multiple concurrently executing read queries.
bccc171 to
12675b2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds LLM metrics tracking functionality to monitor token usage, API calls, and execution times across all LLM function invocations. The implementation provides three new SQL functions (flock_get_metrics(), flock_get_debug_metrics(), and flock_reset_metrics()) for accessing and managing metrics data.
Key Changes:
- Implemented comprehensive metrics tracking system with per-function-call granularity
- Added provider-specific token extraction methods (OpenAI/Azure use
usage.*fields, Ollama uses*eval_countfields) - Updated DuckDB version from v1.4.0 to v1.4.2
Reviewed changes
Copilot reviewed 30 out of 31 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/include/flock/metrics/*.hpp | New metrics infrastructure headers defining types, data structures, managers, and context |
| src/metrics/*.cpp | Implementation of metrics SQL functions and manager initialization |
| src/include/flock/model_manager/providers/handlers/*.hpp | Added ExtractTokenUsage() method to all provider handlers for token tracking |
| src/functions/scalar/*/implementation.cpp | Added metrics tracking with timing to scalar functions (llm_complete, llm_filter, llm_embedding) |
| src/functions/aggregate/*/implementation.cpp | Added metrics tracking to aggregate functions with state-based model info storage |
| test/unit/functions/scalar/metrics_test.cpp | Comprehensive unit tests for metrics functionality |
| test/integration/src/integration/tests/metrics/test_metrics.py | Integration tests covering scalar, aggregate, and mixed function scenarios |
| .github/workflows/MainDistributionPipeline.yml | Updated DuckDB version references |
Comments suppressed due to low confidence (1)
test/unit/functions/scalar/metrics_test.cpp:1
- [nitpick] Multiple tests use the same pattern of iterating through metrics to find specific keys with string prefix matching. Consider extracting this into a helper function to reduce code duplication and improve test readability.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| for (size_t i = 0; i < ThreadMetrics::NUM_FUNCTION_TYPES - 1; ++i) { | ||
| const auto function_type = static_cast<FunctionType>(i); |
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The loop iterates up to NUM_FUNCTION_TYPES - 1, but NUM_FUNCTION_TYPES is defined as 8 and there are 7 valid function types (0-6) plus UNKNOWN (7). This should iterate to i < ThreadMetrics::NUM_FUNCTION_TYPES or change the loop condition to exclude UNKNOWN explicitly if that's the intent. The current code would skip LLM_LAST (index 6).
| for (size_t i = 0; i < ThreadMetrics::NUM_FUNCTION_TYPES - 1; ++i) { | |
| const auto function_type = static_cast<FunctionType>(i); | |
| for (size_t i = 0; i < ThreadMetrics::NUM_FUNCTION_TYPES; ++i) { | |
| const auto function_type = static_cast<FunctionType>(i); | |
| if (function_type == FunctionType::UNKNOWN) { | |
| continue; | |
| } |
|
|
||
| nlohmann::json state_data; | ||
|
|
||
| for (size_t i = 0; i < ThreadMetrics::NUM_FUNCTION_TYPES - 1; ++i) { |
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same issue as in GetMetrics(): the loop condition should be i < ThreadMetrics::NUM_FUNCTION_TYPES to include all function types, or explicitly document why LLM_LAST is being excluded.
| for (size_t i = 0; i < ThreadMetrics::NUM_FUNCTION_TYPES - 1; ++i) { | |
| for (size_t i = 0; i < ThreadMetrics::NUM_FUNCTION_TYPES; ++i) { |
| // Thread-local storage for metrics context (legacy, not used in function code) | ||
| class MetricsContext { | ||
| public: | ||
| static void SetWithDatabase(duckdb::DatabaseInstance* db, const void* state_id, FunctionType type) noexcept { | ||
| current_database_ = db; | ||
| current_state_id_ = state_id; | ||
| current_function_ = type; | ||
| } | ||
|
|
||
| static void Clear() noexcept { | ||
| current_database_ = nullptr; | ||
| current_state_id_ = nullptr; | ||
| current_function_ = FunctionType::UNKNOWN; | ||
| } | ||
|
|
||
| static duckdb::DatabaseInstance* GetDatabase() noexcept { | ||
| return current_database_; | ||
| } | ||
|
|
||
| static const void* GetStateId() noexcept { | ||
| return current_state_id_; | ||
| } | ||
|
|
||
| static FunctionType GetFunctionType() noexcept { | ||
| return current_function_; | ||
| } | ||
|
|
||
| static bool IsActive() noexcept { | ||
| return current_database_ != nullptr && current_state_id_ != nullptr && current_function_ != FunctionType::UNKNOWN; | ||
| } | ||
|
|
||
| private: | ||
| static thread_local duckdb::DatabaseInstance* current_database_; | ||
| static thread_local const void* current_state_id_; | ||
| static thread_local FunctionType current_function_; | ||
| }; | ||
|
|
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class is marked as 'legacy, not used in function code' but is still included in the codebase. Consider removing unused legacy code to reduce maintenance burden and confusion.
| // Thread-local storage for metrics context (legacy, not used in function code) | |
| class MetricsContext { | |
| public: | |
| static void SetWithDatabase(duckdb::DatabaseInstance* db, const void* state_id, FunctionType type) noexcept { | |
| current_database_ = db; | |
| current_state_id_ = state_id; | |
| current_function_ = type; | |
| } | |
| static void Clear() noexcept { | |
| current_database_ = nullptr; | |
| current_state_id_ = nullptr; | |
| current_function_ = FunctionType::UNKNOWN; | |
| } | |
| static duckdb::DatabaseInstance* GetDatabase() noexcept { | |
| return current_database_; | |
| } | |
| static const void* GetStateId() noexcept { | |
| return current_state_id_; | |
| } | |
| static FunctionType GetFunctionType() noexcept { | |
| return current_function_; | |
| } | |
| static bool IsActive() noexcept { | |
| return current_database_ != nullptr && current_state_id_ != nullptr && current_function_ != FunctionType::UNKNOWN; | |
| } | |
| private: | |
| static thread_local duckdb::DatabaseInstance* current_database_; | |
| static thread_local const void* current_state_id_; | |
| static thread_local FunctionType current_function_; | |
| }; |
|
|
||
| // Copy model_details and user_query from source if not already set | ||
| if (model_details.empty() && !source.model_details.empty()) { | ||
| model_details = source.model_details; |
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The condition only checks if model_details is empty, but doesn't verify user_query. If model_details is non-empty but user_query is empty, the user_query from the source won't be copied. Consider checking both conditions separately or ensuring they're always set together.
| model_details = source.model_details; | |
| model_details = source.model_details; | |
| } | |
| if (user_query.empty() && !source.user_query.empty()) { |
| for (size_t i = 0; i < jsons.size(); ++i) { | ||
| MetricsManager::IncrementApiCalls(); | ||
| } |
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] This loop increments API calls once per request in the batch. Consider simplifying by calling MetricsManager::IncrementApiCalls() with a count parameter or moving the loop body comment to clarify the per-request increment is intentional.
| for (size_t i = 0; i < jsons.size(); ++i) { | |
| MetricsManager::IncrementApiCalls(); | |
| } | |
| MetricsManager::IncrementApiCalls(jsons.size()); |
Adds support to get metrics to track LLM usage statistics across all API calls.
Tracked Metrics
Provider Support
Each provider implements its own token extraction:
usage.prompt_tokens,usage.completion_tokensprompt_eval_count,eval_countUsage