
Commit 623838b

feat: Migrate to OpenAI Responses API with full reasoning model support
BREAKING CHANGE: The OpenAI provider now uses the Responses API (`/v1/responses`) instead of the Chat Completions API. This is required for reasoning models (o1, o3, o4-mini).

## Major Changes

### Responses API Migration

- **OpenAI Provider**: Completely rewritten to use the `/v1/responses` endpoint
- **OpenAI-Compatible Provider**: Supports both the Responses API and the Chat Completions API with automatic fallback
- **Request Format**: Changed from a `messages` array to an `input` string plus `instructions`
- **Response Format**: Changed from `choices[0].message.content` to `output_text`

### Reasoning Model Support

Added full support for OpenAI's reasoning models (o1, o3, o4-mini, GPT-5):

1. **Reasoning Effort Parameter**: Control thinking depth with "minimal", "low", "medium", "high"
   - `minimal`: Fast, basic reasoning (GPT-5 only)
   - `low`: Quick responses with light reasoning
   - `medium`: Balanced reasoning (recommended)
   - `high`: Deep reasoning for complex problems
2. **max_output_tokens Parameter**: New token limit parameter for the Responses API
   - Replaces `max_tokens` for reasoning models
   - Falls back to `max_tokens` if not set, for backward compatibility
3. **Automatic Model Detection**: The OpenAI provider detects reasoning models and:
   - Disables temperature/top_p (not supported by reasoning models)
   - Enables the `reasoning_effort` parameter
   - Uses the proper token parameter names

### Configuration Updates

**GenerationConfig** (crates/codegraph-ai/src/llm_provider.rs):

```rust
pub struct GenerationConfig {
    pub temperature: f32,                  // Not supported by reasoning models
    pub max_tokens: Option<usize>,         // Legacy parameter
    pub max_output_tokens: Option<usize>,  // NEW: For Responses API
    pub reasoning_effort: Option<String>,  // NEW: For reasoning models
    pub top_p: Option<f32>,                // Not supported by reasoning models
    // ...
}
```

**LLMConfig** (crates/codegraph-core/src/config_manager.rs):

```rust
pub struct LLMConfig {
    pub max_tokens: usize,                 // Legacy
    pub max_output_tokens: Option<usize>,  // NEW
    pub reasoning_effort: Option<String>,  // NEW
    // ...
}
```
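To make the request-shape change concrete, here is a minimal sketch of how a `GenerationConfig` could be mapped onto a Responses API request body. The helper name `build_responses_request` is hypothetical, and the flat `reasoning_effort` field simply mirrors the request shape listed in this commit; check the Responses API reference for the exact wire format.

```rust
use serde_json::{json, Value};

// Hypothetical sketch (not the provider's actual code): map a GenerationConfig onto
// the request shape described above: { model, input, instructions,
// max_output_tokens, reasoning_effort }. Assumes GenerationConfig from
// crates/codegraph-ai/src/llm_provider.rs is in scope.
fn build_responses_request(
    model: &str,
    instructions: &str,
    input: &str,
    config: &GenerationConfig,
) -> Value {
    let mut body = json!({
        "model": model,
        "instructions": instructions,
        "input": input,
    });
    // Prefer max_output_tokens; fall back to the legacy max_tokens value.
    if let Some(limit) = config.max_output_tokens.or(config.max_tokens) {
        body["max_output_tokens"] = json!(limit);
    }
    // Only meaningful for reasoning models; standard models ignore it.
    if let Some(effort) = &config.reasoning_effort {
        body["reasoning_effort"] = json!(effort);
    }
    body
}
```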
### Provider Implementations

**OpenAI Provider** (crates/codegraph-ai/src/openai_llm_provider.rs):
- Uses the `/v1/responses` endpoint exclusively
- Automatic reasoning model detection
- Proper parameter handling based on model type
- Request: `{ model, input, instructions, max_output_tokens, reasoning_effort }`
- Response: `{ output_text, usage: { prompt_tokens, output_tokens, reasoning_tokens } }`

**OpenAI-Compatible Provider** (crates/codegraph-ai/src/openai_compatible_provider.rs):
- Defaults to the Responses API (`use_responses_api: true`)
- Falls back to the Chat Completions API for compatibility
- Supports both `max_output_tokens` and `max_completion_tokens`
- Works with LM Studio, Ollama's v1 endpoint, and custom APIs

### Documentation Updates

**docs/CLOUD_PROVIDERS.md**:
- Added a "Responses API & Reasoning Models" section
- Detailed explanation of the API format differences
- Configuration examples for reasoning models
- Reasoning effort level descriptions
- Migration guide from the Chat Completions API

**.codegraph.toml.example**:
- Added the `max_output_tokens` parameter with documentation
- Added the `reasoning_effort` parameter with its options
- Clarified which parameters apply to reasoning vs. standard models

### Backward Compatibility

- The OpenAI-compatible provider can fall back to the Chat Completions API
- `max_output_tokens` falls back to `max_tokens` if not set
- Configurations with only `max_tokens` continue to work
- Standard models (gpt-4o, gpt-4-turbo) work as before

### Testing

Added tests for:
- Reasoning model detection (o1, o3, o4, gpt-5)
- Standard model detection (gpt-4o, gpt-4-turbo)
- OpenAI-compatible provider configuration
- Both API format support

## Migration Guide

### For OpenAI Users

**Before (Chat Completions API)**:

```toml
[llm]
provider = "openai"
model = "gpt-4o"
max_tokens = 4096
```

**After (Responses API)** - still works, but consider:

```toml
[llm]
provider = "openai"
model = "gpt-4o"
max_output_tokens = 4096  # Preferred for Responses API
```

**For Reasoning Models**:

```toml
[llm]
provider = "openai"
model = "o3-mini"
max_output_tokens = 25000
reasoning_effort = "medium"  # NEW: Control reasoning depth
# Note: temperature/top_p are ignored for reasoning models
```

### For OpenAI-Compatible Users

No changes required - the provider automatically uses the Responses API if available and falls back to the Chat Completions API otherwise.

To force the Chat Completions API (e.g., for older systems):

```rust
let config = OpenAICompatibleConfig {
    use_responses_api: false, // Force legacy API
    ...
};
```

## Why This Change?

1. **Future-Proof**: The Responses API is OpenAI's modern standard
2. **Reasoning Models**: Required for o1, o3, and o4-mini support
3. **Better Features**: More granular control over model behavior
4. **Token Tracking**: Separate tracking of reasoning tokens
5. **Performance**: Optimized for the latest models
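As a rough illustration of the model detection described above (and exercised by the new tests), a check along these lines would distinguish reasoning models from standard ones. The function name and exact prefix list are assumptions, not necessarily what openai_llm_provider.rs uses.

```rust
/// Hypothetical sketch of reasoning-model detection; the real helper in
/// openai_llm_provider.rs may use a different name or prefix list.
fn is_reasoning_model(model: &str) -> bool {
    let m = model.to_ascii_lowercase();
    // o1 / o3 / o4-mini / GPT-5 families are treated as reasoning models;
    // gpt-4o, gpt-4-turbo, etc. remain standard models.
    m.starts_with("o1") || m.starts_with("o3") || m.starts_with("o4") || m.starts_with("gpt-5")
}
```

When such a check returns true, the provider would omit temperature/top_p and send `reasoning_effort`/`max_output_tokens` instead.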
## Files Modified

- `crates/codegraph-ai/src/llm_provider.rs`: Added reasoning parameters to GenerationConfig
- `crates/codegraph-ai/src/openai_llm_provider.rs`: Complete rewrite for the Responses API
- `crates/codegraph-ai/src/openai_compatible_provider.rs`: Dual API support
- `crates/codegraph-core/src/config_manager.rs`: Added reasoning config fields
- `.codegraph.toml.example`: Documented the new parameters
- `docs/CLOUD_PROVIDERS.md`: Comprehensive Responses API documentation

## References

- OpenAI Responses API: https://platform.openai.com/docs/api-reference/responses
- Reasoning Models: https://platform.openai.com/docs/guides/reasoning
- Azure OpenAI Reasoning: https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/reasoning
1 parent 28f26d1 commit 623838b

File tree

6 files changed (+387 / -113 lines)


.codegraph.toml.example

Lines changed: 11 additions & 1 deletion
```diff
@@ -86,9 +86,19 @@ context_window = 32000
 # Temperature for generation (0.0 = deterministic, 2.0 = very creative)
 temperature = 0.1
 
-# Maximum tokens to generate in responses
+# Maximum tokens to generate in responses (legacy parameter, use max_output_tokens for Responses API)
 max_tokens = 4096
 
+# Maximum output tokens for Responses API and reasoning models
+# If not set, falls back to max_tokens
+# max_output_tokens = 4096
+
+# Reasoning effort for reasoning models (o1, o3, o4-mini, GPT-5)
+# Options: "minimal", "low", "medium", "high"
+# Higher effort = more reasoning tokens = better quality but slower and more expensive
+# Only applies to reasoning models, ignored by standard models
+# reasoning_effort = "medium"
+
 # Request timeout in seconds
 timeout_secs = 120
 
```

crates/codegraph-ai/src/llm_provider.rs

Lines changed: 11 additions & 5 deletions
```diff
@@ -25,15 +25,19 @@ pub struct ProviderCharacteristics {
 /// Configuration for generation parameters
 #[derive(Debug, Clone, Serialize, Deserialize)]
 pub struct GenerationConfig {
-    /// Temperature for sampling (0.0 to 2.0)
+    /// Temperature for sampling (0.0 to 2.0) - Not supported by reasoning models
     pub temperature: f32,
-    /// Maximum tokens to generate
+    /// Maximum tokens to generate (legacy parameter for Chat Completions API)
     pub max_tokens: Option<usize>,
-    /// Top-p nucleus sampling parameter
+    /// Maximum output tokens (for Responses API and reasoning models)
+    pub max_output_tokens: Option<usize>,
+    /// Reasoning effort for reasoning models: "minimal", "low", "medium", "high"
+    pub reasoning_effort: Option<String>,
+    /// Top-p nucleus sampling parameter - Not supported by reasoning models
     pub top_p: Option<f32>,
-    /// Frequency penalty (-2.0 to 2.0)
+    /// Frequency penalty (-2.0 to 2.0) - Not supported by reasoning models
     pub frequency_penalty: Option<f32>,
-    /// Presence penalty (-2.0 to 2.0)
+    /// Presence penalty (-2.0 to 2.0) - Not supported by reasoning models
     pub presence_penalty: Option<f32>,
     /// Stop sequences
     pub stop: Option<Vec<String>>,
@@ -44,6 +48,8 @@ impl Default for GenerationConfig {
         Self {
             temperature: 0.1,
             max_tokens: Some(4096),
+            max_output_tokens: None, // Will use max_tokens if not set
+            reasoning_effort: None,  // Only for reasoning models
             top_p: None,
             frequency_penalty: None,
             presence_penalty: None,
```
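For orientation only, a hedged sketch of how a caller might fill in the two new fields added in this hunk; the values are illustrative, not defaults introduced by this commit.

```rust
// Illustrative values only; relies on the Default impl updated in this diff.
let config = GenerationConfig {
    max_tokens: None,                             // leave the legacy parameter unset
    max_output_tokens: Some(25_000),              // preferred limit for the Responses API
    reasoning_effort: Some("medium".to_string()), // "minimal" | "low" | "medium" | "high"
    ..GenerationConfig::default()
};
```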
