    * **Balanced Mode** (default): Credentials sorted by usage count - least-used first for even distribution
+    * **Sequential Mode**: Credentials sorted by usage count descending - most-used first to maintain sticky behavior until exhausted
+3. **Tiering**: Valid keys are split into two tiers:
     * **Tier 1 (Ideal)**: Keys that are completely idle (0 concurrent requests).
     * **Tier 2 (Acceptable)**: Keys that are busy but still under their configured `MAX_CONCURRENT_REQUESTS_PER_KEY_<PROVIDER>` limit for the requested model. This allows a single key to be used multiple times for the same model, maximizing throughput.
-3. **Selection Strategy** (configurable via `rotation_tolerance`):
+4. **Selection Strategy** (configurable via `rotation_tolerance`):
     * **Deterministic (tolerance=0.0)**: Within each tier, keys are sorted by daily usage count and the least-used key is always selected. This provides perfect load balance but predictable patterns.
     * **Weighted Random (tolerance>0, default)**: Keys are selected randomly with weights biased toward less-used ones:
         - `tolerance=2.0` (recommended): Balanced randomness - credentials within 2 uses of the maximum can still be selected with reasonable probability
         - `tolerance=5.0+`: High randomness - even heavily-used credentials have significant probability
         - **Security Benefit**: Unpredictable selection patterns make rate limit detection and fingerprinting harder
         - **Load Balance**: Lower-usage credentials still preferred, maintaining reasonable distribution
-4. **Concurrency Limits**: Checks against `max_concurrent` limits to prevent overloading a single key.
-5. **Priority Groups**: When credential prioritization is enabled, higher-tier credentials (lower priority numbers) are tried first before moving to lower tiers.
+5. **Concurrency Limits**: Checks against `max_concurrent` limits (with priority multipliers applied) to prevent overloading a single key.
+6. **Priority Groups**: When credential prioritization is enabled, higher-tier credentials (lower priority numbers) are tried first before moving to lower tiers.
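
The tiering and tolerance-based selection described above can be sketched roughly as follows. This is a minimal illustration, not the library's code: `KeyState`, `pick_key`, and in particular the weight formula `(max_usage - usage) + tolerance` are assumptions chosen so that less-used keys are favored while heavily used keys keep a non-zero chance.

```python
import random
from dataclasses import dataclass


@dataclass
class KeyState:
    """Hypothetical per-key bookkeeping used only for this sketch."""
    name: str
    usage: int       # requests served so far in the current window
    in_flight: int   # concurrent requests running right now


def pick_key(keys, max_concurrent, tolerance=2.0, rng=random):
    """Return the chosen KeyState, or None if every key is at its limit."""
    idle = [k for k in keys if k.in_flight == 0]                  # Tier 1 (Ideal)
    busy = [k for k in keys if 0 < k.in_flight < max_concurrent]  # Tier 2 (Acceptable)
    tier = idle or busy
    if not tier:
        return None
    if tolerance == 0:
        # Deterministic: always the least-used key in the tier.
        return min(tier, key=lambda k: k.usage)
    # Weighted random (assumed formula): the most-used key gets weight
    # `tolerance`, and each use below the maximum adds 1 to the weight.
    max_usage = max(k.usage for k in tier)
    weights = [(max_usage - k.usage) + tolerance for k in tier]
    return rng.choices(tier, weights=weights, k=1)[0]
```

With this formula a larger `tolerance` flattens the weights, which matches the description of `tolerance=5.0+` as high randomness, while `tolerance=0` falls back to the deterministic least-used choice.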

 #### Failure Handling & Cooldowns

 * **Escalating Backoff**: When a failure occurs, the key gets a temporary cooldown for that specific model. Consecutive failures increase this time (10s -> 30s -> 60s -> 120s).
 * **Key-Level Lockouts**: If a key accumulates failures across multiple distinct models (3+), it is assumed to be dead/revoked and placed on a global 5-minute lockout.
 * **Authentication Errors**: Immediate 5-minute global lockout.
+* **Quota Exhausted Errors**: When a provider returns a quota exhausted error with an authoritative reset timestamp:
+    - The `quota_reset_ts` is extracted from the error response (via provider's `parse_quota_error()` method)
+    - Applied to the affected model (and all models in its quota group if defined)
+    - Cooldown preserved even during daily/window resets until the actual quota reset time
+    - Logs show the exact reset time in local timezone with ISO format
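
A minimal sketch of the escalating backoff and lockout rules listed above. The class and constant names are hypothetical. Two behaviors are assumptions not stated in the text: the cooldown stays at 120s after the fourth consecutive failure, and a success clears the failure count for that model.

```python
import time

BACKOFF_STEPS = (10, 30, 60, 120)  # seconds, from the schedule above
LOCKOUT_SECONDS = 300              # 5-minute global lockout
LOCKOUT_MODEL_THRESHOLD = 3        # distinct failing models before lockout


class KeyCooldowns:
    """Hypothetical cooldown tracker for a single key."""

    def __init__(self):
        self.failures = {}      # model -> consecutive failure count
        self.model_until = {}   # model -> timestamp when the cooldown ends
        self.locked_until = 0.0  # global lockout end for the whole key

    def record_failure(self, model, now=None, auth_error=False):
        now = time.time() if now is None else now
        if auth_error:
            # Authentication errors lock the key immediately.
            self.locked_until = now + LOCKOUT_SECONDS
            return
        count = self.failures.get(model, 0) + 1
        self.failures[model] = count
        # Assumption: stay at the last step once the schedule is exhausted.
        step = BACKOFF_STEPS[min(count, len(BACKOFF_STEPS)) - 1]
        self.model_until[model] = now + step
        if len(self.failures) >= LOCKOUT_MODEL_THRESHOLD:
            self.locked_until = now + LOCKOUT_SECONDS

    def record_success(self, model):
        # Assumption: a success resets the backoff for that model.
        self.failures.pop(model, None)
        self.model_until.pop(model, None)

    def is_available(self, model, now=None):
        now = time.time() if now is None else now
        return now >= self.locked_until and now >= self.model_until.get(model, 0.0)
```

Passing `now` explicitly keeps the sketch easy to test. A real implementation would also have to persist this state and handle the quota-specific `quota_reset_ts` path separately.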
@@ -406,6 +419,10 @@ The most sophisticated provider implementation, supporting Google's internal Ant
 - **Thought Signature Caching**: Server-side caching of encrypted signatures for multi-turn Gemini 3 conversations
 - **Model-Specific Logic**: Automatic configuration based on model type (Gemini 3, Claude Sonnet, Claude Opus)
 - **Credential Prioritization**: Automatic tier detection with paid credentials prioritized over free (paid tier resets every 5 hours, free tier resets weekly)
+- **Sequential Rotation Mode**: Default rotation mode is sequential (use credentials until exhausted) to maximize thought signature cache hits
+- **Per-Model Quota Tracking**: Each model tracks independent usage windows with authoritative reset timestamps from quota errors
+- **Quota Groups**: Claude models (Sonnet 4.5 + Opus 4.5) can be grouped to share quota limits (disabled by default, configurable via `QUOTA_GROUPS_ANTIGRAVITY_CLAUDE`)
- Paid credentials handle more load without manual configuration
+- Different concurrency for different rotation modes
+- Automatic tier detection based on credential properties
+
+#### Reset Window Configuration
+
+Providers can specify custom reset windows per priority tier:
+
+```python
+class AntigravityProvider(ProviderInterface):
+    usage_reset_configs = {
+        frozenset([1, 2]): UsageResetConfigDef(
+            mode="per_model",
+            window_hours=5,  # 5-hour rolling window for paid tiers
+            field_name="5h_window"
+        ),
+        frozenset([3, 4, 5]): UsageResetConfigDef(
+            mode="per_model",
+            window_hours=168,  # 7-day window for free tier
+            field_name="7d_window"
+        )
+    }
+```
+
+**Supported Modes**:
+- `per_model`: Independent window per model with authoritative reset times
+- `credential`: Single window per credential (legacy)
+- `daily`: Daily reset at configured UTC hour (legacy)
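
To illustrate how a configuration keyed by `frozenset` priority groups might be resolved for one credential, here is a small sketch. `UsageResetConfigDef` is redefined as a stand-in dataclass with only the three fields shown in the example. The lookup helpers `reset_config_for` and `window_end` are hypothetical and are not taken from the library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UsageResetConfigDef:
    """Stand-in for the real class; only the fields shown above."""
    mode: str
    window_hours: int
    field_name: str


USAGE_RESET_CONFIGS = {
    frozenset([1, 2]): UsageResetConfigDef("per_model", 5, "5h_window"),
    frozenset([3, 4, 5]): UsageResetConfigDef("per_model", 168, "7d_window"),
}


def reset_config_for(priority: int) -> Optional[UsageResetConfigDef]:
    """Return the config whose priority set contains `priority`, if any."""
    for priorities, config in USAGE_RESET_CONFIGS.items():
        if priority in priorities:
            return config
    return None


def window_end(window_start_ts: float, config: UsageResetConfigDef) -> float:
    """Timestamp at which a window that began at `window_start_ts` expires."""
    return window_start_ts + config.window_hours * 3600
```

Keying by `frozenset` lets several priority levels share one window definition without repeating it. The linear scan is fine for a handful of tiers.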
+
+#### Usage Flow
+
+1. **Request arrives** for model X with credential Y
+2. **Check rotation mode**: Sequential or balanced?
+3. **Select credential**:
+    - Filter by priority tier requirements
+    - Apply concurrency multiplier for effective limit
+    - Sort by rotation mode strategy
+4. **Check quota**:
+    - Load model's usage data
+    - Check if within window (window_start_ts to quota_reset_ts)
+    - Check model quota groups for combined usage
+5. **Execute request**
+6. **On success**: Increment model usage count
+7. **On quota error**:
+    - Parse error for `quota_reset_ts`
+    - Apply to model (and quota group)
+    - Credential remains on cooldown until reset time
+8. **On window expiration**:
+    - Archive model data to global stats
+    - Start fresh window with new `window_start_ts`
+    - Preserve unexpired quota cooldowns
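
Steps 4 and 6 to 8 of the flow above can be sketched as a small per-credential tracker. Everything here is an assumption made for illustration: the class names, the `archived_total` counter standing in for "global stats", and the choice to roll the window over lazily on the next access.

```python
from dataclasses import dataclass, field


@dataclass
class ModelUsage:
    window_start_ts: float
    count: int = 0
    quota_reset_ts: float = 0.0  # 0.0 means no active quota cooldown


@dataclass
class CredentialUsage:
    """Hypothetical per-credential usage state."""
    window_seconds: float
    models: dict = field(default_factory=dict)  # model name -> ModelUsage
    archived_total: int = 0                     # stand-in for global stats

    def _current(self, model, now):
        usage = self.models.get(model)
        if usage is None:
            usage = self.models[model] = ModelUsage(window_start_ts=now)
        elif now >= usage.window_start_ts + self.window_seconds:
            # Step 8: archive the count, start a fresh window, and carry
            # over a quota cooldown that has not expired yet.
            self.archived_total += usage.count
            keep = usage.quota_reset_ts if usage.quota_reset_ts > now else 0.0
            usage = self.models[model] = ModelUsage(now, quota_reset_ts=keep)
        return usage

    def can_use(self, model, now):
        # Step 4: usable only once any quota cooldown has passed.
        return now >= self._current(model, now).quota_reset_ts

    def on_success(self, model, now):
        # Step 6: count the request against the model's current window.
        self._current(model, now).count += 1

    def on_quota_error(self, model, quota_reset_ts, now, group=()):
        # Step 7: apply the reset time to the model and its quota group.
        for name in {model, *group}:
            self._current(name, now).quota_reset_ts = quota_reset_ts
```

The point of the rollover branch is the last bullet of step 8: a new window resets the usage count, but it must not clear a cooldown whose provider-reported reset time is still in the future.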
+
+---
+
 ### 2.12. Google OAuth Base (`providers/google_oauth_base.py`)

 A refactored, reusable OAuth2 base class that eliminates code duplication across Google-based providers.
@@ -637,6 +869,12 @@ The library handles provider idiosyncrasies through specialized "Provider" class

 The `GeminiCliProvider` is the most complex implementation, mimicking the Google Cloud Code extension.

+**New in PR #31**:
+- **Quota Parsing**: Implements `parse_quota_error()` using Google RPC format parser
+- **Tier Configuration**: Defines `tier_priorities` and `usage_reset_configs` for automatic priority resolution
+- **Balanced Rotation**: Defaults to balanced mode (unlike Antigravity which uses sequential)
+- **Priority Multipliers**: Same as Antigravity (P1: 5x, P2: 3x, others: 1x)
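
As a rough illustration of the priority multipliers listed above, the effective per-key concurrency limit could be derived as follows. The constant and function names are hypothetical, and applying the multiplier by plain multiplication of the base limit is an assumption.

```python
# P1: 5x, P2: 3x, every other priority: 1x (values from the list above).
PRIORITY_MULTIPLIERS = {1: 5, 2: 3}


def effective_max_concurrent(base_limit: int, priority: int) -> int:
    """Scale the configured per-key limit by the credential's priority."""
    return base_limit * PRIORITY_MULTIPLIERS.get(priority, 1)
```

For example, with a base limit of 2 concurrent requests per key, a priority-1 credential would be allowed 10 and a priority-4 credential would stay at 2.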
+
 #### Authentication (`gemini_auth_base.py`)

 * **Device Flow**: Uses a standard OAuth 2.0 flow. The `credential_tool` spins up a local web server (`localhost:8085`) to capture the callback from Google's auth page.