Commit ad6e799

Add prompt caching support for AWS Bedrock (#3438)
1 parent 25d360c commit ad6e799

9 files changed: +1198 -45 lines changed


docs/models/bedrock.md

Lines changed: 169 additions & 0 deletions
@@ -74,6 +74,175 @@ model = BedrockConverseModel(model_name='us.amazon.nova-pro-v1:0')
agent = Agent(model=model, model_settings=bedrock_model_settings)
```

## Prompt Caching

Bedrock supports [prompt caching](https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html) on Anthropic models so you can reuse expensive context across requests. Pydantic AI provides four ways to use prompt caching:

1. **Cache User Messages with [`CachePoint`][pydantic_ai.messages.CachePoint]**: Insert a `CachePoint` marker to cache everything before it in the current user message.
2. **Cache System Instructions**: Enable [`BedrockModelSettings.bedrock_cache_instructions`][pydantic_ai.models.bedrock.BedrockModelSettings.bedrock_cache_instructions] to append a cache point after the system prompt.
3. **Cache Tool Definitions**: Enable [`BedrockModelSettings.bedrock_cache_tool_definitions`][pydantic_ai.models.bedrock.BedrockModelSettings.bedrock_cache_tool_definitions] to cache your tool schemas.
4. **Cache All Messages**: Set [`BedrockModelSettings.bedrock_cache_messages`][pydantic_ai.models.bedrock.BedrockModelSettings.bedrock_cache_messages] to `True` to automatically add a cache point at the end of the last user message, so the conversation up to that point can be reused.
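
As a rough illustration of where these cache points end up, the sketch below builds a Converse-style request dictionary by hand and appends `cachePoint` blocks at the three positions described above. The `build_request` helper and its parameters are hypothetical and are not Pydantic AI's implementation; only the `{'cachePoint': {'type': 'default'}}` block shape is taken from the Bedrock Converse API.

```python
from typing import Any

# Block shape used by the Bedrock Converse API to mark a cache checkpoint.
CACHE_POINT: dict[str, Any] = {'cachePoint': {'type': 'default'}}


def build_request(
    system_prompt: str,
    tool_specs: list[dict[str, Any]],
    user_text: str,
    *,
    cache_instructions: bool = False,
    cache_tool_definitions: bool = False,
    cache_messages: bool = False,
) -> dict[str, Any]:
    """Hypothetical helper: assemble a Converse-style payload with cache points."""
    system: list[dict[str, Any]] = [{'text': system_prompt}]
    if cache_instructions:
        system.append(dict(CACHE_POINT))  # after the system prompt

    tools: list[dict[str, Any]] = [{'toolSpec': spec} for spec in tool_specs]
    if cache_tool_definitions and tools:
        tools.append(dict(CACHE_POINT))  # after the last tool definition

    content: list[dict[str, Any]] = [{'text': user_text}]
    if cache_messages:
        content.append(dict(CACHE_POINT))  # at the end of the last user message

    return {
        'system': system,
        'toolConfig': {'tools': tools},
        'messages': [{'role': 'user', 'content': content}],
    }


# Illustrative tool spec; the name and schema are placeholders.
tool_spec = {
    'name': 'search_docs',
    'description': 'Search documentation.',
    'inputSchema': {'json': {'type': 'object'}},
}
request = build_request(
    'You are a helpful assistant.',
    [tool_spec],
    'What is the capital of France?',
    cache_instructions=True,
    cache_tool_definitions=True,
    cache_messages=True,
)
print(request['system'][-1])
```

With all three flags enabled, the request carries three cache points, which leaves one spare under Bedrock's per-request limit for a manual `CachePoint` marker.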

!!! note "No TTL Support"
    Unlike the direct Anthropic API, Bedrock manages cache TTL automatically. All cache settings are boolean only; there are no `'5m'` or `'1h'` options.

!!! note "Minimum Token Threshold"
    AWS only serves cached content once a segment crosses the provider-specific minimum token thresholds (see the [Bedrock prompt caching docs](https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html)). Short prompts or tool definitions below those limits will bypass the cache, so don't expect savings for tiny payloads.
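
Because the threshold depends on the model, a quick pre-flight estimate can flag payloads that are unlikely to be cached. The sketch below is only a heuristic: `estimate_tokens` assumes roughly four characters per token, which is a crude rule of thumb and not Bedrock's tokenizer, and `min_tokens` is a placeholder you would replace with the value AWS documents for your model. Both function names are hypothetical.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Crude token estimate (assumption: about 4 characters per token)."""
    return int(len(text) / chars_per_token)


def likely_cacheable(text: str, min_tokens: int) -> bool:
    """Return True if the estimated size meets the caller-supplied minimum."""
    return estimate_tokens(text) >= min_tokens


short_prompt = 'You are a helpful assistant.'
long_prompt = 'Reference material. ' * 500  # 10,000 characters

# `min_tokens=1000` is a placeholder value for illustration only.
print(likely_cacheable(short_prompt, min_tokens=1000))
print(likely_cacheable(long_prompt, min_tokens=1000))
```

The cache counters reported in the usage statistics remain the authoritative signal; an estimate like this only helps explain why they might stay at zero.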

### Example 1: Automatic Message Caching

Use `bedrock_cache_messages` to automatically cache the last user message:

```python {test="skip"}
from pydantic_ai import Agent
from pydantic_ai.models.bedrock import BedrockModelSettings

agent = Agent(
    'bedrock:us.anthropic.claude-sonnet-4-5-20250929-v1:0',
    system_prompt='You are a helpful assistant.',
    model_settings=BedrockModelSettings(
        bedrock_cache_messages=True,  # Automatically caches the last message
    ),
)

# The last message is automatically cached - no need for manual CachePoint
result1 = agent.run_sync('What is the capital of France?')

# Subsequent calls with similar conversation benefit from cache
result2 = agent.run_sync('What is the capital of Germany?')
print(f'Cache write: {result1.usage().cache_write_tokens}')
print(f'Cache read: {result2.usage().cache_read_tokens}')
```

### Example 2: Comprehensive Caching Strategy

Combine multiple cache settings for maximum savings:

```python {test="skip"}
from pydantic_ai import Agent, RunContext
from pydantic_ai.models.bedrock import BedrockConverseModel, BedrockModelSettings

model = BedrockConverseModel('us.anthropic.claude-sonnet-4-5-20250929-v1:0')
agent = Agent(
    model,
    system_prompt='Detailed instructions...',
    model_settings=BedrockModelSettings(
        bedrock_cache_instructions=True,  # Cache system instructions
        bedrock_cache_tool_definitions=True,  # Cache tool definitions
        bedrock_cache_messages=True,  # Also cache the last message
    ),
)


@agent.tool
def search_docs(ctx: RunContext, query: str) -> str:
    """Search documentation."""
    return f'Results for {query}'


result = agent.run_sync('Search for Python best practices')
print(result.output)
```

### Example 3: Fine-Grained Control with CachePoint

Use manual `CachePoint` markers to control cache locations precisely:

```python {test="skip"}
from pydantic_ai import Agent, CachePoint

agent = Agent(
    'bedrock:us.anthropic.claude-sonnet-4-5-20250929-v1:0',
    system_prompt='Instructions...',
)

# Manually control cache points for specific content blocks
result = agent.run_sync([
    'Long context from documentation...',
    CachePoint(),  # Cache everything up to this point
    'First question'
])
print(result.output)
```

### Accessing Cache Usage Statistics

Access cache usage statistics via [`RequestUsage`][pydantic_ai.usage.RequestUsage]:

```python {test="skip"}
from pydantic_ai import Agent, CachePoint

agent = Agent('bedrock:us.anthropic.claude-sonnet-4-5-20250929-v1:0')


async def main():
    result = await agent.run(
        [
            'Reference material...',
            CachePoint(),
            'What changed since last time?',
        ]
    )
    usage = result.usage()
    print(f'Cache writes: {usage.cache_write_tokens}')
    print(f'Cache reads: {usage.cache_read_tokens}')
```
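
Once you have these counters, a simple ratio can show how much of the cache-related input was read back rather than written. This is a sketch under assumptions: `UsageSnapshot` is a hypothetical stand-in that only mirrors the `cache_write_tokens` and `cache_read_tokens` field names used above, so the example runs without calling Bedrock, and the token counts are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class UsageSnapshot:
    """Hypothetical stand-in for the usage object returned by `result.usage()`."""

    cache_write_tokens: int = 0
    cache_read_tokens: int = 0


def cache_read_ratio(usage: UsageSnapshot) -> float:
    """Fraction of cache-related tokens that were reads rather than writes."""
    total = usage.cache_write_tokens + usage.cache_read_tokens
    return usage.cache_read_tokens / total if total else 0.0


# Illustrative numbers: the first call populates the cache, the second reuses it.
first_call = UsageSnapshot(cache_write_tokens=2000, cache_read_tokens=0)
second_call = UsageSnapshot(cache_write_tokens=0, cache_read_tokens=2000)

print(cache_read_ratio(first_call))
print(cache_read_ratio(second_call))
```

A ratio that stays near zero across repeated calls suggests the cached prefix is changing between requests or is below the minimum token threshold.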

### Cache Point Limits

Bedrock enforces a maximum of 4 cache points per request. Pydantic AI enforces this limit automatically, so your requests stay within it and are not rejected for having too many cache points.

#### How Cache Points Are Allocated

Cache points can be placed in three locations:

1. **System Prompt**: Via the `bedrock_cache_instructions` setting (adds a cache point to the last system prompt block)
2. **Tool Definitions**: Via the `bedrock_cache_tool_definitions` setting (adds a cache point to the last tool definition)
3. **Messages**: Via `CachePoint` markers or the `bedrock_cache_messages` setting (adds cache points to message content)

Each setting uses **at most 1 cache point**, but you can combine them.

#### Automatic Cache Point Limiting

When cache points from all sources (settings + `CachePoint` markers) exceed 4, Pydantic AI automatically removes excess cache points from **older message content** (keeping the most recent ones).

```python {test="skip"}
from pydantic_ai import Agent, CachePoint
from pydantic_ai.models.bedrock import BedrockModelSettings

agent = Agent(
    'bedrock:us.anthropic.claude-sonnet-4-5-20250929-v1:0',
    system_prompt='Instructions...',
    model_settings=BedrockModelSettings(
        bedrock_cache_instructions=True,  # 1 cache point
        bedrock_cache_tool_definitions=True,  # 1 cache point
    ),
)

@agent.tool_plain
def search() -> str:
    return 'data'


# Already using 2 cache points (instructions + tools)
# Can add 2 more CachePoint markers (4 total limit)
result = agent.run_sync([
    'Context 1', CachePoint(),  # Oldest - will be removed
    'Context 2', CachePoint(),  # Will be kept (3rd point)
    'Context 3', CachePoint(),  # Will be kept (4th point)
    'Question'
])
# Final cache points: instructions + tools + Context 2 + Context 3 = 4
print(result.output)
```

**Key Points**:

- System and tool cache points are **always preserved**
- The cache point created by `bedrock_cache_messages` is **always preserved** (as it's the newest message cache point)
- Additional `CachePoint` markers in messages are removed from oldest to newest when the limit is exceeded
- This ensures critical caching (instructions/tools) is maintained while still benefiting from message-level caching
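
The trimming rule can be modelled in a few lines. The following is a simplified sketch, not Pydantic AI's actual implementation: it represents message content as a flat list in which the string `'<cache>'` stands in for a `CachePoint`, and the `limit_cache_points` function and its parameters are hypothetical.

```python
MAX_CACHE_POINTS = 4  # Bedrock's per-request limit, as stated above
MARKER = '<cache>'  # hypothetical stand-in for a CachePoint marker


def limit_cache_points(
    content: list[str],
    *,
    cache_instructions: bool = False,
    cache_tool_definitions: bool = False,
) -> list[str]:
    """Drop the oldest markers so the request stays within the limit."""
    # Cache points reserved by settings are never removed.
    reserved = int(cache_instructions) + int(cache_tool_definitions)
    budget = MAX_CACHE_POINTS - reserved

    kept_reversed: list[str] = []
    # Walk from newest to oldest so the most recent markers are kept first.
    for item in reversed(content):
        if item == MARKER:
            if budget <= 0:
                continue  # over budget: skip this older marker
            budget -= 1
        kept_reversed.append(item)
    return kept_reversed[::-1]


content = ['Context 1', MARKER, 'Context 2', MARKER, 'Context 3', MARKER, 'Question']
trimmed = limit_cache_points(content, cache_instructions=True, cache_tool_definitions=True)
print(trimmed)
# ['Context 1', 'Context 2', '<cache>', 'Context 3', '<cache>', 'Question']
```

Walking the content from newest to oldest keeps the logic to a single pass and mirrors the stated behaviour: the marker after `Context 1` is dropped, while the two most recent markers survive.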

## `provider` argument

You can provide a custom `BedrockProvider` via the `provider` argument. This is useful when you want to specify credentials directly or use a custom boto3 client:

pydantic_ai_slim/pydantic_ai/messages.py

Lines changed: 1 addition & 0 deletions
@@ -646,6 +646,7 @@ class CachePoint:
     Supported by:

     - Anthropic
+    - Amazon Bedrock (Converse API)
     """

     kind: Literal['cache-point'] = 'cache-point'
