Prompt Cache Optimization: Implementation Summary
Context
ClawSeed's multi-turn conversations with Anthropic had a ~0% cache hit rate for two reasons: the system prompt changed every turn (a datetime with second precision sat at position 0), and it was sent as a single SystemBlock. Anthropic's prompt caching relies on exact prefix matching, so any change invalidates the cache from that position onward, and a change at position 0 breaks the entire cache chain. This wasted ~2–4 KB / 1–3k tokens per turn in multi-turn conversations.
Root Cause
Three fatal cache-breaking issues:
- DateTimeSection at position 0: content changes every second, diverging at the very start of the prefix
- Single SystemBlock: any character change forces reprocessing of the entire block
- Full rebuild on stable changes: Core memory / skill / tool changes replace the entire system message content
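
To illustrate why a timestamp at position 0 is so damaging, the sketch below measures how many leading bytes two consecutive prompts share. The helper names and the prompt layout are hypothetical, and real providers match on tokens rather than bytes, so this only approximates the idea.

```rust
/// Length in bytes of the longest common prefix of two strings.
/// Hypothetical helper for illustration; providers compare tokens, not bytes.
fn shared_prefix_len(a: &str, b: &str) -> usize {
    a.bytes()
        .zip(b.bytes())
        .take_while(|(x, y)| x == y)
        .count()
}

/// Builds a prompt with the timestamp first (assumed shape of the old layout).
fn prompt_with_leading_time(time: &str, instructions: &str) -> String {
    format!("Current time: {time}\n{instructions}")
}
```

With second-precision timestamps one second apart, the shared prefix ends inside the timestamp itself, so none of the instructions that follow can be reused.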
Implementation
Phase 0: Minute Precision (superseded by Phase 2)
Initially reduced DateTimeSection::build() from second-precision to minute-precision. This improved cache hits within the same minute, but the system prompt still changed every minute — insufficient for long sessions.
Phase 1: Stable/Dynamic Partitioning (superseded by Phase 2)
Introduced CacheClass (Stable/Dynamic) and PartitionedSystemPrompt { stable, dynamic, full } to split the system prompt into a cacheable prefix and a per-turn dynamic suffix. DateTimeSection was marked as Dynamic and moved to the end, with a preamble bridge (⚠️ THE CURRENT TIME BELOW APPLIES TO ALL ABOVE INSTRUCTIONS.) appended to the stable block.
This achieved Anthropic prefix caching, but required per-turn dynamic rebuilds and added complexity (preamble, split logic, dynamic_system_content field, refresh_dynamic_system_content() method).
Phase 2: Full Stability (current implementation)
Key insight: If the system prompt is 100% stable across turns (zero per-turn changes), automatic prefix caching works without any message-level transformation. Only Anthropic and Bedrock need explicit cache_control: ephemeral markers; all other providers benefit from the stable prefix automatically.
1. Remove DateTimeSection from system prompt (prompt.rs)
DateTimeSection is no longer included in SystemPromptBuilder::with_defaults(). The current time is instead carried by a timestamp prefix on each user message.
The gateway and CLI both prepend a [YYYY-MM-DD HH:MM:SS TZ] prefix to every user message before sending it to the agent. This keeps time context available to the model without injecting it into the system prompt.
Benefits:
- The entire system message is byte-identical across all turns → 100% stable prefix
- Works for all providers with automatic prefix caching (DeepSeek, OpenAI, Groq, etc.)
- No per-turn rebuild cost — eliminates refresh_dynamic_system_content(), build_dynamic_system_content(), and build_dynamic()
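
A minimal sketch of the timestamp prefix, assuming the caller has already formatted the timestamp. The function name and the single space after the closing bracket are assumptions, not taken from the gateway or CLI code.

```rust
/// Prepends a `[YYYY-MM-DD HH:MM:SS TZ]` prefix to a user message.
/// `timestamp` is assumed to be pre-formatted by the caller (hypothetical helper).
fn prefix_user_message(timestamp: &str, message: &str) -> String {
    format!("[{timestamp}] {message}")
}
```

Because only the user message varies, the system message that precedes it stays byte-identical from turn to turn.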
2. Simplify PartitionedSystemPrompt (prompt.rs)
With no Dynamic sections, PartitionedSystemPrompt simplifies:

    pub struct PartitionedSystemPrompt {
        pub stable: String,  // Full system prompt content (all sections)
        pub dynamic: String, // Always empty: no Dynamic sections currently exist
        pub full: String,    // Equals stable when dynamic is empty
    }

- build_partitioned() no longer appends the preamble (the DYNAMIC_PREAMBLE constant is deleted)
- The else branch (stable + dynamic concatenation) is retained for future dynamic sections but is currently never executed
- build_dynamic() method removed: no dynamic sections to build separately
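
The simplified partitioning can be sketched as follows. This is a hypothetical stand-in for build_partitioned(): the signature, the pre-rendered section strings, and the blank-line separator are all assumptions made for illustration.

```rust
pub struct PartitionedSystemPrompt {
    pub stable: String,
    pub dynamic: String,
    pub full: String,
}

/// Hypothetical stand-in for `build_partitioned()`: joins pre-rendered sections.
fn build_partitioned(
    stable_sections: &[&str],
    dynamic_sections: &[&str],
) -> PartitionedSystemPrompt {
    let stable = stable_sections.join("\n\n");
    let dynamic = dynamic_sections.join("\n\n");
    let full = if dynamic.is_empty() {
        // Current behavior: the full prompt is exactly the stable content.
        stable.clone()
    } else {
        // Retained for future dynamic sections; not reached today.
        format!("{stable}\n\n{dynamic}")
    };
    PartitionedSystemPrompt { stable, dynamic, full }
}
```

Keeping the else branch costs nothing at runtime and leaves room for a dynamic section to be reintroduced without changing the struct.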
3. Remove dynamic content from Agent (agent.rs)
Removed fields and methods:
- dynamic_system_content field — no longer needed
- refresh_dynamic_system_content() — no dynamic content to refresh per-turn
- build_dynamic_system_content() — no dynamic sections to build
Agent now only has stable_system_content — the full system prompt content, which is rebuilt only when stable content changes (Core memory updates, skill activation/deactivation, tool changes).
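
The rebuild-on-change behavior might look roughly like the sketch below. The Agent here is a heavily simplified, hypothetical model: the field names other than stable_system_content, the method names, and the rebuild counter are assumptions used only to show that ordinary turns do not trigger a rebuild.

```rust
/// Hypothetical, simplified agent: only the fields needed for the illustration.
struct Agent {
    base_prompt: String,
    core_memories: Vec<String>,
    stable_system_content: String,
    rebuilds: u32,
}

impl Agent {
    fn new(base_prompt: &str) -> Self {
        let mut agent = Agent {
            base_prompt: base_prompt.to_string(),
            core_memories: Vec::new(),
            stable_system_content: String::new(),
            rebuilds: 0,
        };
        agent.rebuild_stable_system_content();
        agent
    }

    /// Re-renders the full system prompt from its stable inputs.
    fn rebuild_stable_system_content(&mut self) {
        let mut content = self.base_prompt.clone();
        for memory in &self.core_memories {
            content.push_str("\n- ");
            content.push_str(memory);
        }
        self.stable_system_content = content;
        self.rebuilds += 1;
    }

    /// A stable change: the only kind of event that triggers a rebuild.
    fn store_core_memory(&mut self, memory: &str) {
        self.core_memories.push(memory.to_string());
        self.rebuild_stable_system_content();
    }

    /// An ordinary turn just reads the already-built string.
    fn system_prompt_for_turn(&self) -> &str {
        &self.stable_system_content
    }
}
```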
4. CacheStrategy enum replaces prompt_caching: bool (clawseed-api/src/provider.rs)

    #[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
    pub enum CacheStrategy {
        /// No explicit caching. Automatic prefix caching works because
        /// the entire system prompt is stable.
        #[default]
        None,
        /// Anthropic-style explicit `cache_control: ephemeral` markers or
        /// Bedrock-style `CachePoint` blocks within system messages.
        ExplicitAnthropic,
    }

    pub struct ProviderCapabilities {
        pub native_tool_calling: bool,
        pub vision: bool,
        pub cache_strategy: CacheStrategy, // Was: prompt_caching: bool
    }

Provider assignments:
| Provider | CacheStrategy | Reason |
|---|---|---|
| Anthropic | ExplicitAnthropic | Requires cache_control: ephemeral markers on system message blocks |
| Bedrock | ExplicitAnthropic | Requires CachePoint blocks within system messages |
| OpenAI-compatible (DeepSeek, Groq, Ollama, etc.) | None | Automatic server-side prefix caching works with stable prompts |
| Gemini | None | No explicit cache markers needed |
The CacheStrategy::None default means new providers automatically get correct behavior — they benefit from the stable system prompt without needing explicit cache markers.
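
The effect of the default can be sketched as below. Deriving Default on ProviderCapabilities and the three helper functions are assumptions made for this illustration; only the enum and the field names come from the definitions above.

```rust
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub enum CacheStrategy {
    #[default]
    None,
    ExplicitAnthropic,
}

// Assumption: `Default` is derived here so struct-update syntax can be shown.
#[derive(Debug, Clone, Copy, Default)]
pub struct ProviderCapabilities {
    pub native_tool_calling: bool,
    pub vision: bool,
    pub cache_strategy: CacheStrategy,
}

/// Hypothetical capabilities for a newly added provider that never mentions caching.
fn new_provider_capabilities() -> ProviderCapabilities {
    ProviderCapabilities {
        native_tool_calling: true,
        ..Default::default()
    }
}

/// Hypothetical capabilities for an Anthropic-style provider.
fn anthropic_style_capabilities() -> ProviderCapabilities {
    ProviderCapabilities {
        native_tool_calling: true,
        vision: true,
        cache_strategy: CacheStrategy::ExplicitAnthropic,
    }
}

/// True when the request builder must inject explicit cache markers.
fn needs_explicit_markers(caps: &ProviderCapabilities) -> bool {
    caps.cache_strategy == CacheStrategy::ExplicitAnthropic
}
```

A provider that says nothing about caching ends up with CacheStrategy::None and relies on the stable prefix alone.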
5. DeepSeek Anthropic-compatible endpoint (factory.rs)
New DeepSeekAnthropicFactory wraps AnthropicProvider with DeepSeek's Anthropic-compatible base URL (https://api.deepseek.com/anthropic). This endpoint supports cache_control: ephemeral markers, giving DeepSeek users explicit prompt caching the same way Anthropic users get it.
- Provider name: deepseek-anthropic (aliases: deepseek-claude)
- Uses AnthropicProvider::with_base_url(): same conversion logic, same stable_prefix handling, same cache_control injection
- Registered alongside other factories in default_provider_factory_registry()

Why: DeepSeek's OpenAI-compatible endpoint (/v1/chat/completions) only supports automatic prefix caching. The /anthropic endpoint supports explicit cache_control, giving Anthropic-style clients finer control over what is cached and more predictable cache hits.
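
The wrapping idea can be sketched as follows. AnthropicProvider here is a hypothetical minimal stand-in, and the with_base_url signature, the constant, and the create_deepseek_anthropic function are assumptions; only the provider name, alias, and base URL come from the description above.

```rust
/// Hypothetical, minimal stand-in for the real provider type.
#[derive(Debug, PartialEq)]
struct AnthropicProvider {
    base_url: String,
}

impl AnthropicProvider {
    /// Assumed signature; the real constructor may take more arguments.
    fn with_base_url(base_url: &str) -> Self {
        AnthropicProvider {
            base_url: base_url.to_string(),
        }
    }
}

const DEEPSEEK_ANTHROPIC_BASE_URL: &str = "https://api.deepseek.com/anthropic";

/// Resolves the provider name or its alias to a provider pointed at DeepSeek.
fn create_deepseek_anthropic(name: &str) -> Option<AnthropicProvider> {
    match name {
        "deepseek-anthropic" | "deepseek-claude" => Some(
            AnthropicProvider::with_base_url(DEEPSEEK_ANTHROPIC_BASE_URL),
        ),
        _ => None,
    }
}
```

Reusing the same provider type means the cache-marker logic is exercised by one code path regardless of which base URL is configured.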
6. Cached input tokens parsing (compatible/parsing.rs, provider_impl.rs)
TokenUsage.cached_input_tokens is now populated from provider-specific response fields:
- DeepSeek (/v1/chat/completions): prompt_cache_hit_tokens field
- OpenAI: prompt_tokens_details.cached_tokens sub-field
- Extraction via UsageInfo::extract_cached_tokens() helper method (shared between chat() and stream_chat() paths)

    impl UsageInfo {
        pub(super) fn extract_cached_tokens(&self) -> Option<u64> {
            self.prompt_cache_hit_tokens
                .or_else(|| self.prompt_tokens_details.as_ref()?.cached_tokens)
        }
    }

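
For a self-contained view of the fallback order, the sketch below pairs the helper with simplified struct definitions. The struct layouts are assumptions inferred from the field accesses in the helper; the real types presumably carry more fields and serde attributes.

```rust
/// Simplified mirror of the nested OpenAI usage details (assumed layout).
struct PromptTokensDetails {
    cached_tokens: Option<u64>,
}

/// Simplified mirror of the usage struct (assumed layout).
struct UsageInfo {
    /// DeepSeek reports cache hits in this top-level field.
    prompt_cache_hit_tokens: Option<u64>,
    /// OpenAI reports cache hits inside this nested object.
    prompt_tokens_details: Option<PromptTokensDetails>,
}

impl UsageInfo {
    /// Prefers the DeepSeek field and falls back to the OpenAI sub-field.
    fn extract_cached_tokens(&self) -> Option<u64> {
        self.prompt_cache_hit_tokens
            .or_else(|| self.prompt_tokens_details.as_ref()?.cached_tokens)
    }
}
```

When neither field is present the helper yields None, which lets callers distinguish "not reported" from a reported zero.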
Anthropic / Bedrock Integration (unchanged from Phase 1)
Anthropic and Bedrock providers still use stable_prefix to split system messages into cacheable blocks:
- Anthropic: SystemPrompt::Blocks([stable_block(cache_control: ephemeral), dynamic_block(no_cache)])
- Bedrock: SystemBlock::Text(stable) + CachePoint + SystemBlock::Text(dynamic)
Since dynamic is always empty now, the "dynamic block" is effectively empty or absent. The stable block contains the entire system prompt with a single cache_control marker, which Anthropic caches as a whole.
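
One way to picture the Anthropic-side conversion is the sketch below. The SystemBlock struct and the system_blocks function are hypothetical simplifications, and the sketch assumes an empty dynamic part is omitted rather than sent as an empty block.

```rust
/// Hypothetical, simplified system block: `cache_control` is true when the
/// block should carry a `cache_control: ephemeral` marker.
#[derive(Debug, PartialEq)]
struct SystemBlock {
    text: String,
    cache_control: bool,
}

/// Builds the system blocks for one request from the partitioned prompt.
fn system_blocks(stable: &str, dynamic: &str) -> Vec<SystemBlock> {
    let mut blocks = vec![SystemBlock {
        text: stable.to_string(),
        cache_control: true,
    }];
    // Assumption: an empty dynamic part is dropped instead of emitted.
    if !dynamic.is_empty() {
        blocks.push(SystemBlock {
            text: dynamic.to_string(),
            cache_control: false,
        });
    }
    blocks
}
```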
Cache Breakpoint Budget
Anthropic caps at 4 breakpoints per request. Phase 2 does not increase the count:
| Position | Before | Phase 2 |
|---|---|---|
| OAuth prefix block | 0 or 1 | 0 or 1 |
| System prompt | 1 (single block) | 1 (entire prompt with cache_control: ephemeral) |
| Last conversation message | 0 or 1 | 0 or 1 |
| Tool results | 0 or 1 | 0 or 1 |
| Max total | 4 | 4 |
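
The budget in the table can be checked with simple arithmetic. The BreakpointPlan struct and breakpoint_count function are hypothetical names introduced for this sketch; the four positions and the cap of 4 come from the table.

```rust
/// Anthropic's per-request breakpoint cap, as stated above.
const MAX_BREAKPOINTS: usize = 4;

/// Hypothetical description of which optional breakpoints a request uses.
struct BreakpointPlan {
    oauth_prefix: bool,
    last_message: bool,
    tool_results: bool,
}

/// Counts breakpoints for a request under the plan.
fn breakpoint_count(plan: &BreakpointPlan) -> usize {
    // The system prompt always contributes exactly one breakpoint.
    1 + usize::from(plan.oauth_prefix)
        + usize::from(plan.last_message)
        + usize::from(plan.tool_results)
}
```

Even with every optional position in use, the count reaches the cap but does not exceed it.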
Known Limitations
- Stable block rebuilds break cache that turn. Triggered by: memory_store adding Core memory, skill activation/deactivation, remote tool registration. The next turn re-caches. In steady sessions this is rare.
- Minimum cacheable prefix is 1024 tokens (Sonnet/Opus). Compact configurations (minimal personality, no skills, no Core memories) may fall below the threshold and won't be cached.
- Provider coverage: Anthropic + Bedrock use CacheStrategy::ExplicitAnthropic (explicit markers). DeepSeek-anthropic endpoint also supports explicit markers. All other providers use CacheStrategy::None (automatic prefix caching via stable prompts). Server-side implicit caching on OpenAI/DeepSeek/Groq benefits from the fully stable system prompt.
- Time context: No longer in the system prompt. The [YYYY-MM-DD HH:MM:SS TZ] prefix on each user message provides time context. This means:
  - The model knows the current time on each turn from the user message
  - The time is not cached (changes each turn) but only adds ~30 bytes to the user message, not to the system prompt
  - Tasks requiring exact timestamps can use tool calls (e.g., shell_exec date)
- Prompt-guided tool injection: The default Provider chat() method appends tool instructions to system content when native_tool_calling: false. If the system message has stable_prefix: Some(...), appending to content breaks the partition invariant. Currently no provider has native_tool_calling: false AND CacheStrategy::ExplicitAnthropic, so this does not arise.
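
The condition in the last limitation could be guarded with a small check like the one sketched below. The function name is hypothetical and the types are trimmed to the two fields involved; this is not code from the repository.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CacheStrategy {
    None,
    ExplicitAnthropic,
}

/// Trimmed to the two capability fields relevant to this check.
struct ProviderCapabilities {
    native_tool_calling: bool,
    cache_strategy: CacheStrategy,
}

/// True when prompt-guided tool injection would append to a system message
/// that is also partitioned for explicit caching.
fn tool_injection_breaks_partition(caps: &ProviderCapabilities) -> bool {
    !caps.native_tool_calling && caps.cache_strategy == CacheStrategy::ExplicitAnthropic
}
```

A check of this shape could run in a unit test over all registered providers so the combination is caught if it is ever introduced.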
Expected Behavior
| Turn | System Prompt Shape | Cache Result |
|---|---|---|
| Turn 1 | [entire_prompt(cache_control: ephemeral)] (Anthropic/Bedrock) or [entire_prompt] (others) | Full system processed; cached if ≥1024 tokens |
| Turn 2 | Same system prompt (byte-identical), user message with updated timestamp prefix | Stable prefix matches → cache hit on all providers |
| Turn N (no stable change) | Same | Cache hit every turn (Anthropic within 5-min TTL, others via server-side prefix cache) |
| Stable change (skill/memory/tool) | New system prompt content | Cache miss for that turn; new cache established for subsequent turns |
Estimated savings: Stable input tokens billed at ~10% of normal on Anthropic cache hit. For a typical 3k-token stable prefix, steady-state cost reduction on system tokens is ~90% (entire prompt is cached, not just a portion). Other providers benefit from server-side implicit prefix caching at no extra cost.
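
The savings estimate works out as in the sketch below, which uses the ~10% cache-read figure quoted above. It covers only steady-state hits on the system prompt and ignores any cache-write pricing on the turn that establishes the cache; the function names are hypothetical.

```rust
/// Fraction of the normal input price billed for cached tokens (~10% per the estimate).
const CACHE_READ_PRICE_FACTOR: f64 = 0.10;

/// System-prompt cost for one turn, expressed in normal-price token equivalents.
fn billed_token_equivalents(stable_tokens: u32, cache_hit: bool) -> f64 {
    let tokens = f64::from(stable_tokens);
    if cache_hit {
        tokens * CACHE_READ_PRICE_FACTOR
    } else {
        tokens
    }
}

/// Relative cost reduction on system tokens for a turn that hits the cache.
fn steady_state_reduction(stable_tokens: u32) -> f64 {
    let hit = billed_token_equivalents(stable_tokens, true);
    let miss = billed_token_equivalents(stable_tokens, false);
    1.0 - hit / miss
}
```

For a 3k-token stable prefix this gives about 300 token equivalents per hit turn instead of 3000, which is the ~90% reduction cited.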
Verification
- cargo test -p clawseed-agent: system prompt has no datetime section, all sections are Stable, partitioned build with empty dynamic
- cargo test -p clawseed-api: ChatMessage serde roundtrip, system_partitioned
- cargo test -p clawseed-providers: Anthropic/Bedrock partitioned conversion, DeepSeekAnthropicFactory
- cargo build: full workspace compiles
- ./tools/ci_local.sh: fmt/clippy/test pass
- Manual: clawseed chat against Anthropic, 2 turns → Turn 2 cache_read_input_tokens > 0
- Manual: clawseed chat against DeepSeek-anthropic → cache_read_input_tokens > 0