OpenClaw 6.1 Cache Hit Rate Regression Drives LLM Token Costs Up

Summary

With the release of OpenClaw 6.1 / 2026.6.1, a severe regression in the prompt caching system has come to light. Developers report that prompt cache hit rates plummeted from approximately 80% to under 20% post-upgrade. This regression forces identical system prompts and tool definitions to be recomputed constantly. As a result, token consumption and API costs have increased 4-5x, particularly when using the Anthropic Vertex AI provider.

What happened?

Following their upgrade to OpenClaw 6.1, developers observed unexpected surges in their LLM API bills. Diagnostic traces using cache-trace and the /status command revealed a massive drop in cache efficiency. The root cause is a caching protocol mismatch: the Anthropic Vertex provider incorrectly injects cache_control: ephemeral into dynamic suffixes and active memory blocks, which invalidates prefix-based cache matching and forces the model provider to recompute the entire context for each step.

Why it matters

Prompt caching is vital for keeping agentic workflows financially viable. AI agents repeatedly transmit system instructions, execution history, and tool definitions across multiple steps. Without functioning cache mechanisms, operational costs multiply rapidly. A drop in the hit rate to under 20% makes complex agent loops economically unsustainable for many businesses and adds significant latency (TTFT - Time-to-First-Token).

Evidence

Bug reports have surged across developer channels. GitHub issue #90583 documents the cache hit rate drop from 80% to under 20% in detail. Furthermore, GitHub issue #91982 details how the @openclaw/anthropic-vertex-provider sends incorrect cache_control headers to the StreamRawPredict endpoint, triggering a 400 Bad Request error once the API limit of 4 caching blocks is exceeded (“A maximum of 4 blocks with cache_control may be provided. Found 5.”).

Analysis

The bug originates in applyAnthropicCacheControlToSystem within src/agents/anthropic-payload-policy.ts. This function incorrectly applies cache_control: ephemeral to the dynamic suffix of the system prompt. When the active-memory feature is enabled and recalled memories are prepended to the user message via prependContext, the total number of cache-controlled blocks exceeds Anthropic’s hard limit of 4. Additionally, placing caching markers inside dynamic context portions invalidates the cache key on minor variations, neutralizing prefix caching entirely.

Practical Takeaways

Disable Active Memory: As a temporary workaround, disable the active-memory feature in openclaw.json. This prevents the 400 Bad Request error and restores prefix caching for static components.
Perform a Rollback: Developers heavily dependent on cost-efficient caching should temporarily downgrade to version 2026.4.20 until a payload-policy hotfix is released.
Monitor Cache Health: Actively monitor cache performance using the /status command and inspect cache mismatches with cache-trace.

Open Questions

It remains to be seen whether other providers (such as OpenAI or the native Anthropic API wrapper) suffer from similar payload formatting errors in version 6.1, or if the regression is isolated to the Anthropic Vertex provider and the Active Memory pipeline. A formal patch from the OpenClaw maintainers is still pending.