# Why Did Anthropic’s Cache TTL Change Blow Through My Claude Quota?
Yes—the reported prompt cache TTL change from 1 hour to 5 minutes can absolutely burn through your Claude quota faster, even if your app didn’t change. By forcing many more expensive cache writes (instead of mostly cheap cache reads) during normal “pause-and-resume” interactive use, the shorter default TTL can inflate billed usage and accelerate quota depletion—exactly what multiple developers said they experienced after early March 2026.
## What server-side prompt caching (cache_read) actually is
Anthropic’s prompt caching (for Claude) is a server-side cache for reusable context blocks—large, stable prompt prefixes such as project documentation, code context, or conversation history. The point is to avoid resending and reprocessing the same large context on every request.
From a billing and quota perspective, the crucial detail is that prompt caching isn’t a single action. It breaks into two economically different operations:
- Cache writes: You store the block on Anthropic’s servers for a set TTL (time-to-live). These writes are the “expensive” part of using the cache.
- Cache reads: Later requests can reuse that stored block. Reads are much cheaper, and the sources note that cached blocks still count as input tokens in usage metrics—just at far lower billed impact than rewriting the cache.
In other words: caching saves money when your workflow does a small number of writes and lots of reads. It becomes costly when you’re forced into repeated writes.
(For broader context on how infrastructure constraints can change developer economics, see Edge AI Booms as Frontier Access Tightens.)
## How TTL controls billing—and why a shorter default matters
TTL is the expiration window for a cached block. If the cached block expires, the next request can’t “read” it; it must write it again.
Community reporting and analyses describe two write-price tiers, plus the much cheaper read price:
- 5-minute TTL writes (reported example pricing: $3.75 per MToken)
- 1-hour TTL writes (reported example pricing: $6.00 per MToken)
- Cache reads (reported: $0.30 per MToken)
The exact prices matter less than the shape of the problem: reads are far cheaper than writes, and TTL determines how often you’re forced to write.
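Plugging the reported figures into a toy cost model makes the asymmetry concrete. This is a sketch using the community-reported example prices above, not official pricing:

```python
# Community-reported example prices in $ per million tokens (MTok);
# illustrative only -- actual pricing varies by model and tier.
WRITE_5MIN_TTL = 3.75
WRITE_1HR_TTL = 6.00
CACHE_READ = 0.30

def context_cost(mtok: float, writes: int, reads: int, write_price: float) -> float:
    """Billed cost of reusing the same `mtok`-sized context block."""
    return mtok * (writes * write_price + reads * CACHE_READ)

# A 50k-token context block touched 21 times in a session:
# one write plus twenty reads (1-hour TTL holds across pauses)...
print(context_cost(0.05, writes=1, reads=20, write_price=WRITE_1HR_TTL))
# ...versus a rewrite on every resume (5-minute TTL keeps expiring).
print(context_cost(0.05, writes=21, reads=0, write_price=WRITE_5MIN_TTL))
```

Even though the 1-hour write tier costs more per write, one write plus many reads comes out far cheaper than repeated 5-minute-tier writes of the same block.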
With a 1-hour default TTL, a typical interactive workflow—say, using Claude Code, an editor integration, or a chat-based REPL—might write the big context once, then do many low-cost reads as you iterate. With a 5-minute TTL, any natural pause (a meeting, a coffee, a context switch, or even stepping away to test code) longer than five minutes can cause the cache to expire. The next interaction then triggers another expensive write.
That’s how a default TTL shortening can blow through quota: it increases the frequency of expensive events per session, even when the user experience looks identical.
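A quick way to see the effect is to count forced writes for the same request pattern under the two TTLs. This sketch assumes each request inside the TTL window keeps the cache warm, and the timestamps are made up:

```python
def count_writes(timestamps_s, ttl_s):
    """Count cache writes: a write is forced whenever the gap since the
    previous request exceeds the TTL (or on the very first request)."""
    writes, last = 0, None
    for t in timestamps_s:
        if last is None or t - last > ttl_s:
            writes += 1
        last = t
    return writes

# A 2-hour session with a request every ~10 minutes (natural pauses).
session = [i * 600 for i in range(13)]          # seconds: 0, 600, ..., 7200
print(count_writes(session, ttl_s=3600))        # 1-hour TTL  -> 1 write
print(count_writes(session, ttl_s=300))         # 5-minute TTL -> 13 writes
```

Same session, same user experience, thirteen expensive writes instead of one.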
## The timeline and evidence: what happened in March–April 2026
Multiple independent sources—including GitHub issue #46829, a ByteIOTA analysis, and a Hacker News thread—report that Anthropic changed Claude’s default prompt cache TTL from 1 hour to 5 minutes around March 6–8, 2026, and did so without a public announcement, changelog entry, or email to developers.
Developers didn’t notice because of a new feature—they noticed because of billing and quota anomalies:
- Reports of quotas emptying dramatically faster (one example given: a “5-hour” quota depleting in about 19 minutes).
- Sudden, unexplained cost increases in interactive workflows.
Public discussion and deeper analysis surfaced April 11–13, 2026, after people examined logs and datasets. One frequently cited evidence type is Claude Code session JSONL logs across Jan–Apr 2026, which reportedly show a clear behavioral shift consistent with a TTL boundary moving from one hour to five minutes.
On quantified impact, the shared analyses report:
- A jump in cache overhead from about 1.1% (February, pre-change) to about 25.9% (March, post-change) in one dataset.
- Estimated cost inflation in the range of roughly 17–32%.
- One developer’s documented $2,530 in excess spending across 119,866 API calls over roughly four months, attributed to the regression.
Some community commentary speculates the TTL change may have preceded—or been related to—Anthropic’s March 26 announcement about peak-hour throttling. But the sources are clear on one point: there is no official confirmation in the materials provided that explains why the TTL default changed.
## How to detect the TTL change in your own usage
If you suspect you were hit, the goal is to confirm whether you experienced a step-change from mostly reads to frequent rewrites.
Practical checks based on the community analyses:
- Track cache writes vs. cache reads over time. A sudden spike in writes—or a big change in the write/read ratio—suggests more expirations and rewrites.
- Compare per-call token accounting and billed cost before vs. after early March 2026. If your client behavior stayed stable but billed input/caching costs stepped up, that’s consistent with shorter TTL forcing rewrites.
- Inspect client logs for cache-related fields. Community reports note that Claude Code JSONL logs include cache events; correlate timestamps to see whether expiration boundaries start clustering around ~5 minutes instead of ~1 hour.
This kind of log-based proof is also what you’ll need if you plan to ask support for reconciliation.
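A minimal sketch of the write/read check: it assumes each JSONL line carries a `usage` object with the API's `cache_creation_input_tokens` and `cache_read_input_tokens` fields; adapt the field paths to whatever your client actually records.

```python
import json

def write_read_ratio(path):
    """Sum cache-write vs cache-read tokens across a JSONL log and
    return writes/reads. A rising ratio over time suggests more
    expirations forcing rewrites."""
    writes = reads = 0
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            usage = json.loads(line).get("usage", {})
            writes += usage.get("cache_creation_input_tokens", 0)
            reads += usage.get("cache_read_input_tokens", 0)
    return writes / reads if reads else float("inf")
```

Running this per week (or per day) around early March 2026 should make any step-change in the ratio obvious.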
## Practical mitigations to limit cost and quota shocks
You can’t rely on server defaults staying constant, so mitigations tend to focus on reducing the number of distinct writes and improving observability:
- Amortize rewrites by consolidating hot context. If you can move frequently reused context into fewer, larger cached blocks, each forced rewrite buys you more reuse (fewer cache keys, fewer write events).
- Use client-side strategies for very short interactive loops. If your pattern is “send, wait a bit, send again,” minimizing server-side write churn can help; the sources frame this as avoiding a write on every resume when pauses are common.
- Add monitoring and alerts specifically for caching. Track cache writes per minute, token input spikes, and projected spend; add budget alarms and tighter rate limits for interactive tooling so one backend change can’t run away.
- Ask for explicit TTL configuration if available—and contact support. The community threads show developers requesting clearer per-call cache control and logging incidents for billing reconciliation.
For more operational tactics teams are applying to fast-moving platform shifts, see From RustFS Speedups to Agentic Blender: 7 fresh tech beats to watch.
## Why It Matters Now
This matters now because the reported change is a case study in a broader reliability problem for AI developers: billing-sensitive defaults can be just as disruptive as outages. Here, teams building interactive experiences—code assistants, chat-ops, REPL-like tools—reported sudden quota exhaustion and unplanned cost increases tied to a backend TTL behavior change.
It also lands during a period when the community was already scrutinizing platform stability and throttling (including Anthropic’s March 26 peak-hour throttling announcement, referenced in community speculation). In that context, an unannounced caching-default shift amplified concerns about change management: when defaults materially affect spend, developers expect communication, changelogs, and ideally opt-in/opt-out controls.
## What to Watch
- Whether Anthropic publishes an announcement or changelog clarification about default TTL behavior, a rollback, or an opt-in TTL control.
- How Anthropic responds to community documentation, including GitHub issue #46829, and whether follow-up analyses (including ByteIOTA) update their findings.
- Your own metrics: do cache-write spikes subside, does cache overhead return toward pre-March levels, and do you receive any credits/compensation after filing a detailed support ticket with logs.
Sources:
- https://github.com/anthropics/claude-code/issues/46829
- https://byteiota.com/anthropic-cache-ttl-downgrade-silent-2-5k-cost-spike/
- https://news.ycombinator.com/item?id=47736476
- https://upstract.com/x/18c577b64ff526fc
- https://gu-log.vercel.app/en/posts/en-sp-112-20260313-anthropic-prompt-caching-2026-update
- https://platform.claude.com/docs/en/build-with-claude/prompt-caching
## About the Author
yrzhe
AI Product Thinker & Builder. Curating and analyzing tech news at TechScan AI. Follow @yrzhe_top on X for daily tech insights and commentary.