Enable cross-session Claude prompt caching
TL;DR
System prompts, skills, and memory now cache for one hour across sessions when using Claude. This reduces latency and costs for new sessions and background memory reviews.
What changed
Hermes Agent added cross-session prompt caching for Claude models. System prompts, skills, and memory now stay cached for one hour across separate sessions. The update targets latency and token costs during restarts or background memory checks.
The change applies automatically when users select Claude through the /model command or OpenRouter routes. No extra configuration is required beyond existing Claude API keys.
Why it matters
Persistent memory in Hermes Agent becomes cheaper to maintain over time. Vibe Builders who run background reviews or restart chats frequently will see lower bills and faster responses.
This move strengthens Hermes Agent against hosted alternatives that charge per session. It bets on users keeping long-running agents alive on cheap VPS hardware rather than paying SaaS premiums.
How to use it
Update to the latest Hermes Agent release via the built-in /update command. Switch to any Claude model with /model claude-3-5-sonnet or similar. Start a new session and confirm caching works by checking reduced token counts on repeat prompts.
Access the local web dashboard with hermes web to monitor cache hits during memory reviews. The feature is live now for all self-hosted installs using Anthropic keys.
Watch for
Confirm the bet by tracking actual token savings over a full week of mixed sessions. The feature breaks if cache invalidation happens too often or if Claude rate limits ignore the saved context. Expect a follow-up move toward longer cache windows or support for additional models.
Who this matters for
- Vibe Builders: Deploy Claude-powered agents on cheap VPS hardware to maintain long-term memory at lower costs.
- Developers: Update to the latest Hermes release to automate cross-session caching via the Anthropic API.
Harsh’s take
Prompt caching is the most underrated cost-saving lever in the current LLM stack. By extending cache TTL to one hour across sessions, Hermes Agent effectively kills the penalty for frequent restarts and background memory indexing. This is a direct shot at expensive SaaS wrappers that hide these margins from users.
Operators should move their long-running Claude workflows to self-hosted Hermes instances immediately. The ability to maintain state without paying the full token tax on every interaction makes complex, memory-heavy agents viable for production. If you are not caching your system prompts and tool definitions, you are burning money for no reason.
by Harsh Desai
About Hermes Agent
View the full Hermes Agent page →All Hermes Agent updatesGo deeper
More from Hermes Agent
- FeatureHermes Agent verifies work with completion contracts and evidence ledgers
Hermes Agent records verification evidence for coding tasks. The /goal command uses completion contracts to judge success against test runs rather than model assertions.