Skip to content
Enable cross-session Claude prompt caching | My AI Guide
FeatureHermes Agentv0.14.0

Enable cross-session Claude prompt caching

By Harsh Desai
Share

TL;DR

System prompts, skills, and memory now cache for one hour across sessions when using Claude. This reduces latency and costs for new sessions and background memory reviews.

## What changed Hermes Agent added cross-session prompt caching for Claude models. System prompts, skills, and memory now stay cached for one hour across separate sessions. The update targets latency and token costs during restarts or background memory checks.

The change applies automatically when users select Claude through the /model command or OpenRouter routes. No extra configuration is required beyond existing Claude API keys.

## Why it matters Persistent memory in Hermes Agent becomes cheaper to maintain over time. Vibe Builders who run background reviews or restart chats frequently will see lower bills and faster responses.

This move strengthens Hermes Agent against hosted alternatives that charge per session. It bets on users keeping long-running agents alive on cheap VPS hardware rather than paying SaaS premiums.

## How to use it Update to the latest Hermes Agent release via the built-in /update command. Switch to any Claude model with /model claude-3-5-sonnet or similar. Start a new session and confirm caching works by checking reduced token counts on repeat prompts.

Access the local web dashboard with hermes web to monitor cache hits during memory reviews. The feature is live now for all self-hosted installs using Anthropic keys.

## Watch for Confirm the bet by tracking actual token savings over a full week of mixed sessions. The feature breaks if cache invalidation happens too often or if Claude rate limits ignore the saved context. Expect a follow-up move toward longer cache windows or support for additional models.

Harshs take

For a solo Vibe Builder running a business in 2026, this caching reduces the real cost of keeping an always-on agent alive on a $5 VPS. The one-hour window forces you to design workflows that restart within that limit or accept occasional full reloads.

The honest trade-off is added complexity in session planning versus the simplicity of a hosted agent that handles state for you. You still pay for every token outside the cache window and must maintain the server yourself.

Do this now: run a seven-day test with your heaviest memory review task and log exact token usage before and after the update.

by Harsh Desai

Source:myaiguide.co

About Hermes Agent

View the full Hermes Agent page →All Hermes Agent updates

More from Hermes Agent

  • Feature
    Integrate LSP semantic diagnostics for file edits

    The agent now runs a language server against edited files to catch type errors and undefined symbols immediately. This provides deeper analysis than basic linting for `write_file` and `patch` operations.

  • App Update
    Launch native Windows support in early beta

    Hermes now runs natively on Windows via cmd.exe and PowerShell without requiring WSL. Includes a dedicated PowerShell installer and fixes for path normalization and process management.

  • Integration
    Add native support for LINE and SimpleX Chat

    Hermes expands its messaging reach to 22 platforms with the addition of LINE and the privacy-focused SimpleX Chat. Both are implemented as first-class messaging adapters.

Everything AI. One email.
Every Monday.

New tools. Model launches. Plugins. Repos. Tactics. The moves the sharpest builders are making right now, before everyone else.

No spam. Unsubscribe anytime.