
Hermes Agent v0.4.0 adds OpenAI-compatible API, 6 messaging platforms, and @file context

By Harsh Desai

TL;DR

Hermes Agent v0.4.0 (The Platform Expansion Release) shipped March 24, 2026. It exposes Hermes as a drop-in OpenAI-compatible /v1/chat/completions endpoint, adds 6 new messaging adapters (Signal, DingTalk, SMS via Twilio, Mattermost, Matrix, Webhook) bringing the count to 9, introduces @file and @url context injection, adds 4 new inference providers, and enables gateway prompt caching by default.

What changed

What shipped

Hermes Agent v0.4.0 (The Platform Expansion Release) shipped on March 24, 2026. This is the release where Hermes stops being "an AI agent" and becomes "a platform."

OpenAI-compatible API server

Hermes now exposes itself as a drop-in OpenAI-compatible /v1/chat/completions endpoint. Any tool that speaks OpenAI can point at Hermes instead. A companion /api/jobs REST API covers cron job management. If you have existing code that calls OpenAI, you can swap the base URL and point at your Hermes instance.
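In practice the swap is just re-aiming a standard chat completions request at the Hermes base URL. A minimal stdlib sketch, assuming a local Hermes instance on a placeholder port (the real port is configurable) and a placeholder model name:

```python
import json
import urllib.request

# Placeholder address for a local Hermes instance; the actual port is configurable.
HERMES_BASE_URL = "http://localhost:8000"

def chat_request(messages, model="hermes", base_url=HERMES_BASE_URL):
    """Build an OpenAI-style chat completions request aimed at Hermes.

    Any OpenAI-compatible client works the same way: keep the payload
    shape, swap the base URL.
    """
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = chat_request([{"role": "user", "content": "Hello, Hermes"}])
# Send with urllib.request.urlopen(req) once the server is running.
```

With the official OpenAI SDK the same swap is passing the Hermes address as base_url to the client constructor; no other code changes.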

6 new messaging adapters

v0.4.0 brings the messaging channel count to 9:

  1. Signal
  2. DingTalk
  3. SMS via Twilio
  4. Mattermost
  5. Matrix
  6. Webhook (generic HTTP)

Combined with pre-existing Telegram, Discord, and WhatsApp, Hermes now reaches almost every significant chat ecosystem.

@ context references

Claude Code-style @file and @url context injection with CLI tab completion. Type @ and hit tab to reference a file in your project or a URL; Hermes injects the content into the prompt automatically. The syntax mirrors Claude Code's so the muscle memory transfers.
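The release notes do not show Hermes's resolver, but the injection pattern itself is simple. A rough illustration covering file references only (function name and regex are hypothetical; the real feature also resolves @url and drives tab completion):

```python
import re
from pathlib import Path

def expand_file_refs(prompt: str) -> str:
    """Replace @path tokens in a prompt with the referenced file's contents.

    Illustrative only: Hermes's actual resolver also handles @url references
    and CLI tab completion, neither of which is sketched here.
    """
    def substitute(match: re.Match) -> str:
        path = Path(match.group(1))
        if path.is_file():
            return path.read_text()
        return match.group(0)  # leave unresolvable references untouched
    return re.sub(r"@(\S+)", substitute, prompt)
```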

4 new inference providers

  • GitHub Copilot via OAuth (use your Copilot subscription as a provider).
  • Alibaba Cloud / DashScope for access to Qwen and other Alibaba models.
  • Kilo Code for the Kilo agent family.
  • OpenCode Zen/Go for OpenCode ecosystem models.

Gateway prompt caching

A per-session AIAgent cache preserves the Anthropic prompt cache across turns. For long conversations (debugging sessions, multi-turn research), this is a meaningful cost reduction. Streaming is now enabled by default, and the release includes 200+ bug fixes.
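On the Anthropic side, the mechanism being preserved is the cache_control marker on large, stable prompt blocks. A sketch of such a payload (the model name is a placeholder, and Hermes's internal gateway wiring is not shown):

```python
def cached_payload(system_text: str, user_text: str) -> dict:
    """Anthropic Messages-style payload with a cacheable system block.

    The gateway's job is to keep this cache warm across turns so that
    later requests re-read the system block at the cheaper cached rate.
    """
    return {
        "model": "claude-model-placeholder",  # placeholder, not a real model id
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": system_text,
                # Marks this block as cacheable on Anthropic's API.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_text}],
    }
```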

Availability

Standard upgrade path. OpenAI-compatible API server is opt-in: hermes api serve starts it on a configurable port. Existing direct API usage continues to work unchanged.

Who this matters for

  • Vibe Builder: @file and @url context references work the same way as in Claude Code, so muscle memory transfers. Signal, DingTalk, and Matrix are all reachable.
  • Basic User: Hermes now reaches you on 9 messaging platforms. Pick the one you already use; no new app required.
  • Developer: OpenAI-compatible /v1/chat/completions endpoint makes Hermes a drop-in replacement for any OpenAI call. GitHub Copilot OAuth provider, Alibaba DashScope, Kilo Code, and OpenCode Zen added.

What to watch next

v0.4.0 is the release where Hermes's strategic positioning becomes obvious. Exposing an OpenAI-compatible API endpoint is not just a convenience feature. It means every tool, library, SDK, and agent framework that speaks OpenAI can now use Hermes as its backend. Your Hermes install becomes a local AI platform that every existing OpenAI-compatible tool can adopt.

The @file and @url context pattern borrowed from Claude Code is a smart ecosystem move: anyone who has spent time in Claude Code will find Hermes instantly familiar for context injection.

Gateway prompt caching preserving Anthropic cache across turns is the economics win that matters most for long conversations. Debugging sessions with Claude that touch the same codebase across 50 turns used to pay the full prompt cost every time. Now the cache persists and you pay for the incremental context, not the full base.
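As rough arithmetic, using multipliers that are assumptions based on Anthropic's published cache pricing (about 1.25x base input price for a cache write, about 0.1x for a cache read; substitute real rates):

```python
def conversation_cost(turns: int, base_tokens: int, incr_tokens: int,
                      price_per_mtok: float,
                      write_mult: float = 1.25, read_mult: float = 0.1):
    """Compare input-token cost with and without prompt caching.

    Model: every turn resends a large base context plus a small increment.
    Without caching the base is paid at full price each turn; with caching
    it is written once, then read at the discounted rate on later turns.
    Multipliers are assumptions, not quoted from the release notes.
    """
    per_tok = price_per_mtok / 1_000_000
    no_cache = turns * (base_tokens + incr_tokens) * per_tok
    with_cache = (base_tokens * write_mult * per_tok          # one cache write
                  + (turns - 1) * base_tokens * read_mult * per_tok  # cheap reads
                  + turns * incr_tokens * per_tok)            # increments, uncached
    return no_cache, with_cache

# 50-turn debugging session over a 100k-token codebase context at $3/Mtok input.
nc, wc = conversation_cost(turns=50, base_tokens=100_000, incr_tokens=1_000,
                           price_per_mtok=3.0)
```

Under these assumed rates the cached conversation costs a fraction of the uncached one; the exact ratio depends on the provider's actual cache pricing.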

9 messaging channels is the coverage point where Hermes stops missing any significant ecosystem. Signal, DingTalk, SMS, Mattermost, Matrix, and Webhook, combined with the pre-existing Telegram, Discord, and WhatsApp, mean there is almost no user who cannot reach Hermes on the chat tool they already use. That reach is what makes "your AI lives wherever you are" a real claim rather than marketing.

The GitHub Copilot OAuth provider is the detail worth calling out. Treating a Copilot subscription as an inference provider lets you route agent work through credits you are already paying for. For developers already on Copilot Pro, this is additional model access at no extra cost.


Source: github.com

