The question I keep getting after the 145 k-star announcement is the same one that shows up in Google autocomplete: “OpenClaw architecture explained – how the gateway and agent runtime work.” I finally sat down, pulled the latest v0.27.5 tag, and traced the call stack. This write-up is the result. It is not marketing copy; it is what you hit once you clone the repo and start grepping.

Why bother cracking open the hood?

If you only run the hosted version on ClawCloud, the architecture is mostly invisible. You sign up, name an agent, and a few seconds later the bot is chatting on Slack. But the minute you run npm i -g openclaw@0.27.5 && clawd init on your own hardware, you meet the gateway, the local daemon, a WebSocket avalanche, and a handful of worker threads. Paolo Perazzo’s architectural breakdown on GitHub is still the canonical diagram, but it stops at the 10,000-ft view. Below is the “I set a breakpoint in src/runtime/loop.ts” version.

Layer cake overview: communication, reasoning, memory, execution

OpenClaw is consciously boring: four horizontal layers and a thin vertical slice for observability. No service mesh, no mystical black box. Each layer is a separate TypeScript package in the monorepo.

  • Communication: adapters for WhatsApp, Telegram, Discord, Slack, Signal, iMessage, Web. All normalize inbound messages into a NormalizedMessage shape.
  • Reasoning: the actual agent logic. Think of it as a pluggable thought loop. Default implementation is the classic ReAct chain (prompt → LLM → tool call → observation → next prompt).
  • Memory: vector store + key/value + time-series. Configurable back ends: SQLite (default), Postgres, DuckDB, Pinecone, or --inmemory flag for speed tests.
  • Execution: where tool calls, shell commands, browser automation, and scheduled tasks happen. Relies on Composio’s 800+ integrations plus first-party browser/shell drivers.

The gateway sits on top as the control plane. The agent runtime lives mostly in the reasoning layer but calls down into execution and memory every cycle.

The Gateway: central control plane, session table, routing

HTTP, WebSocket, and gRPC in one file

The gateway's entry point is src/gateway/index.ts. The file is 380 LOC and sets up:

  • Express server on localhost:4237 (override with --port)
  • ws WebSocket endpoint at /ws
  • Optional gRPC server when CLAW_GRPC=1

Why three protocols? Historical reasons. WhatsApp and Signal bridges arrived first via WebSocket. Later, server-side tools wanted direct gRPC streaming, and legacy web UI still polls JSON over HTTP. For most installs, only HTTP and WebSocket are active.

Session management

The gateway keeps an in-memory Map<string, SessionCtx>. Key is sessionId (predictable uuidv7), value contains:

  • agentId
  • channel ("slack", "telegram", ...)
  • rolling message buffer (last 50 turns, soft-configurable)
  • active tool lock (prevents race conditions when multiple triggers fire)

Sessions expire after 12 hours of inactivity (configurable). On expiry the gateway flushes to the memory layer’s long-term store if persistSessions=true.
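A minimal sketch of what such a session table could look like, assuming the field list above (SessionTable, touch, and sweep are illustrative names, not the actual gateway API):

```typescript
// Hypothetical sketch of the gateway's in-memory session table with
// inactivity-based expiry. Field names follow the list above.
interface SessionCtx {
  agentId: string;
  channel: string;
  buffer: string[];   // rolling message buffer (last 50 turns)
  lastSeen: number;   // ms epoch of last activity
  toolLocked: boolean; // active tool lock
}

class SessionTable {
  private sessions = new Map<string, SessionCtx>();

  // Default TTL mirrors the documented 12-hour inactivity window.
  constructor(private ttlMs = 12 * 60 * 60 * 1000) {}

  // Get-or-create a session and refresh its activity timestamp.
  touch(id: string, agentId: string, channel: string): SessionCtx {
    let ctx = this.sessions.get(id);
    if (!ctx) {
      ctx = { agentId, channel, buffer: [], lastSeen: Date.now(), toolLocked: false };
      this.sessions.set(id, ctx);
    }
    ctx.lastSeen = Date.now();
    return ctx;
  }

  // Called periodically; returns expired session ids so the caller can
  // flush them to long-term memory when persistSessions=true.
  sweep(now = Date.now()): string[] {
    const expired: string[] = [];
    this.sessions.forEach((ctx, id) => {
      if (now - ctx.lastSeen > this.ttlMs) {
        this.sessions.delete(id);
        expired.push(id);
      }
    });
    return expired;
  }
}
```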

Channel normalization

Every adapter must converge on:

    {
      id: string,                        // message id per platform
      type: "user" | "system" | "tool",
      text: string,                      // raw UTF-8, emojis preserved
      attachments: Blob[],               // files if any, lazy-loaded
      meta: Record<string, unknown>
    }

The gateway never touches text besides size checks (32 kB limit by default). All further formatting lives in the reasoning layer.
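To make the size check concrete, here is a hedged sketch of an adapter's normalization step; the raw payload shape and function name are hypothetical, only the NormalizedMessage shape and the 32 kB default come from the text above:

```typescript
// Illustrative adapter step: map a raw platform payload into the
// NormalizedMessage shape and enforce the default 32 kB text limit.
// The `ts`/`text` input fields are a hypothetical Slack-like payload.
const MAX_TEXT_BYTES = 32 * 1024;

interface NormalizedMessage {
  id: string;
  type: "user" | "system" | "tool";
  text: string;
  attachments: unknown[];
  meta: Record<string, unknown>;
}

function normalize(raw: { ts: string; text: string }): NormalizedMessage {
  // Byte length, not string length: multi-byte UTF-8 (emojis) counts fully.
  if (new TextEncoder().encode(raw.text).length > MAX_TEXT_BYTES) {
    throw new Error("message exceeds 32 kB limit");
  }
  return { id: raw.ts, type: "user", text: raw.text, attachments: [], meta: {} };
}
```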

Agent runtime internals: the asynchronous reasoning loop

The runtime is the part Paolo called “brain.” It is a single async function that never returns until you kill the process. The simplified pseudocode:

    while (true) {
      const event = await gateway.nextEvent();
      const context = await memory.load(event.sessionId);
      const thought = await llm.prompt(context, event.message);
      const action = planner.plan(thought);
      if (action) {
        const result = await executor.run(action);
        await memory.save(result);
        await gateway.reply(event.sessionId, result);
      }
    }

OpenClaw is on Node 22, so the runtime leans on AbortController and EventTarget. Each agent spawns its own AgentRuntime instance; by default you get a pool of eight (overridden by --agents). Runtimes are stateless between cycles; all persistence lives in Memory.
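As one example of the AbortController pattern, a per-cycle timeout can be expressed like this; the helper name and wiring are my own sketch, not the runtime's actual code:

```typescript
// Hypothetical sketch: run a cancellable unit of work (e.g. an LLM call)
// with a deadline, using the standard AbortController API available in
// Node 22. `withTimeout` is an illustrative name, not an OpenClaw export.
async function withTimeout<T>(
  work: (signal: AbortSignal) => Promise<T>,
  ms: number
): Promise<T> {
  const ac = new AbortController();
  const timer = setTimeout(() => ac.abort(), ms);
  try {
    // The worker is expected to observe the signal and reject on abort.
    return await work(ac.signal);
  } finally {
    clearTimeout(timer);
  }
}
```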

Event loop, not cron

Early versions used setInterval for scheduled tasks. That blew up when tasks overlapped and swallowed Promises. Since v0.24 the scheduler pushes synthetic "tick" events into the same queue as user messages. One loop to rule them all, which means backpressure is uniform and easy to observe.
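The single-queue idea can be sketched in a few lines; the class and event names are illustrative, not OpenClaw's actual types:

```typescript
// Sketch of the post-v0.24 scheduler idea: synthetic "tick" events share
// one FIFO queue with user messages, so one slow consumer backpressures
// both uniformly. Names are hypothetical.
type QueueEvent = { kind: "message" | "tick"; sessionId: string; payload?: string };

class EventQueue {
  private queue: QueueEvent[] = [];
  private waiters: ((e: QueueEvent) => void)[] = [];

  // Producers: channel adapters push "message", the scheduler pushes "tick".
  push(e: QueueEvent) {
    const w = this.waiters.shift();
    if (w) w(e);
    else this.queue.push(e);
  }

  // Single consumer: the runtime loop awaits the next event of either kind.
  next(): Promise<QueueEvent> {
    const e = this.queue.shift();
    if (e) return Promise.resolve(e);
    return new Promise((res) => this.waiters.push(res));
  }

  // Queue depth is the natural backpressure signal to observe.
  depth() { return this.queue.length; }
}
```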

Memory layer: vector store, key/value, snapshots

Three sub-stores, switchable independently:

  • Vector: default is sqlite3 with the sqlite-vss extension. For prod we run Pinecone on ClawCloud. DuckDB works but lacks filtered ANN queries.
  • KV: lightning-fast lmdb-store. If you pass --external-redis, we swap to Redis 7 with the JSON module.
  • Snapshot: gzip’d JSON blobs stored in ~/.claw/snapshots. One per agent per day; used for rollbacks.

The memory API is three calls: get(), set(), and search(), all Promise-based. No streams yet; Paolo hinted at moving to Async Iterables once the Node 22 LTS dust settles.
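A toy stand-in for the --inmemory backend makes the API shape concrete; note that search() here is naive substring matching, a placeholder for the real vector store, and the class name is mine:

```typescript
// Hedged sketch of the Promise-based memory API as an in-memory backend.
// Real backends swap in SQLite/Postgres/Pinecone behind the same surface;
// here search() is plain substring matching, not ANN over embeddings.
class InMemoryStore {
  private kv = new Map<string, string>();

  async get(key: string): Promise<string | undefined> {
    return this.kv.get(key);
  }

  async set(key: string, value: string): Promise<void> {
    this.kv.set(key, value);
  }

  async search(query: string, limit = 5): Promise<string[]> {
    return Array.from(this.kv.values())
      .filter((v) => v.includes(query))
      .slice(0, limit);
  }
}
```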

Tool orchestration and channel normalization

Composio glue

When a reasoning step emits {"tool":"gmail.send","args":{...}}, the executor serializes that to the Composio SDK:

    import { exec } from '@composio/sdk';
    // ...
    const result = await exec("gmail.send", args);

The call returns a structured object. No reflection or eval. If the tool fails, we catch, log, and reply with a "tool_error" message so the LLM can recover in the next cycle.
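The failure path can be sketched as a small wrapper; safeExec and the injected runTool are illustrative stand-ins for the executor's internals and the Composio call:

```typescript
// Hypothetical sketch of the executor's failure handling: a failed tool
// call becomes a structured "tool_error" observation instead of crashing
// the loop, so the LLM can recover in the next cycle.
type ToolResult = { type: "tool_result" | "tool_error"; tool: string; body: string };

async function safeExec(
  tool: string,
  args: unknown,
  // Stand-in for the Composio exec() call, injected for testability.
  runTool: (tool: string, args: unknown) => Promise<string>
): Promise<ToolResult> {
  try {
    const body = await runTool(tool, args);
    return { type: "tool_result", tool, body };
  } catch (err) {
    // In the real system this is also logged before replying.
    return { type: "tool_error", tool, body: String(err) };
  }
}
```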

Browser and shell drivers

For browser automation, OpenClaw embeds puppeteer@22. A headless Chromium instance is kept alive per agent (Chrome DevTools Protocol, not Playwright). Shell access goes through a jailed chroot on Linux and a hardened sandbox-exec profile on macOS. No Windows support yet.

Failure modes, scaling notes, what to tweak first

Gateway overload

Because sessions are memory-resident, 32k+ concurrent sessions will pressure the Node heap. Mitigation: set --session-store=redis to offload to RedisHash. Adds 1-2 ms per lookup, acceptable.

Runtime starvation

Eight runtimes are plenty for hobby traffic. If queue depth goes above 100 events, bump --agents to number of cores × 2. Each runtime is ~120 MB RSS with Puppeteer idle; plan RAM accordingly.

LLM timeouts

Default call timeout is 45 s. In batch mode (--openai-batch=10), timeouts cascade. Tune GATEWAY_LLM_TIMEOUT_MS or supply your own llm.ts driver.

Observability

Set CLAW_TRACE=1 to emit OpenTelemetry spans over OTLP/HTTP. The Jaeger UI will show the gateway receive → runtime think → tool exec chain.

Practical takeaway

If you need one mental model, remember: the gateway is a dumb router plus session cache, and the runtime is an async loop that alternates LLM calls with tool execution. Everything else—memory back end, channel adapter, browser vs. Gmail—is a plug. Clone the repo, start with src/gateway, follow the session map, then step into runtime/loop.ts. You’ll have 80 % of OpenClaw mapped in an afternoon.

Once you understand that, you can safely patch in your own memory store or swap Puppeteer for Playwright without touching the core loop. The architecture is boring on purpose, and that’s why it works.