If you are searching for "OpenClaw token usage tracking and cost monitoring setup" you are probably already running one or more agents in production and noticed the monthly bill climbing. This guide walks through every layer: the built-in /status command, server-side metrics, budget alerts, and Grafana dashboards. Nothing here is theoretical—these are the exact steps we use on a 20-agent install that processes ~28 M tokens per day.

Why bother? A short cost post-mortem

OpenClaw makes it dangerously easy to add a new skill (plugin) or schedule a cron job. Two lines of YAML and your agent scrapes Hacker News every minute. But every LLM call costs tokens, and tokens cost money. The first time we looked we discovered:

  • A Telegram support bot that burned 43 % of our monthly budget by summarizing every sticker as if it were Shakespeare.
  • A forgotten weekly cron that re-trained an embedding index—11 M tokens per run.
  • Nested tool calls causing quadratic prompt growth (the classic “context blow-up” problem).

Bottom line: you need hard data—session-level numbers, historical trends, and fast signals when something misbehaves.

Quick tour of the /status command

The gateway has shipped with /status since openclaw@3.6.0. It gives real-time counters for the current session (one WebSocket connection or one DM channel). Run it from any channel the agent is in:

> /status
Tokens used:   3 142
Tools invoked: 17
Runtime:       1 h 12 m
Approx cost:   $0.0146 (model=gpt-3.5-turbo-0125)

Handy for debugging a single chat, but it resets when the session dies and tells you nothing about cron jobs or other users. We need more.

Enabling server-side token metrics

OpenClaw exposes a metrics endpoint guarded by the daemon. Under the hood it uses prom-client (Prometheus). Starting with openclaw@3.7.2 the following metrics are emitted:

  • openclaw_tokens_total{model="gpt-4o"}
  • openclaw_tokens_prompt_total
  • openclaw_tokens_completion_total
  • openclaw_tools_invoked_total{tool="github.issue.create"}
  • openclaw_cron_runs_total{name="daily_digest"}
  • openclaw_request_duration_seconds_bucket

Enable the endpoint in gateway.config.mjs:

export default {
  metrics: {
    enabled: true,
    port: 9464,                           // default Prometheus scrape port
    authToken: process.env.METRICS_TOKEN  // optional
  }
}

Restart the daemon:

$ npm run gateway

Hit http://localhost:9464/metrics. You should see plain-text counters. If you get a 404 you are on an older gateway—upgrade.
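If you want a quick sanity check before standing up Prometheus, the exposition format is plain text and easy to parse by hand. Below is a minimal sketch that tallies openclaw_tokens_total per model; it assumes the metric and label names listed above, which may differ slightly on your install:

```javascript
// Sum openclaw_tokens_total by model from Prometheus text-exposition output.
// Assumes a single `model` label, as in the metric list above — adjust the
// regex if your gateway emits additional labels.
function tokensByModel(metricsText) {
  const totals = {};
  for (const line of metricsText.split('\n')) {
    if (line.startsWith('#')) continue; // skip HELP/TYPE comment lines
    const m = line.match(/^openclaw_tokens_total\{model="([^"]+)"\}\s+(\S+)$/);
    if (!m) continue;
    totals[m[1]] = (totals[m[1]] || 0) + Number(m[2]);
  }
  return totals;
}
```

Feed it the body of a fetch against http://localhost:9464/metrics and you get a per-model object you can eyeball in a REPL.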

Shipping metrics to Prometheus

Add a scrape job:

- job_name: 'openclaw'
  metrics_path: /metrics
  scheme: http
  static_configs:
    - targets: ['openclaw-gateway:9464']

Reload Prometheus (SIGHUP or /-/reload) and confirm with openclaw_tokens_total in the expression browser.

Setting up cost alerts that actually wake you up

We push two alert layers: soft (chat) and hard (budget kill-switch).

Alertmanager rules

- alert: OpenClawTokenBurn
  expr: increase(openclaw_tokens_total[1h]) > 300000  # ~90k prompt + 210k completion
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "OpenClaw token rate > 300k per hour"
    description: "Investigate first. This costs ~$4.50/h on gpt-4o."

Route it to Slack or Telegram. A 10-minute burn is usually a bug, and a human can fix it fast.
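The rule only fires the alert; Alertmanager decides where it lands. A minimal Slack route might look like the following — the webhook URL and channel are placeholders, and Alertmanager releases older than 0.22 use match: instead of matchers::

```yaml
route:
  receiver: slack-openclaw
  routes:
    - matchers:
        - severity = "warning"
      receiver: slack-openclaw

receivers:
  - name: slack-openclaw
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/T000/B000/XXXXXXXX'
        channel: '#openclaw-alerts'
        send_resolved: true
```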

Budget popper script

Soft alerts are great until it’s 3 AM. We wrote a guard that calls the ClawCloud billing API and disables the gateway if the daily quota is reached.

#!/usr/bin/env node
import fetch from 'node-fetch';

const limit = 5000; // cents, daily budget $50
const key = process.env.CLAWCLOUD_BILLING_KEY;

const resp = await fetch('https://api.claw.cloud/billing/v1/usage', {
  headers: { Authorization: `Bearer ${key}` }
});
const { today } = await resp.json();

if (today.cents > limit) {
  console.log('🔥 Budget exceeded, shutting down gateway');
  process.exit(1); // supervisord will keep it off until midnight
}

Run it every 15 min via cron or as another OpenClaw scheduledTask. Yes, crashing the gateway is brutal but cheaper than a runaway prompt.
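If you go the plain-cron route, one crontab line does it (the script path and log location are placeholders for wherever you deploy it):

```
*/15 * * * * /opt/openclaw/budget-popper.mjs >> /var/log/openclaw/budget-popper.log 2>&1
```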

Digging into token consumption patterns

Now that data is flowing, patterns emerge. We built three Grafana panels that answer 90 % of questions.

Panel 1 – Tokens by model per hour

sum by (model)(increase(openclaw_tokens_total[1h]))

Helps spot a sudden shift from gpt-3.5 to gpt-4o—usually someone requesting higher quality without telling you.

Panel 2 – Top 5 skills

topk(5, sum by (tool)(increase(openclaw_tools_invoked_total[6h])))

If notion.page.write jumps to the top you know marketing scheduled another mass memory sync.

Panel 3 – Cron job cost heatmap

Panel type: heatmap. Query:

sum by (name)(increase(openclaw_cron_runs_total[1d]))

Visualizes jobs vs. day. A diagonal line indicates you keep pushing the schedule later, eventually overlapping other work.

Identify and fix expensive skills or runaway cron jobs

Cheap fixes first:

  • Streaming completions. Add stream: true at the skill level. You typically save 5-10 % of tokens because you can truncate early once the user has what they need.
  • Prompt compression. Switch the default prompt formatter to v2-compact (gateway.config.mjs).
  • Model downgrade on retries. Many skills default to gpt-4o. Wrap calls in retryWithFallback(['gpt-4o','gpt-3.5-turbo-0125']).
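If your gateway version does not ship a retryWithFallback helper, the idea fits in a few lines. This is an illustration, not the library API: completeWithModel is a hypothetical stand-in for however your skill actually calls the LLM.

```javascript
// Try models in order, falling back to the next (cheaper) one on failure.
// completeWithModel(model, prompt) is a hypothetical stand-in for the real call.
async function retryWithFallback(models, prompt, completeWithModel) {
  let lastError;
  for (const model of models) {
    try {
      return await completeWithModel(model, prompt);
    } catch (err) {
      lastError = err; // e.g. rate limit or timeout — try the next model
    }
  }
  throw lastError; // every model failed
}
```

Call it as retryWithFallback(['gpt-4o', 'gpt-3.5-turbo-0125'], prompt, yourLlmCall) and the expensive model's failures degrade gracefully instead of retrying at full price.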

For cron jobs that cannot be trimmed (e.g., a daily embeddings refresh) move them to off-peak GPU nodes or batch them behind a single vector.upsert instead of N spam calls.
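Batching is mostly plumbing: accumulate documents and flush them in chunks instead of issuing one call per document. A sketch, assuming a vector.upsert-style call that accepts an array (check your vector store's actual batch limit — upsertBatch here is hypothetical):

```javascript
// Split items into fixed-size chunks so one upsert call carries many vectors.
function chunk(items, size) {
  const out = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// upsertBatch is a hypothetical stand-in for your vector store's bulk API.
async function batchedUpsert(docs, upsertBatch, batchSize = 100) {
  for (const batch of chunk(docs, batchSize)) {
    await upsertBatch(batch); // one call per batch instead of per doc
  }
}
```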

Forecasting and visualizing spend over time

Prometheus is great for real-time monitoring, but query windows longer than 30 days get slow. Two options:

Option 1 – Exporter to BigQuery + Metabase

We stream the metrics to BigQuery via prometheus-bq-exporter@0.2.4. Table schema ends up:

  • timestamp
  • metric_name
  • labels (RECORD)
  • value

Metabase then runs:

SELECT
  DATE_TRUNC(timestamp, WEEK) AS week,
  SUM(CASE
        WHEN metric_name = 'openclaw_tokens_prompt_total'     THEN value * 0.0005
        WHEN metric_name = 'openclaw_tokens_completion_total' THEN value * 0.0015
      END) AS cost_usd
FROM `billing.openclaw_metrics`
WHERE timestamp BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 12 WEEK) AND CURRENT_DATE()
GROUP BY week
ORDER BY week;

Draw a forecast line with Metabase’s built-in Trend feature—good enough for a quarterly budget.

Option 2 – Native ClawCloud cost API

If you are on ClawCloud (not self-hosted) you get daily and hourly spend via:

GET https://api.claw.cloud/billing/v1/usage?range=90d
Authorization: Bearer <key>

The response already aggregates by model and includes list prices, so you can skip the math. We still forward it to Grafana so everything lives in one place.
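Forwarding is a small transform: flatten the JSON into per-model rows that Grafana's JSON-style datasources can table-ify. The response shape below (days, byModel, cents) is an assumption for illustration, not documented API — adapt it to what the usage call actually returns:

```javascript
// Flatten a (hypothetical) ClawCloud usage response into flat rows.
// Field names `days`, `date`, `byModel`, and cents-denominated values
// are assumptions; check the real payload first.
function usageToRows(usage) {
  const rows = [];
  for (const day of usage.days || []) {
    for (const [model, cents] of Object.entries(day.byModel || {})) {
      rows.push({ date: day.date, model, usd: cents / 100 });
    }
  }
  return rows;
}
```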

Operating tips & next steps

  • Upgrade gateway and daemon together; mismatched versions silently drop metrics.
  • Pin your OpenAI model versions (gpt-3.5-turbo-0125 vs. plain gpt-3.5-turbo) so you don’t get surprise price bumps.
  • Store prompts in Git. 70 % of cost bugs start with “small tweak” messages in Slack.
  • Run /status in any chat when users complain about slowness; high token count usually correlates with multi-step reasoning.
  • Set OPENCLAW_MAX_TOKENS=4096 globally unless you truly need 128 k context. The hard cap has saved us multiple times.
  • Consider a Model Router layer (we use llm-router@1.4.0) that selects gpt-3.5/gpt-4o based on prompt complexity—cuts average cost 37 %.
  • Automate a weekly “cost diff” PR that comments on the biggest changes. Engineers read code reviews more than dashboards.
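llm-router's actual scoring is its own business, but the core routing idea fits in a few lines: estimate prompt complexity cheaply, then pick a model tier. A toy heuristic — the thresholds and keyword signals are made up for illustration, so tune them against your own traffic:

```javascript
// Route to a cheap model unless the prompt looks complex.
// Word-count threshold and keyword list are illustrative, not llm-router's.
function pickModel(prompt) {
  const words = prompt.trim().split(/\s+/).length;
  const wantsReasoning = /\b(why|prove|compare|plan|debug)\b/i.test(prompt);
  return words > 400 || wantsReasoning ? 'gpt-4o' : 'gpt-3.5-turbo-0125';
}
```

Even a heuristic this crude captures the shape of the 37 % saving: most traffic is short lookups that never needed the expensive model.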

Pick at least one of the visualization paths today. Even a crude Grafana panel beats the monthly credit-card panic that hits when you run blind.