If you are searching for "OpenClaw token usage tracking and cost monitoring setup" you are probably already running one or more agents in production and noticed the monthly bill climbing. This guide walks through every layer: the built-in /status command, server-side metrics, budget alerts, and Grafana dashboards. Nothing here is theoretical—these are the exact steps we use on a 20-agent install that processes ~28 M tokens per day.
Why bother? A short cost post-mortem
OpenClaw makes it dangerously easy to add a new skill (plugin) or schedule a cron job. Two lines of YAML and your agent scrapes Hacker News every minute. But every LLM call costs tokens, and tokens cost money. The first time we looked we discovered:
- A Telegram support bot that burned 43 % of our monthly budget by summarizing every sticker as if it were Shakespeare.
- A forgotten weekly cron that re-trained an embedding index—11 M tokens per run.
- Nested tool calls causing quadratic prompt growth (the classic “context blow-up” problem).
Bottom line: you need hard data—session-level numbers, historical trends, and fast signals when something misbehaves.
Quick tour of the /status command
The gateway has shipped with /status since openclaw@3.6.0. It gives real-time counters for the current session (one WebSocket connection or one DM channel). Run it from any channel the agent is in:
> /status
Tokens used: 3 142
Tools invoked: 17
Runtime: 1 h 12 m
Approx cost: $0.0146 (model=gpt-3.5-turbo-0125)
Handy for debugging a single chat, but it resets when the session dies and tells you nothing about cron jobs or other users. We need more.
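For back-of-the-envelope checks it helps to recompute that "Approx cost" line yourself. A minimal sketch, assuming gpt-3.5-turbo-0125 list prices of $0.0005 per 1k prompt tokens and $0.0015 per 1k completion tokens (verify against your provider's current price sheet):

```javascript
// Estimate USD cost from token counts at per-1k-token rates.
// The rates below are assumptions; check your provider's pricing page.
const PRICES = {
  'gpt-3.5-turbo-0125': { prompt: 0.0005, completion: 0.0015 }, // $/1k tokens
};

function estimateCost(model, promptTokens, completionTokens) {
  const p = PRICES[model];
  if (!p) throw new Error(`no price entry for ${model}`);
  return (promptTokens / 1000) * p.prompt + (completionTokens / 1000) * p.completion;
}

// e.g. 2,000 prompt + 1,000 completion tokens on gpt-3.5-turbo-0125
console.log(estimateCost('gpt-3.5-turbo-0125', 2000, 1000).toFixed(4)); // "0.0025"
```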
Enabling server-side token metrics
OpenClaw exposes a metrics endpoint guarded by the daemon. Under the hood it uses prom-client (Prometheus). Starting with openclaw@3.7.2 the following metrics are emitted:
- openclaw_tokens_total{model="gpt-4o"}
- openclaw_tokens_prompt_total
- openclaw_tokens_completion_total
- openclaw_tools_invoked_total{tool="github.issue.create"}
- openclaw_cron_runs_total{name="daily_digest"}
- openclaw_request_duration_seconds_bucket
Enable the endpoint in gateway.config.mjs:
export default {
  metrics: {
    enabled: true,
    port: 9464, // default Prometheus scrape port
    authToken: process.env.METRICS_TOKEN // optional
  }
}
Restart the daemon:
$ npm run gateway
Hit http://localhost:9464/metrics. You should see plain-text counters. If you get a 404 you are on an older gateway—upgrade.
Shipping metrics to Prometheus
Add a scrape job:
- job_name: 'openclaw'
  metrics_path: /metrics
  scheme: http
  static_configs:
    - targets: ['openclaw-gateway:9464']
Reload Prometheus (SIGHUP or /-/reload) and confirm with openclaw_tokens_total in the expression browser.
Setting up cost alerts that actually wake you up
We push two alert layers: soft (chat) and hard (budget kill-switch).
Alertmanager rules
- alert: OpenClawTokenBurn
  expr: sum(increase(openclaw_tokens_total[1h])) > 300000 # ~90k prompt + 210k completion
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "OpenClaw token rate > 300k per hour"
    description: "Investigate first. This costs ~$4.50/h on gpt-4o."
Route it to Slack or Telegram. A 10-minute burn is usually a bug that a human can fix quickly.
Budget popper script
Soft alerts are great until it’s 3 AM. We wrote a guard that calls the ClawCloud billing API and disables the gateway if the daily quota is reached.
#!/usr/bin/env node
import fetch from 'node-fetch';
import { execSync } from 'node:child_process';

const limit = 5000; // cents, i.e. a $50 daily budget
const key = process.env.CLAWCLOUD_BILLING_KEY;

const resp = await fetch('https://api.claw.cloud/billing/v1/usage', {
  headers: { Authorization: `Bearer ${key}` }
});
const { today } = await resp.json();

if (today.cents > limit) {
  console.log('🔥 Budget exceeded, shutting down gateway');
  // Actually stop the supervised process; exiting this guard alone would
  // leave the gateway running. It stays off until restarted after midnight.
  execSync('supervisorctl stop openclaw-gateway');
}
Run it every 15 min via cron or as another OpenClaw scheduledTask. Yes, stopping the gateway outright is brutal, but it is cheaper than a runaway prompt.
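A crontab entry for the guard might look like this (the path and log location are illustrative, not part of OpenClaw):

```
*/15 * * * * /usr/local/bin/openclaw-budget-guard.mjs >> /var/log/openclaw/budget-guard.log 2>&1
```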
Digging into token consumption patterns
Now that data is flowing, patterns emerge. We built three Grafana panels that answer 90 % of questions.
Panel 1 – Tokens by model per hour
sum by (model)(increase(openclaw_tokens_total[1h]))
Helps spot a sudden shift from gpt-3.5 to gpt-4o—usually someone requesting higher quality without telling you.
Panel 2 – Top 5 skills
topk(5, sum by (tool)(increase(openclaw_tools_invoked_total[6h])))
If notion.page.write jumps to the top you know marketing scheduled another mass memory sync.
Panel 3 – Cron job cost heatmap
sum by (name)(increase(openclaw_cron_runs_total[1d]))
Render it as a Grafana heatmap (job vs. day). A diagonal band means you keep pushing the schedule later, until it eventually overlaps other work.
Identify and fix expensive skills or runaway cron jobs
Cheap fixes first:
- Streaming completions. Add stream: true at the skill level. You typically save 5-10% of tokens because you can truncate early once the user has what they need.
- Prompt compression. Switch the default prompt formatter to v2-compact in gateway.config.mjs.
- Model downgrade on retries. Many skills default to gpt-4o. Wrap calls in retryWithFallback(['gpt-4o', 'gpt-3.5-turbo-0125']).
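If your install does not already ship a retryWithFallback helper, the idea fits in a few lines. A sketch; the function name comes from the text above, but this signature and error handling are assumptions, not the actual OpenClaw API:

```javascript
// Try each model in order; fall through to the next on failure.
// `call` is whatever function performs the completion for a given model.
async function retryWithFallback(models, call) {
  let lastErr;
  for (const model of models) {
    try {
      return await call(model);
    } catch (err) {
      lastErr = err; // e.g. rate limit or capacity error; try the cheaper model
    }
  }
  throw lastErr; // every model failed
}

// Hypothetical usage: prefer gpt-4o, fall back to gpt-3.5-turbo-0125.
// const answer = await retryWithFallback(
//   ['gpt-4o', 'gpt-3.5-turbo-0125'],
//   (model) => client.complete({ model, prompt })
// );
```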
For cron jobs that cannot be trimmed (e.g., a daily embeddings refresh), move them to off-peak GPU nodes or batch them behind a single vector.upsert instead of N separate calls.
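The batching side of that is generic. A sketch; chunkBy is plain JavaScript, while the commented vector-store call shape is hypothetical:

```javascript
// Split items into fixed-size batches so one upsert carries many vectors
// instead of issuing one network call per record.
function chunkBy(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical usage: one upsert per 100 records.
// for (const batch of chunkBy(records, 100)) {
//   await vectorStore.upsert(batch);
// }
```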
Forecasting and visualizing spend over time
Prometheus is great for real-time but query windows >30 days get slow. Two options:
Option 1 – Exporter to BigQuery + Metabase
We stream the metrics to BigQuery via prometheus-bq-exporter@0.2.4. Table schema ends up:
- timestamp
- metric_name
- labels (RECORD)
- value
Metabase then runs:
SELECT
  DATE_TRUNC(timestamp, WEEK) AS week,
  SUM(CASE
    WHEN metric_name = 'openclaw_tokens_prompt_total' THEN value / 1000 * 0.0005     -- $/1k prompt tokens
    WHEN metric_name = 'openclaw_tokens_completion_total' THEN value / 1000 * 0.0015 -- $/1k completion tokens
  END) AS cost_usd
FROM `billing.openclaw_metrics`
WHERE timestamp BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 12 WEEK) AND CURRENT_DATE()
GROUP BY week
ORDER BY week;
Draw a forecast line with Metabase’s built-in Trend feature—good enough for a quarterly budget.
Option 2 – Native ClawCloud cost API
If you are on ClawCloud (not self-hosted) you get daily and hourly spend via:
GET https://api.claw.cloud/billing/v1/usage?range=90d
Authorization: Bearer <key>
The response already aggregates by model and includes list prices, so you can skip the math. We still forward it to Grafana so everything lives in one place.
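Forwarding that data yourself takes only a small reducer. The payload shape below (a days array with per-model cents) is an assumption for illustration; adapt it to the actual response:

```javascript
// Sum spend in cents per model from an assumed billing payload shape.
// The field names (days, byModel) are assumptions, not the documented API.
function spendCentsByModel(usage) {
  const totals = {};
  for (const day of usage.days) {
    for (const [model, cents] of Object.entries(day.byModel)) {
      totals[model] = (totals[model] ?? 0) + cents;
    }
  }
  return totals;
}

const sample = {
  days: [
    { date: '2024-05-01', byModel: { 'gpt-4o': 320, 'gpt-3.5-turbo-0125': 45 } },
    { date: '2024-05-02', byModel: { 'gpt-4o': 280 } },
  ],
};
console.log(spendCentsByModel(sample)); // { 'gpt-4o': 600, 'gpt-3.5-turbo-0125': 45 }
```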
Operating tips & next steps
- Upgrade gateway and daemon together; mismatched versions silently drop metrics.
- Pin your OpenAI model versions (gpt-3.5-turbo-0125 vs. plain gpt-3.5-turbo) so you don't get surprise price bumps.
- Store prompts in Git. 70% of cost bugs start with "small tweak" messages in Slack.
- Run /status in any chat when users complain about slowness; a high token count usually correlates with multi-step reasoning.
- Set OPENCLAW_MAX_TOKENS=4096 globally unless you truly need the 128k context. The hard cap has saved us multiple times.
- Consider a Model Router layer (we use llm-router@1.4.0) that selects gpt-3.5 or gpt-4o based on prompt complexity; it cuts our average cost by 37%.
- Automate a weekly "cost diff" PR that comments on the biggest changes. Engineers read code reviews more than dashboards.
Pick at least one of the visualization paths today. Even a crude Grafana panel beats the monthly credit-card panic that hits when you run blind.