I spent the last month running a single OpenClaw agent on ClawCloud and tracked every cent the API metered. If you are searching for an OpenClaw API cost breakdown based on real-world usage, this is the raw data. No sponsored fluff, just the bill, the bumps, and the mistakes I will not repeat.
Baseline: what exactly did I put on the meter?
I spun up one agent named ops-raccoon on day one of the billing cycle. Hardware tier was the default compute-small (4 vCPU, 16 GB RAM) with the 4M token monthly bundle that ships with ClawCloud Standard (USD 29). Anything over 4M tokens is overage at $0.000004 / token. Versions at start:
- OpenClaw daemon 0.18.3
- Node.js 22.2.0
- Gateway 0.19.1 (web UI)
Integrations enabled:
- Gmail (via Composio) — triage + send
- GitHub — PR comment drafts
- Google Calendar — scheduling
- Bing Search API — research mode
- Local shell access (read-only)
I ran everything through the hosted gateway; no self-hosted model. Under the hood ClawCloud currently proxies to GPT-4 Turbo 128k for heavy tasks and GPT-3.5 Turbo 16k for “lite” tasks. You can choose, but I left the defaults.
30-day topline numbers
- Total prompts sent: 7,046
- Total tokens consumed: 6.92 M
- Bundle coverage: 4.00 M (included)
- Overage tokens: 2.92 M
- Base subscription: $29.00
- Overage cost: 2.92 M × 0.000004 = $11.68
- Total API bill: $40.68
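The topline is easy to reproduce from the plan terms in the Baseline section; a plain awk one-off (no ClawCloud tooling needed) confirms the invoice:

```shell
# Recompute the invoice: $29 base, 4 M bundled tokens, $0.000004 per overage token
awk -v used=6920000 -v bundle=4000000 'BEGIN {
  over = used - bundle                      # 2.92 M overage tokens
  cost = over * 0.000004
  printf "overage=$%.2f total=$%.2f\n", cost, 29.00 + cost
}'
# → overage=$11.68 total=$40.68
```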
That’s the invoice that hit my card. Now the interesting part: where did those 6.92 million tokens actually go?
Cost breakdown by task type
Email management
- Prompts: 2,134 (30.3 %)
- Tokens: 1.02 M (14.7 %)
- Cost: $4.08
I wrote a small Gmail triage workflow: every 15 minutes OpenClaw fetched unread threads, summarized each, proposed replies, then waited for my thumbs-up in Telegram. Summaries are cheap (model: gpt-3.5-turbo), but the steady drip of scheduled jobs ran 96 times per day, which piled up.
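The 96 runs per day is just a 15-minute cron cadence. Here is a sketch of the trigger, with the caveat that the workflow name and the `clawctl workflow run` subcommand are my shorthand for the scheduled job, not documented ClawCloud CLI:

```shell
# Hypothetical trigger: `clawctl workflow run` and the gmail-triage name are
# illustrative shorthand, not documented ClawCloud CLI
*/15 * * * * clawctl workflow run gmail-triage --label email
```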
Coding assistant
- Prompts: 1,487 (21.1 %)
- Tokens: 2.44 M (35.3 %)
- Cost: $9.76
Coding sessions were interactive and used GPT-4 Turbo. I linked the agent to my repo so it could open PRs and comment. Large context windows (the entire diff plus the full file) were what burned tokens. A single review averaged 11 k tokens.
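Back-of-envelope on those reviews, using the 11 k average and the overage rate from the Baseline section:

```shell
# 2.44 M coding tokens / 11 k tokens per review, priced at $0.000004/token
awk 'BEGIN { printf "~%.0f reviews at $%.3f each\n", 2440000 / 11000, 11000 * 0.000004 }'
# → ~222 reviews at $0.044 each
```

Cheap per review; it is the volume plus the untrimmed context that adds up.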
Research mode
- Prompts: 925 (13.1 %)
- Tokens: 2.64 M (38.1 %)
- Cost: $10.56
Research is where I blew the bundle. I asked the agent to write a market landscape for engineering time-tracking tools. Each query streamed search snippets, visited 5–10 pages via headless Chromium, scraped text, then composed an analysis. I did 43 runs of that. GPT-4 Turbo swallowed the full scrape each time. Ouch.
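Per run, the damage works out like this (same overage rate as above):

```shell
# 2.64 M research tokens spread over 43 runs
awk 'BEGIN { per = 2640000 / 43; printf "%.0f tokens/run (~$%.2f each)\n", per, per * 0.000004 }'
# → 61395 tokens/run (~$0.25 each)
```

A quarter per run sounds harmless, but 43 of them, each feeding a full scrape into GPT-4 Turbo, is how the bundle died.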
Scheduling & reminders
- Prompts: 2,500 (35.5 %)
- Tokens: 0.82 M (11.9 %)
- Cost: $3.28
Calendar updates are low-token but high frequency. I let OpenClaw watch my Slack for `/meet` messages and propose meeting slots. Works great, doesn't break the bank.
Daily averages & the three big spikes
Quick math: 6.92 M tokens over 30 days is 230.7 k tokens/day. But averages hide pain. Here are days I noticed in the graph exported from `clawctl usage --csv`:
- Day 6 – 646 k tokens – first long research run. I let the agent crawl 17 URLs. Lesson: throttle depth.
- Day 12 – 911 k tokens – code review of a 5,200-line diff across 34 files. I forgot to strip vendor JS. 💸
- Day 23 – 708 k tokens – email backlog after a weekend offline. Summaries fine, suggested replies long.
The other 27 days stayed in the 140–260 k token range.
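Sanity check on that range: subtract the three spike days and average what is left.

```shell
# (total - spike days) / remaining 27 days
awk 'BEGIN { rest = 6920000 - (646000 + 911000 + 708000); printf "%.0f tokens/day\n", rest / 27 }'
# → 172407 tokens/day
```

Roughly 172 k/day, comfortably in the middle of the 140–260 k band.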
What actually drove cost up
- Context, not prompt count. A single 15-message chat thread can be cheaper than one giant prompt if that prompt drags in 60 k tokens of scraped text.
- Automatic retries. The gateway retries on 5xx by default. When Bing throttles, you pay for the retry unless you set `retry=false`.
- Verbose tool responses. Browser control returns the full DOM by default. I trimmed to `innerText` only and saved ~20 % immediately.
Metrics tooling I used
ClawCloud’s dashboard is fine, but I wanted per-feature insight. Quick bash hack:
```shell
# aggregate tokens by label (column 3 = label, column 5 = tokens)
awk -F"," '{tokens[$3]+=$5} END {for (l in tokens) print l, tokens[l]}' usage.csv | sort -k2 -nr
```
Here `$3` is the label I attach in my prompt wrapper (`[email]`, `[code]`, etc.) and `$5` is the token count. For a nicer view I piped the output into gnuplot.
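The labels come from a one-line wrapper in my prompt scripts. The function below is a toy version; how you deliver the labeled prompt to the gateway is up to you.

```shell
# Prefix each prompt with a flow label; the [label] is what shows up
# as column 3 of usage.csv
label_prompt() {
  printf '[%s] %s' "$1" "$2"
}

label_prompt email "Summarize unread threads since 9am"
# → [email] Summarize unread threads since 9am
```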
To catch spikes before the invoice date I scheduled:
```shell
0 * * * * clawctl usage --last 1h --format json \
  | jq '.total_tokens | select(. > 400000)' \
  | xargs -r -I{} curl -X POST https://hooks.slack.com/... -d "tokens:{}"
```
Anything over 400 k tokens/hour notifies me in #ops.
Optimization moves after month one
- Switched research mode to gpt-3.5 for scraping. Use GPT-4 only for synthesis.
- Chunk code review. Max 1,000 lines per call, and deduplicate unchanged context.
- Short-lived memory for email replies. Don’t persist the entire thread.
- Disabled auto-retries. Added exponential backoff in workflow instead.
- Added a `--dry-run` flag to the shell tool. Token-cheap preview so I can sanity-check the plan.
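The chunking move is nothing fancy; GNU split does the cut (deduplicating unchanged context I still do by hand). `big.diff` is a placeholder filename, and `-d` (numeric suffixes) is GNU coreutils, so BSD split needs a tweak:

```shell
# Cut a large diff into ≤1,000-line pieces, one review call per chunk_NN file
split -l 1000 -d big.diff chunk_
```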
Projected token usage for month two is 3.1 M, comfortably inside the bundle.
Self-host vs. ClawCloud: quick math
I benchmarked running the same agent against a quantized Mistral 7B served locally by llama.cpp on a spare 4090 box.
- Hardware amortized cost: ~$1.30/day (electricity + depreciation)
- Inference time: 3–4× slower per token
- Context window: 32 k max, required aggressive clipping
- Total month cost: ~$39, similar to the hosted GPT mix, but slower and more tinkering
For me the hosted tier wins until I consistently exceed 15 M tokens/month or need on-prem for privacy.
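To make the comparison explicit, here is the hosted bill as a function of monthly volume, using only the pricing from the Baseline section:

```shell
# Hosted cost: $29 base + $0.000004 per token past the 4 M bundle
hosted_cost() {
  awk -v n="$1" 'BEGIN {
    o = n - 4000000; if (o < 0) o = 0
    printf "$%.2f\n", 29.00 + o * 0.000004
  }'
}

hosted_cost 6920000    # → $40.68 (this month)
hosted_cost 15000000   # → $73.00
```

On pure dollars the flat ~$39 self-host line crosses hosted well before 15 M tokens; the 3–4× slower inference and the tinkering overhead are why my personal threshold sits higher.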
Where I’m taking it next
Five weeks in, the token graph looks boring — exactly what I want. The agent handles inbox zero and posts PR nits while I sleep. Next I plan to plug in the Slack RAG connector so it can cite internal doc answers. I'll report back if the bill balloons.
If you are about to turn on OpenClaw at work, tag your flows from day one, watch context size, and set a Slack alert on the token counter. Your finance team will stay calm and you’ll stay out of CSV hell.