How to Reduce OpenClaw API Costs Without Losing Quality

If you run OpenClaw long enough, the model bill will eventually bite you. Engineers keep telling us in GitHub issues: “quality is great, but my card is screaming”. This post is a pragmatic playbook for how to reduce OpenClaw API costs without losing quality. Every tactic below is in production on my own ClawCloud tenant, serving four Slack workspaces and two WhatsApp bots. Numbers are real, configs are copy-paste-ready.

1. Route Sonnet for Routine, Opus for the Hard Stuff

Anthropic’s claude-3-opus is the default in the gateway because it’s the best. It’s also 5× pricier than claude-3-sonnet at list prices ($15 vs $3 per million input tokens, $75 vs $15 per million output tokens). Over a month of normal usage (3M prompt / 1M completion tokens) that works out to roughly USD $120 on Opus versus $24 on Sonnet.

You don’t need Opus to “Turn this Jira ticket into a Haiku”. You do need it for 30-page contract reviews. The fix is conditional routing.

Gateway model router (Node ≥ 22)

The first rule catches long documents; the second matches code_review, my custom skill.

{
  "models": {
    "default": "claude-3-sonnet:20240229",
    "high_quality": "claude-3-opus:20240229"
  },
  "routing": [
    {
      "match": {
        "prompt_tokens": ">= 2000"
      },
      "use": "high_quality"
    },
    {
      "match": {
        "tool": "code_review"
      },
      "use": "high_quality"
    }
  ]
}

Now every prompt shorter than 2k tokens and not tagged code_review goes to Sonnet. Quality stayed the same for chatty conversations, and monthly spend dropped from $612 to $152 (−75 %).
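To make the rule order concrete, here is a minimal sketch of first-match routing over the config above. The `matches` and `pickModel` helpers are hypothetical names of mine, not the gateway's actual code, and the sketch assumes the only comparison operator in use is `>=`.

```javascript
// Hypothetical sketch of first-match routing; not the gateway's real implementation.
const config = {
  models: {
    default: "claude-3-sonnet:20240229",
    high_quality: "claude-3-opus:20240229",
  },
  routing: [
    { match: { prompt_tokens: ">= 2000" }, use: "high_quality" },
    { match: { tool: "code_review" }, use: "high_quality" },
  ],
};

// Returns true when every condition in `match` holds for the request.
function matches(match, request) {
  return Object.entries(match).every(([key, expected]) => {
    // Assumption: numeric conditions are written as ">= N".
    const threshold = /^>=\s*(\d+)$/.exec(String(expected));
    if (threshold) {
      return Number(request[key]) >= Number(threshold[1]);
    }
    return request[key] === expected;
  });
}

// First matching rule wins; otherwise fall back to the default model.
function pickModel(request, cfg = config) {
  const rule = cfg.routing.find((r) => matches(r.match, request));
  return cfg.models[rule ? rule.use : "default"];
}

console.log(pickModel({ prompt_tokens: 350, tool: "translate" }));
console.log(pickModel({ prompt_tokens: 5200, tool: "translate" }));
console.log(pickModel({ prompt_tokens: 800, tool: "code_review" }));
```

Only the first request lands on Sonnet; the long prompt and the code_review call both escalate to Opus.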

2. Prompt Diet: Trim Skill Injection and Context Bloat

OpenClaw loves to help, so it injects every available “skill” (tool schema) into every call. With 20 skills you easily add 2-4k tokens that the model never uses. Disable the ones you don’t need per channel.

Per-channel skill allowlist

# .claw/skills.yaml
slack-product:
  allowed:
    - translate
    - schedule_meeting
    - ticket_lookup
whatsapp-family:
  allowed:
    - talk_like_pirate  # seriously
    - home_automation

After pruning from 26 skills to 4 for my “slack-product” channel, average prompt size fell by 1 534 tokens. Sonnet input cost went from $0.54 to $0.09 per 100 messages.
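If you want to sanity-check a saving like this for your own channels, the arithmetic is short. The sketch below assumes Sonnet input at $3 per million tokens (an assumption on my part; plug in whatever rate you actually pay), and `inputCostUSD` is just an illustrative helper.

```javascript
// Assumed list price for Sonnet input tokens; replace with your own rate.
const SONNET_INPUT_USD_PER_MILLION = 3;

// Cost of sending `tokensPerMessage` extra prompt tokens on each of `messages` calls.
function inputCostUSD(tokensPerMessage, messages, usdPerMillion = SONNET_INPUT_USD_PER_MILLION) {
  return (tokensPerMessage * messages * usdPerMillion) / 1_000_000;
}

// 1 534 fewer prompt tokens per message, over 100 messages.
const savedPer100 = inputCostUSD(1534, 100);
console.log(savedPer100.toFixed(2)); // "0.46"
```

That lands within a cent of the $0.45 drop per 100 messages quoted above.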

Also cap the number of previous messages you replay:

// gateway.conf.js
export default {
  contextWindow: {
    tokens: 3000,   // hard max per request
    messages: 25    // or the last 25 messages, whichever limit is hit first
  }
}

I’ve yet to notice any quality degradation with the cap at 3k tokens. Try a smaller cap if your chats are short-lived.
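For intuition, this is roughly what a dual cap like that does to a conversation. It is a sketch under my own assumptions, not the gateway's code: `trimContext` is a made-up name, and each message is assumed to carry a precomputed `tokens` count.

```javascript
// Keeps the most recent messages that fit within both caps.
// Assumption: every message has a precomputed `tokens` field.
function trimContext(history, { tokens: maxTokens, messages: maxMessages }) {
  const kept = [];
  let used = 0;
  // Walk backwards so the newest messages are considered first.
  for (let i = history.length - 1; i >= 0; i--) {
    const msg = history[i];
    if (kept.length >= maxMessages || used + msg.tokens > maxTokens) break;
    kept.unshift(msg); // preserve chronological order
    used += msg.tokens;
  }
  return kept;
}

const history = [
  { role: "user", text: "old question", tokens: 1800 },
  { role: "assistant", text: "old answer", tokens: 1500 },
  { role: "user", text: "new question", tokens: 900 },
];
const trimmed = trimContext(history, { tokens: 3000, messages: 25 });
console.log(trimmed.map((m) => m.text));
```

Here the oldest message is dropped because replaying it would push the request past 3 000 tokens, well before the 25-message limit matters.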

3. Session Hygiene: Automatic Pruning

Some users keep a single DM thread alive for weeks. The daemon happily re-sends the entire history each time. Turn on session pruning so older chunks are summarised.

Enabling summarisation pruning

// daemon.config.mjs
export const pruning = {
  enabled: true,
  policy: {
    maxMessages: 40,           // keep the latest 40 messages verbatim
    strategy: "summarise",    // condense anything older into a summary
    summaryModel: "claude-3-haiku:20240307"
  }
}

My busiest Slack agent dropped from 11 GB to 2.3 GB of monthly prompt traffic with no user complaints.
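Conceptually, summarisation pruning splits the history in two: an older chunk that gets condensed and a recent tail that is replayed verbatim. The sketch below illustrates that idea only. `splitForPruning` and `pruneSession` are hypothetical helpers, and the `summarise` callback stands in for a call to the cheaper summary model.

```javascript
// Splits history into the part to condense and the part to keep verbatim.
function splitForPruning(history, maxMessages) {
  const cut = Math.max(0, history.length - maxMessages);
  return {
    toSummarise: history.slice(0, cut),
    keepRaw: history.slice(cut),
  };
}

// `summarise` is a placeholder for whatever calls the summary model.
async function pruneSession(history, maxMessages, summarise) {
  const { toSummarise, keepRaw } = splitForPruning(history, maxMessages);
  if (toSummarise.length === 0) return history; // nothing old enough to prune
  const summary = await summarise(toSummarise);
  return [
    { role: "system", text: `Summary of earlier conversation: ${summary}` },
    ...keepRaw,
  ];
}

// Demo with a fake summariser so the sketch runs without any API call.
const demo = Array.from({ length: 45 }, (_, i) => ({ role: "user", text: `message ${i + 1}` }));
pruneSession(demo, 40, async (msgs) => `${msgs.length} older messages`)
  .then((pruned) => console.log(pruned.length, pruned[0].text));
```

The saving comes from the summary being far shorter than the messages it replaces, which is why a small, cheap model is a sensible choice for that step.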

4. Set Token Budgets and Hard Caps

You cannot optimise what you can’t limit. OpenClaw lets you set both global and per-agent budgets.

Example: USD $50 monthly per agent

# budgets.yaml
agents:
  product-slack:
    monthlyUSD: 50
  family-whatsapp:
    monthlyUSD: 15

When the cap is hit, the gateway returns a friendly “Budget exceeded” message instead of burning money. Add yourself to overage_notify to get a DM.
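The behaviour is easy to picture as a guard in front of each model call. This is a minimal sketch of that idea, not OpenClaw's implementation: the in-memory `spent` map and the `recordSpend` and `checkBudget` helpers are my own illustrative names.

```javascript
// Monthly caps in USD, mirroring budgets.yaml above.
const budgets = { "product-slack": 50, "family-whatsapp": 15 };
const spent = new Map(); // agent name -> USD spent so far this month

function recordSpend(agent, usd) {
  spent.set(agent, (spent.get(agent) ?? 0) + usd);
}

// Refuses a call whose estimated cost would push the agent past its cap.
function checkBudget(agent, estimatedUSD) {
  const cap = budgets[agent];
  if (cap === undefined) return { allowed: true }; // no cap configured
  const total = (spent.get(agent) ?? 0) + estimatedUSD;
  return total > cap
    ? { allowed: false, message: "Budget exceeded" }
    : { allowed: true };
}

recordSpend("family-whatsapp", 14.9);
console.log(checkBudget("family-whatsapp", 0.2)); // over the $15 cap
console.log(checkBudget("product-slack", 0.2));   // still within $50
```

Checking the estimate before the call, rather than the bill after it, is what keeps a runaway thread from overshooting the cap.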

For extra safety, cap completion length with Claude’s native max_tokens parameter:

completion: {
  max_tokens: 1024,  // don’t let users stream Infinite Jokes™
  temperature: 0.7
}

5. Cron Jobs: The Silent Wallet Leak

Scheduled tasks feel free because they run in the background. Ten innocent-looking cron.yaml entries can outspend active users.

Check existing jobs

$ claw cron list
┌──────────────┬─────────────┬─────────────┐
│ Schedule     │ Command     │ Est. tokens │
├──────────────┼─────────────┼─────────────┤
│ */15 * * * * │ weather_now │ 60k/mo      │
│ 0 9 * * *    │ digest_news │ 180k/mo     │
└──────────────┴─────────────┴─────────────┘

Do you really need a 15-minute weather ping? Probably not. I merged both into a single morning digest.

After cleanup

$ claw cron list
┌───────────┬─────────────┬─────────────┐
│ Schedule  │ Command     │ Est. tokens │
├───────────┼─────────────┼─────────────┤
│ 0 8 * * * │ daily_brief │ 55k/mo      │
└───────────┴─────────────┴─────────────┘

One-line change, 77 % lower token burn (240k → 55k estimated tokens per month).

6. Monitor Usage with the `/status` Command

You can’t fix what you don’t watch. The gateway ships with a built-in /status slash command for Slack/Discord, or REST at /v1/status.

$ curl https://agent.crawl.dev/v1/status | jq
{
  "period": "2024-04-01 → 2024-04-30",
  "tokens": {
    "prompt": 2384553,
    "completion": 781440
  },
  "usd_estimate": 127.66,
  "model_breakdown": {
    "claude-3-sonnet": 78.5,
    "claude-3-opus": 49.1
  }
}

I alias this to claw $ENV and run it in my tmux status bar. Pair it with Prometheus if you like graphs.
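Once you have the JSON, a few lines turn the breakdown into per-model shares, which is the number I actually watch. The sketch assumes the payload shape shown in the sample response; `modelShares` is a hypothetical helper, not part of the gateway.

```javascript
// Converts the USD breakdown into each model's share of total spend.
// Assumption: `status.model_breakdown` maps model name -> USD, as in the sample.
function modelShares(status) {
  const breakdown = status.model_breakdown;
  const total = Object.values(breakdown).reduce((sum, usd) => sum + usd, 0);
  return Object.fromEntries(
    Object.entries(breakdown).map(([model, usd]) => [model, total > 0 ? usd / total : 0])
  );
}

const sample = {
  usd_estimate: 127.66,
  model_breakdown: { "claude-3-sonnet": 78.5, "claude-3-opus": 49.1 },
};
for (const [model, share] of Object.entries(modelShares(sample))) {
  console.log(`${model}: ${(share * 100).toFixed(1)}%`);
}
```

With the sample figures, Opus still accounts for a bit over a third of spend, which tells you where the next round of routing tweaks should go.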

7. Real-World Before/After: 78 % Savings

Here’s my honest dashboard from last month versus the current month after all tweaks.

Last month (default settings)

Prompt tokens: 4 912 228
Completion tokens: 1 702 911
Model split: 90 % Opus, 10 % Sonnet
Total cost: USD $812.37

This month (optimised)

Prompt tokens: 1 889 664
Completion tokens: 720 310
Model split: 26 % Opus, 74 % Sonnet
Total cost: USD $176.14

The only user-visible change: “weather” now arrives once a day instead of every 15 minutes. Engineering time invested: about three hours.
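The headline figure is plain percentage arithmetic on the two dashboards, shown here as a small sketch so you can rerun it on your own numbers (`reductionPercent` is just an illustrative helper).

```javascript
// Percentage reduction from `before` to `after`.
function reductionPercent(before, after) {
  return ((before - after) / before) * 100;
}

const costSaving = reductionPercent(812.37, 176.14);      // total cost in USD
const promptSaving = reductionPercent(4912228, 1889664);  // prompt tokens

console.log(`cost: ${costSaving.toFixed(1)} % lower`);
console.log(`prompt tokens: ${promptSaving.toFixed(1)} % lower`);
```

Cost falls further than token volume because most of the remaining tokens now go to the cheaper model.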

Next Step: Audit Your Agent Today

SSH into your gateway, run curl /v1/status, and log where the tokens go. Tweak one knob at a time—model routing is the quickest win—and watch the bill shrink without any dip in answer quality. Your finance team will thank you, and you’ll sleep better knowing Opus only wakes up when it’s truly needed.