MiniMax’s new M2.1 model has been floating around OpenClaw’s Discord for weeks. The hook is obvious: sub-cent pricing that finally makes running a 24/7 personal assistant affordable. This post documents exactly how I wired MiniMax into my OpenClaw gateway, what the invoices look like after a week of mixed workloads, and where the model still falls short compared to GPT-4-Turbo and Claude 3.

Why MiniMax M2.1 matters for OpenClaw budgets

OpenClaw’s superpower is the pile of integrations—emails, calendars, headless browser—driven by a single agent.chat() loop. The weak spot is that a single badly designed workflow can torch 100K tokens in an afternoon. Until recently I could only afford to run GPT-4 for ad-hoc tasks; the always-on agent used GPT-3.5 and felt like a 70-IQ intern.

M2.1 changes the math:

  • Context length: 16K tokens (same as GPT-3.5-Turbo-16k)
  • Pricing (USD): $0.0005 per 1K input, $0.0007 per 1K output
    (region: cn-shanghai-a, 2024-06-04 pricing sheet)
  • Claimed quality: somewhere between GPT-3.5 and Claude Instant

The OpenClaw crowd on GitHub (#5829, #5830) reports coherent multi-turn replies and passable web-scrape summaries. I decided to migrate my daily stand-up bot to test the hype.

Prerequisites and cost math

You need:

  • OpenClaw v0.32.1 (released 2024-05-28) — the first tag with pluggable LLM registry
  • Node.js 22.x
  • A MiniMax account with M2.1 enabled (takes ~24 h after KYC)
  • ClawCloud or self-hosted gateway ≥ 2024-05-30 docker image

M2.1’s price looks tiny, but remember how OpenClaw’s tools inflate token usage:

  • Browser tool: adds ~2-4K HTML tokens per scrape
  • Memory writes: each summary = ~800 tokens
  • Shell tool: stdout is streamed back — another token leak

On my schedule (team chatter, JIRA digests, Git commit summaries) the agent burns roughly 45K tokens/day. Compare monthly cost:

  • GPT-4-Turbo: ~$95
  • Claude 3 Opus: ~$72
  • MiniMax M2.1: ~$2.4

Even if quality drops 15–20%, that delta is hard to ignore.
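To sanity-check those monthly figures yourself, the raw math is a one-liner. The 80/20 input/output split below is my estimate, not a measurement, and the result is a floor: real invoices run higher once retries, system prompts, and memory writes are counted.

```javascript
// USD cost for one batch of traffic; prices are per 1K tokens.
function costUSD(inputTokens, outputTokens, promptPrice, completionPrice) {
  return (inputTokens * promptPrice + outputTokens * completionPrice) / 1000;
}

// One day of my ~45K-token workload on M2.1, assuming an 80/20 split
// (36K in / 9K out). The split is an assumption, not measured.
const dailyM2 = costUSD(36000, 9000, 0.0005, 0.0007);
```

Multiplying by 30 gives the floor of the monthly bill; tool overhead pushes the real number up toward the figures above.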

Getting MiniMax API keys into OpenClaw

1. Grab the credentials

MiniMax uses the usual Authorization: Bearer header, but the portal buries the key under “应用服务 > API Keys” (Application Services > API Keys). Hit “新建密钥” (Create Key), copy the key, and store it somewhere safe. Losing it means a 7-day rotation cooldown.

2. Set the secret in your gateway

If you are on ClawCloud:

# Cloud UI → Settings → Secrets
MINIMAX_API_KEY=sk-live-b1d7...c4

Self-hosters edit the .env next to your docker-compose.yml:

MINIMAX_API_KEY=sk-live-b1d7...c4

Restart the stack:

$ docker compose pull gateway daemon
$ docker compose up -d
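Before restarting, I like to fail fast on a missing or mangled key rather than debug 401s later. The checker below is my own addition, not part of OpenClaw; the sk- prefix test just mirrors the key format shown above.

```javascript
// Fail fast if the MiniMax key is absent or obviously malformed.
// checkMiniMaxKey is a hypothetical helper, not an OpenClaw API.
function checkMiniMaxKey(env = process.env) {
  const key = env.MINIMAX_API_KEY;
  if (!key) throw new Error('MINIMAX_API_KEY is not set');
  if (!key.startsWith('sk-')) throw new Error('MINIMAX_API_KEY looks malformed');
  return true;
}
```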

Configuring the model in gateway.yaml

OpenClaw’s latest gateway exposes an LLM registry. The JavaScript side looks like this:

// ~/openclaw/llm/miniMax.js
import { createChatCompletion } from 'openclaw-nodesdk';

export default {
  id: 'minimax-m2.1',
  name: 'MiniMax M2.1',
  async call(messages, opts = {}) { // default guards against a missing opts bag
    const resp = await createChatCompletion({
      apiKey: process.env.MINIMAX_API_KEY,
      model: 'abab5-chat',
      messages,
      temperature: opts.temperature ?? 0.7,
    });
    return resp.choices[0].message;
  },
};

You do not have to write this file — v0.32.1 already ships it. Just enable it in the gateway config:

# ~/.config/openclaw/gateway.yaml
llms:
  default: minimax-m2.1
  minimax-m2.1:
    provider: minimax
    model: abab5-chat
    maxTokens: 8192  # OpenClaw enforces half of full context by default
    pricing:
      prompt: 0.0005
      completion: 0.0007

Hot-reload the gateway:

$ clawctl reload gateway

From here every tool that calls agent.chat() will route to MiniMax, unless you override per-request.

Per-tool override example

For code generation I still prefer a GPT-4-class model. The DSL supports per-request overrides:

agent.chat({
  model: 'gpt-4o-mini',
  messages: [
    { role: 'system', content: 'You are a senior Go engineer...' },
    { role: 'user', content: question },
  ],
});

Benchmark: real-world OpenClaw tasks

Benchmarks were run on a ClawCloud small instance (2 vCPU, 4 GB RAM, Oregon) and repeated at least 10× each. Latency numbers are 50th percentile:

1. Daily stand-up summary (3 Slack channels, 150 messages)

  • Input tokens: 11 420
  • Output tokens: 1 094
  • Cost: $0.0074
  • Latency: 24.1 s first token / 29.8 s final token

Quality note: M2.1 captured blockers correctly 8/10 times. GPT-4 hit 10/10, but cost was $0.12.

2. Browser scrape → summarise TechCrunch article

  • Tokens: 4 870 in, 512 out
  • Cost: $0.0039
  • Latency: 11.2 s / 14.3 s

Artifact hallucination rate: none observed. GPT-3.5 tended to invent quotes.

3. Shell tool: run du -sh * and explain the top disk hogs

  • Shell bytes: 6 kB → ~900 tokens
  • Answer tokens: 204
  • Cost: $0.0008
  • Latency: 4.5 s / 5.6 s

Same task on Claude Instant mis-parsed the du table twice.

Observed quirks and failure modes

  • Long lists: anything over 30 bullets comes back truncated ~15% of the time, and the max_tokens setting isn’t honored consistently.
  • Code blocks: The model wraps JavaScript in Markdown but forgets language annotations. A post-processor can fix it.
  • Non-English: German legalese summaries were solid, but Japanese emails hallucinated polite honorifics.
  • Token mis-count: MiniMax counts UTF-8 bytes ÷ 3, not true tokens. OpenClaw fixed the math in v0.32.2-next; until then a 16–20% over-run is possible.
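The code-block quirk is the easiest to patch. A minimal post-processor can tag bare fences with a default language; the helper below is my own sketch, and the javascript default is an assumption (pick whatever your tool emits most).

```javascript
const FENCE = '`'.repeat(3); // avoids a literal backtick run in this snippet
const FENCE_RE = new RegExp('^' + FENCE + '(\\S*)\\s*$');

// Tag any opening fence that lacks a language annotation.
function tagBareFences(markdown, defaultLang = 'javascript') {
  let open = false;
  return markdown
    .split('\n')
    .map((line) => {
      const m = line.match(FENCE_RE);
      if (!m) return line;
      if (open) { open = false; return line; } // closing fence: leave as-is
      open = true;
      return m[1] ? line : FENCE + defaultLang; // tag a bare opening fence
    })
    .join('\n');
}
```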

When to stick with GPT-4 or Claude instead

I kept two tasks on premium models:

  1. Production code review — M2.1 missed subtle concurrency bugs in Go that GPT-4 found.
  2. Legal contract redlines — the hallucination cost is too high; Claude 3 Sonnet is still king.

Everything else (calendar triage, marketing copy drafts, notification digests) now runs on MiniMax without complaints from the team.
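In code, that routing policy boils down to a lookup table. The task tags below are my own labels, not an OpenClaw concept, and the premium model ids are examples:

```javascript
// Route expensive task types to premium models; default to MiniMax.
const PREMIUM_ROUTES = {
  'code-review': 'gpt-4o-mini',          // subtle concurrency bugs
  'contract-redline': 'claude-3-sonnet', // hallucination cost too high
};

function modelFor(taskTag) {
  return PREMIUM_ROUTES[taskTag] ?? 'minimax-m2.1';
}
```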

Next steps: scheduled tasks and cost guards

Cheap models encourage scope creep. I added two guardrails:

# ~/.config/openclaw/daemon.yaml
scheduler:
  maxDailySpendUSD: 0.25
  alertSlackChannel: C04N3C9QG

The daemon kills jobs if spend exceeds the cap and pings me on Slack. So far I average $0.08/day.
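For scripts that bypass the scheduler entirely, a userland version of the same cap is easy to sketch. makeSpendGuard is hypothetical glue, not a daemon API; the alert hook stands in for the Slack ping.

```javascript
// Accumulate spend and refuse further charges past the daily cap.
function makeSpendGuard(maxDailySpendUSD, alert = () => {}) {
  let spent = 0;
  return {
    charge(usd) {
      spent += usd;
      if (spent > maxDailySpendUSD) {
        alert(spent); // stand-in for the Slack notification
        throw new Error('daily spend cap exceeded: $' + spent.toFixed(4));
      }
      return spent;
    },
    reset() { spent = 0; }, // call from a midnight cron
  };
}
```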

If MiniMax rolls out the rumored 128K context, I’ll update the gateway file and rerun the benchmarks. For now, M2.1 is the sweet spot for an always-on OpenClaw agent that doesn’t torch your AWS credits.

Kick the tires: swap the default LLM, set a spend cap, and post your findings in #community-benchmarks. The more data points we get, the better our defaults become.