MiniMax’s new M2.1 model has been floating around OpenClaw’s Discord for weeks. The hook is obvious: sub-cent pricing that finally makes running a 24/7 personal assistant affordable. This post documents exactly how I wired MiniMax into my OpenClaw gateway, what the invoices look like after a week of mixed media prompts, and where the model still falls short compared to GPT-4-Turbo and Claude 3.
Why MiniMax M2.1 matters for OpenClaw budgets
OpenClaw’s superpower is its pile of integrations: email, calendar, headless browser, all driven by a single agent.chat() loop. The weak spot is that a single mis-designed workflow can torch 100K tokens in an afternoon. Until recently I could only afford GPT-4 for ad-hoc tasks; the always-on agent ran GPT-3.5 and felt like a 70 IQ intern.
M2.1 changes the math:
- Context length: 16K tokens (same as GPT-3.5-Turbo-16k)
- Pricing (USD): $0.0005 per 1K input tokens, $0.0007 per 1K output tokens (region: cn-shanghai-a, 2024-06-04 pricing sheet)
- Claimed quality: somewhere between GPT-3.5 and Claude Instant
The OpenClaw crowd on GitHub (#5829, #5830) reports coherent multi-turn replies and passable web-scrape summaries. I decided to migrate my daily stand-up bot to test the hype.
Prerequisites and cost math
You need:
- OpenClaw v0.32.1 (released 2024-05-28) — the first tag with pluggable LLM registry
- Node.js 22.x
- A MiniMax account with M2.1 enabled (takes ~24 h after KYC)
- ClawCloud or self-hosted gateway ≥ 2024-05-30 docker image
M2.1’s per-token price looks tiny, but remember how OpenClaw’s tools pile on tokens:
- Browser tool: adds ~2-4K HTML tokens per scrape
- Memory writes: each summary = ~800 tokens
- Shell tool: stdout is streamed back — another token leak
On my schedule (team chatter, JIRA digests, Git commit summaries) the agent burns roughly 45K tokens/day. Compare monthly cost:
- GPT-4-Turbo: ~$95
- Claude 3 Opus: ~$72
- MiniMax M2.1: ~$2.4
Even if quality drops 15–20%, that delta is hard to ignore.
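The comparison above is just list price times volume; a small helper makes any line item easy to sanity-check. The pricing constants come from the sheet quoted earlier; the function itself is mine, not part of OpenClaw:

```javascript
// MiniMax M2.1 list pricing, USD per 1K tokens (cn-shanghai-a sheet).
const M21_PRICING = { prompt: 0.0005, completion: 0.0007 };

// Cost of a single call, given the token counts the API reports.
function costUSD(promptTokens, completionTokens, pricing = M21_PRICING) {
  return (promptTokens / 1000) * pricing.prompt +
         (completionTokens / 1000) * pricing.completion;
}
```

A 10K-in / 1K-out run works out to about $0.0057 at list price.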
Getting MiniMax API keys into OpenClaw
1. Grab the credentials
MiniMax uses the usual Authorization: Bearer header, but the portal buries the key under “应用服务 > API Keys” (Application Services > API Keys). Hit “新建密钥” (Create Key), copy the key, and store it somewhere safe. Losing it means a 7-day rotation cooldown.
2. Set the secret in your gateway
If you are on ClawCloud:
# Cloud UI → Settings → Secrets
MINIMAX_API_KEY=sk-live-b1d7...c4
If you self-host, edit the .env next to your docker-compose.yml:
MINIMAX_API_KEY=sk-live-b1d7...c4
Restart the stack:
$ docker compose pull gateway daemon
$ docker compose up -d
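A misconfigured key only surfaces on the first real request, so I also like a fail-fast check at gateway start-up. This is my own sketch, not something OpenClaw ships, and the sk- prefix assumption simply matches the key format shown above:

```javascript
// Throw at boot if the MiniMax key is missing or obviously malformed,
// instead of failing on the first chat request.
function requireMiniMaxKey(env = process.env) {
  const key = env.MINIMAX_API_KEY;
  if (!key) {
    throw new Error('MINIMAX_API_KEY is not set; check gateway secrets or .env');
  }
  if (!key.startsWith('sk-')) {
    throw new Error('MINIMAX_API_KEY does not look like a MiniMax key');
  }
  return key;
}
```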
Configuring the model in gateway.yaml
OpenClaw’s latest gateway exposes an LLM registry. The JavaScript side looks like this:
// ~/openclaw/llm/miniMax.js
import { createChatCompletion } from 'openclaw-nodesdk';

export default {
  id: 'minimax-m2.1',
  name: 'MiniMax M2.1',
  async call(messages, opts) {
    const resp = await createChatCompletion({
      apiKey: process.env.MINIMAX_API_KEY,
      model: 'abab5-chat',
      messages,
      temperature: opts.temperature ?? 0.7,
    });
    return resp.choices[0].message;
  },
};
You do not have to write this file — v0.32.1 already ships it. Just enable it in the gateway config:
# ~/.config/openclaw/gateway.yaml
llms:
  default: minimax-m2.1
  minimax-m2.1:
    provider: minimax
    model: abab5-chat
    maxTokens: 8192  # OpenClaw enforces half of full context by default
    pricing:         # USD per 1K tokens
      prompt: 0.0005
      completion: 0.0007
Hot-reload the gateway:
$ clawctl reload gateway
From here every tool that calls agent.chat() will route to MiniMax, unless you override per-request.
Per-tool override example
For code generation I still prefer GPT-4-class models, and the DSL supports a per-request override:
agent.chat({
  model: 'gpt-4o-mini',
  messages: [
    { role: 'system', content: 'You are a senior Go engineer...' },
    { role: 'user', content: question },
  ],
});
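If you override in more than one place, it is worth centralizing the routing. Here is a sketch of that idea; the task names and helper functions are mine, and only agent.chat and the model ids come from the snippets above:

```javascript
// Map task classes to models; anything unlisted falls through to the
// gateway default (minimax-m2.1 in the config above).
const MODEL_OVERRIDES = {
  'code-generation': 'gpt-4o-mini',
};

function pickModel(task) {
  return MODEL_OVERRIDES[task] ?? null; // null means "use gateway default"
}

// Build the agent.chat() options, adding a model only when overridden.
function chatOptions(task, messages) {
  const model = pickModel(task);
  return model ? { model, messages } : { messages };
}
```

Then agent.chat(chatOptions('code-generation', messages)) keeps the override policy in one place.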
Benchmark: real-world OpenClaw tasks
Benchmarks were run on a ClawCloud small instance (2 vCPU, 4 GB RAM, Oregon) and repeated at least 10× each. Latency numbers are 50th percentile:
1. Daily stand-up summary (3 Slack channels, 150 messages)
- Input tokens: 11 420
- Output tokens: 1 094
- Cost: $0.0074
- Latency: 24.1 s first token / 29.8 s final token
Quality note: M2.1 captured blockers correctly 8/10 times. GPT-4 hit 10/10, but cost was $0.12.
2. Browser scrape → summarise TechCrunch article
- Tokens: 4 870 in, 512 out
- Cost: $0.0039
- Latency: 11.2 s / 14.3 s
Artifact hallucination rate: none observed. GPT-3.5 tended to invent quotes.
3. Shell tool: run du -sh * and explain the top disk hogs
- Shell bytes: 6 kB → ~900 tokens
- Answer tokens: 204
- Cost: $0.0008
- Latency: 4.5 s / 5.6 s
Same task on Claude Instant mis-parsed the du table twice.
Observed quirks and failure modes
- Long lists: anything over 30 bullets comes back truncated ~15% of the time. Setting max_tokens isn’t honored consistently.
- Code blocks: the model wraps JavaScript in Markdown but forgets language annotations. A post-processor can fix it.
- Non-English: German legalese summaries were solid, but Japanese emails hallucinated polite honorifics.
- Token mis-count: MiniMax counts UTF-8 bytes ÷ 3, not true tokens. OpenClaw fixed the math in v0.32.2-next; until then a 16-20% over-run is possible.
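The missing-language-annotation quirk is easy to patch in a post-processor. A minimal sketch, assuming the model emits standard triple-backtick fences and that bare ones should default to javascript:

```javascript
// Add a language tag to bare opening fences so downstream renderers
// get syntax highlighting. Fences alternate open/close, so
// even-numbered fences (0, 2, ...) are openers.
function tagBareFences(markdown, lang = 'javascript') {
  let fenceIndex = 0;
  return markdown.replace(/^```(\w*)$/gm, (match, existing) => {
    const isOpening = fenceIndex++ % 2 === 0;
    return isOpening && !existing ? '```' + lang : match;
  });
}
```

Fences that already carry a language pass through untouched.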
When to stick with GPT-4 or Claude instead
I kept two tasks on premium models:
- Production code review — M2.1 missed subtle concurrency bugs in Go that GPT-4 found.
- Legal contract redlines — hallucination cost too high; Claude 3 Sonnet still king.
Everything else (calendar triage, marketing copy drafts, notification digests) now runs on MiniMax without complaints from the team.
Next steps: scheduled tasks and cost guards
Cheap models encourage scope creep. I added two guardrails:
# ~/.config/openclaw/daemon.yaml
scheduler:
  maxDailySpendUSD: 0.25
  alertSlackChannel: C04N3C9QG
The daemon kills jobs if spend exceeds the cap and pings me on Slack. So far I average $0.08/day.
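Under the hood this kind of guard is just a running tally checked before each job. A minimal sketch of the idea; the class and method names are mine, not OpenClaw’s, and pricing is per 1K tokens as in gateway.yaml:

```javascript
// Minimal daily spend guard: accumulate per-call cost and refuse new
// jobs once the configured cap (maxDailySpendUSD) is reached.
class SpendGuard {
  constructor(maxDailySpendUSD) {
    this.cap = maxDailySpendUSD;
    this.spent = 0;
  }
  // pricing: { prompt, completion } in USD per 1K tokens.
  record(promptTokens, completionTokens, pricing) {
    this.spent +=
      (promptTokens / 1000) * pricing.prompt +
      (completionTokens / 1000) * pricing.completion;
  }
  allowed() {
    return this.spent < this.cap;
  }
}
```

A real implementation would also reset the tally at midnight and fire the Slack alert; the cap check itself is this simple.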
If MiniMax rolls out the rumored 128K context, I’ll update the gateway file and rerun the benchmarks. For now, M2.1 is the sweet spot for an always-on OpenClaw agent that doesn’t torch your AWS credits.
Kick the tires: swap the default LLM, set a spend cap, and post your findings in #community-benchmarks. The more data points we get, the better our defaults become.