MiniMax’s new M2.1 model has been floating around OpenClaw’s Discord for weeks. The hook is obvious: sub-cent pricing that finally makes running a 24/7 personal assistant affordable. This post documents exactly how I wired MiniMax into my OpenClaw gateway, what the invoices look like after a week of mixed workloads, and where the model still falls short compared to GPT-4-Turbo and Claude 3.

Why MiniMax M2.1 matters for OpenClaw budgets

OpenClaw’s superpower is the pile of integrations—emails, calendars, headless browser—driven by a single agent.chat() loop. The weak spot is that a single badly designed workflow can torch 100K tokens in an afternoon. Until recently I could only afford to run GPT-4 for ad-hoc tasks; the always-on agent used GPT-3.5 and felt like a 70-IQ intern.

M2.1 changes the math:

  • Context length: 16K tokens (same as GPT-3.5-Turbo-16k)
  • Pricing (USD): $0.0005 per 1K input, $0.0007 per 1K output
    (region: cn-shanghai-a, 2024-06-04 pricing sheet)
  • Claimed quality: somewhere between GPT-3.5 and Claude Instant

The OpenClaw crowd on GitHub (#5829, #5830) reports coherent multi-turn replies and passable web-scrape summaries. I decided to migrate my daily stand-up bot to test the hype.

Prerequisites and cost math

You need:

  • OpenClaw v0.32.1 (released 2024-05-28) — the first tag with pluggable LLM registry
  • Node.js 22.x
  • A MiniMax account with M2.1 enabled (takes ~24 h after KYC)
  • ClawCloud or self-hosted gateway ≥ 2024-05-30 docker image

M2.1’s price looks tiny, but remember how OpenClaw’s tools inflate token usage:

  • Browser tool: adds ~2-4K HTML tokens per scrape
  • Memory writes: each summary = ~800 tokens
  • Shell tool: stdout is streamed back — another token leak

On my schedule (team chatter, JIRA digests, Git commit summaries) the agent burns roughly 45K tokens/day. Compare monthly cost:

  • GPT-4-Turbo: ~$95
  • Claude 3 Opus: ~$72
  • MiniMax M2.1: ~$2.4

Even if quality drops 15–20%, that delta is hard to ignore.
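To sanity-check those monthly figures yourself, the raw math is a one-liner. The 80/20 input/output split below is my estimate, not a measurement, and the result is a floor: real invoices run higher once retries, system prompts, and memory writes are counted.

```javascript
// USD cost for one batch of traffic; prices are per 1K tokens.
function costUSD(inputTokens, outputTokens, promptPrice, completionPrice) {
  return (inputTokens * promptPrice + outputTokens * completionPrice) / 1000;
}

// One day of my ~45K-token workload on M2.1, assuming an 80/20 split
// (36K in / 9K out). The split is an assumption, not measured.
const dailyM2 = costUSD(36000, 9000, 0.0005, 0.0007);
```

Multiplying by 30 gives the floor of the monthly bill; tool overhead pushes the real number up toward the figures above.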

Getting MiniMax API keys into OpenClaw

1. Grab the credentials

MiniMax uses the usual Authorization: Bearer header, but the portal buries the key under “应用服务 > API Keys” (Application Services > API Keys). Hit “新建密钥” (Create Key), copy the key, and store it somewhere safe. Losing it means a 7-day rotation cooldown.

2. Set the secret in your gateway

If you are on ClawCloud:

# Cloud UI → Settings → Secrets
MINIMAX_API_KEY=sk-live-b1d7...c4

Self-hosters edit the .env next to your docker-compose.yml:

MINIMAX_API_KEY=sk-live-b1d7...c4

Restart the stack:

$ docker compose pull gateway daemon
$ docker compose up -d
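Before restarting, I like to fail fast on a missing or mangled key rather than debug 401s later. The checker below is my own addition, not part of OpenClaw; the sk- prefix test just mirrors the key format shown above.

```javascript
// Fail fast if the MiniMax key is absent or obviously malformed.
// checkMiniMaxKey is a hypothetical helper, not an OpenClaw API.
function checkMiniMaxKey(env = process.env) {
  const key = env.MINIMAX_API_KEY;
  if (!key) throw new Error('MINIMAX_API_KEY is not set');
  if (!key.startsWith('sk-')) throw new Error('MINIMAX_API_KEY looks malformed');
  return true;
}
```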

Configuring the model in gateway.yaml

OpenClaw’s latest gateway exposes an LLM registry. The JavaScript side looks like this:

// ~/openclaw/llm/miniMax.js
import { createChatCompletion } from 'openclaw-nodesdk';

export default {
  id: 'minimax-m2.1',
  name: 'MiniMax M2.1',
  async call(messages, opts = {}) { // default guards against a missing opts bag
    const resp = await createChatCompletion({
      apiKey: process.env.MINIMAX_API_KEY,
      model: 'abab5-chat',
      messages,
      temperature: opts.temperature ?? 0.7,
    });
    return resp.choices[0].message;
  },
};

You do not have to write this file — v0.32.1 already ships it. Just enable it in the gateway config:

# ~/.config/openclaw/gateway.yaml
llms:
  default: minimax-m2.1
  minimax-m2.1:
    provider: minimax
    model: abab5-chat
    maxTokens: 8192  # OpenClaw enforces half of full context by default
    pricing:
      prompt: 0.0005
      completion: 0.0007

Hot-reload the gateway:

$ clawctl reload gateway

From here every tool that calls agent.chat() will route to MiniMax, unless you override per-request.

Per-tool override example

For code generation I still prefer a GPT-4-class model. The DSL supports per-request overrides:

agent.chat({
  model: 'gpt-4o-mini',
  messages: [
    { role: 'system', content: 'You are a senior Go engineer...' },
    { role: 'user', content: question },
  ],
});

Benchmark: real-world OpenClaw tasks

Benchmarks were run on a ClawCloud small instance (2 vCPU, 4 GB RAM, Oregon) and repeated at least 10× each. Latency numbers are 50th percentile:

1. Daily stand-up summary (3 Slack channels, 150 messages)

  • Input tokens: 11 420
  • Output tokens: 1 094
  • Cost: $0.0074
  • Latency: 24.1 s first token / 29.8 s final token

Quality note: M2.1 captured blockers correctly 8/10 times. GPT-4 hit 10/10, but cost was $0.12.

2. Browser scrape → summarise TechCrunch article

  • Tokens: 4 870 in, 512 out
  • Cost: $0.0039
  • Latency: 11.2 s / 14.3 s

Artifact hallucination rate: none observed. GPT-3.5 tended to invent quotes.

3. Shell tool: run du -sh * and explain the top disk hogs

  • Shell bytes: 6 kB → ~900 tokens
  • Answer tokens: 204
  • Cost: $0.0008
  • Latency: 4.5 s / 5.6 s

Same task on Claude Instant mis-parsed the du table twice.

Observed quirks and failure modes

  • Long lists: anything over 30 bullets comes back truncated ~15% of the time, and the max_tokens setting isn’t honored consistently.
  • Code blocks: The model wraps JavaScript in Markdown but forgets language annotations. A post-processor can fix it.
  • Non-English: German legalese summaries were solid, but Japanese emails hallucinated polite honorifics.
  • Token mis-count: MiniMax counts UTF-8 bytes ÷ 3, not true tokens. OpenClaw fixed the math in v0.32.2-next; until then a 16–20% over-run is possible.
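The code-block quirk is the easiest to patch. A minimal post-processor can tag bare fences with a default language; the helper below is my own sketch, and the javascript default is an assumption (pick whatever your tool emits most).

```javascript
const FENCE = '`'.repeat(3); // avoids a literal backtick run in this snippet
const FENCE_RE = new RegExp('^' + FENCE + '(\\S*)\\s*$');

// Tag any opening fence that lacks a language annotation.
function tagBareFences(markdown, defaultLang = 'javascript') {
  let open = false;
  return markdown
    .split('\n')
    .map((line) => {
      const m = line.match(FENCE_RE);
      if (!m) return line;
      if (open) { open = false; return line; } // closing fence: leave as-is
      open = true;
      return m[1] ? line : FENCE + defaultLang; // tag a bare opening fence
    })
    .join('\n');
}
```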

When to stick with GPT-4 or Claude instead

I kept two tasks on premium models:

  1. Production code review — M2.1 missed subtle concurrency bugs in Go that GPT-4 found.
  2. Legal contract redlines — the hallucination cost is too high; Claude 3 Sonnet is still king.

Everything else (calendar triage, marketing copy drafts, notification digests) now runs on MiniMax without complaints from the team.
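In code, that routing policy boils down to a lookup table. The task tags below are my own labels, not an OpenClaw concept, and the premium model ids are examples:

```javascript
// Route expensive task types to premium models; default to MiniMax.
const PREMIUM_ROUTES = {
  'code-review': 'gpt-4o-mini',          // subtle concurrency bugs
  'contract-redline': 'claude-3-sonnet', // hallucination cost too high
};

function modelFor(taskTag) {
  return PREMIUM_ROUTES[taskTag] ?? 'minimax-m2.1';
}
```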

Next steps: scheduled tasks and cost guards

Cheap models encourage scope creep. I added two guardrails:

# ~/.config/openclaw/daemon.yaml
scheduler:
  maxDailySpendUSD: 0.25
  alertSlackChannel: C04N3C9QG

The daemon kills jobs if spend exceeds the cap and pings me on Slack. So far I average $0.08/day.
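For scripts that bypass the scheduler entirely, a userland version of the same cap is easy to sketch. makeSpendGuard is hypothetical glue, not a daemon API; the alert hook stands in for the Slack ping.

```javascript
// Accumulate spend and refuse further charges past the daily cap.
function makeSpendGuard(maxDailySpendUSD, alert = () => {}) {
  let spent = 0;
  return {
    charge(usd) {
      spent += usd;
      if (spent > maxDailySpendUSD) {
        alert(spent); // stand-in for the Slack notification
        throw new Error('daily spend cap exceeded: $' + spent.toFixed(4));
      }
      return spent;
    },
    reset() { spent = 0; }, // call from a midnight cron
  };
}
```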

If MiniMax rolls out the rumored 128K context, I’ll update the gateway file and rerun the benchmarks. For now, M2.1 is the sweet spot for an always-on OpenClaw agent that doesn’t torch your AWS credits.

Kick the tires: swap the default LLM, set a spend cap, and post your findings in #community-benchmarks. The more data points we get, the better our defaults become.