You can run an OpenClaw agent on a Raspberry Pi and still have it draft emails, write code, and talk to Slack—all day—without melting your credit card. The trick is to swap the default OpenAI backend for DeepSeek, the Chinese-built model family that’s aggressively priced and surprisingly capable. This post shows the exact wiring I use in production, the numbers I measured against GPT-4 / Claude 3, and the rough edges you should expect.

Why DeepSeek Is the Go-To for Cost-Sensitive Deployments

OpenClaw ships with an OpenAI preset because that’s what most of the Western community tested first. In China, though, DeepSeek is the default. The main reasons:

  • Price: DeepSeek-Chat 67B is $0.45 / million input tokens and $0.90 / million output. On a 1:1 input/output mix, that’s roughly 15× cheaper than GPT-4o (June ’24 pricing).
  • Latency: Regional endpoints in Beijing and Hong Kong give 400–700 ms first-token latency from Shanghai—good enough for chat.
  • Regulatory access: Users behind the Great Firewall don’t need VPN gymnastics.
  • Reasonable model quality: Coding tasks land between GPT-3.5-Turbo and Claude 3 Haiku, good enough for most agent automations.
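That 15× figure comes from blending input and output. A quick sanity check in Node, using the per-million rates from the bullets above (the 1:1 token mix is my assumption):

```javascript
// Per-million-token prices (USD), as quoted above.
const prices = {
  "deepseek-chat": { input: 0.45, output: 0.90 },
  "gpt-4o":        { input: 5.00, output: 15.00 },
};

// Cost of a workload in USD, given token counts in millions.
function cost(model, inputM, outputM) {
  const p = prices[model];
  return p.input * inputM + p.output * outputM;
}

// A 1M-in / 1M-out workload:
const ds = cost("deepseek-chat", 1, 1); // $1.35
const oa = cost("gpt-4o", 1, 1);        // $20.00
console.log(`ratio ≈ ${(oa / ds).toFixed(1)}×`); // ≈ 14.8×
```

Skew the mix toward output-heavy jobs (summarization, code generation) and the gap widens further.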

For personal hacking and small-team internal tools, those trade-offs are often worth it. Below I’ll show numbers from my own workload: an OpenClaw bot triaging GitHub issues and answering Telegram DMs in Chinese and English.

Prerequisites, Versions, and One-Minute Checklist

  • Node 22.3.1 or later (node -v should print v22.x).
  • OpenClaw 0.46.1 (latest at the time of writing). Install with npm i -g openclaw@latest.
  • A DeepSeek account with billing enabled. You’ll need at least ¥10 on the meter.
  • Optional but handy: jq for parsing JSON when debugging.
  • Hardware: anything that can keep Node alive. My personal setup is an 8 GB Rock 5B running Ubuntu 22.04.

All commands below were executed on that board; substitute paths as needed.

Creating and Testing a DeepSeek API Key

You generate keys in the DeepSeek console under 个人中心 → API 密钥 (Profile → API Keys). Hit “Create”, copy the token, and store it somewhere that isn’t GitHub.

  1. Export the key:

export DEEPSEEK_API_KEY="sk-live-yourrealkeyhere"

  2. Sanity-check with curl to catch network/firewall issues before blaming OpenClaw:

curl https://api.deepseek.com/v1/chat/completions \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-chat", "messages": [{"role": "user", "content": "ping"}], "temperature": 0}' \
  | jq '.choices[0].message.content'

If you see "pong" (or similar), the backend is healthy.
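If curl isn’t on the box, the same check works from Node 18+ (built-in fetch). The endpoint, model name, and payload mirror the curl call above:

```javascript
// Minimal chat-completion request against DeepSeek's OpenAI-compatible API.
const body = {
  model: "deepseek-chat",
  messages: [{ role: "user", content: "ping" }],
  temperature: 0,
};

async function ping() {
  const res = await fetch("https://api.deepseek.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.DEEPSEEK_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  // Auth and billing problems surface here as 401/402 before you debug OpenClaw.
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}

if (process.env.DEEPSEEK_API_KEY) ping().then(console.log).catch(console.error);
```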

Wiring DeepSeek into the OpenClaw Gateway

The gateway is where you configure models, tools, memory stores, and UI options. Since 0.45.x there are first-class provider blocks—no more monkey-patching Node env vars.

Step 1. Create a provider YAML

# ~/.openclaw/providers/deepseek.yaml
id: deepseek
kind: openai-compatible
baseUrl: "https://api.deepseek.com/v1"
auth:
  type: bearer
  token: ${env.DEEPSEEK_API_KEY}
models:
  chat:
    default: deepseek-chat
    options:
      deepseek-chat: {maxTokens: 16384, costInput: 0.00045, costOutput: 0.0009}
      deepseek-coder: {maxTokens: 32768, costInput: 0.0008, costOutput: 0.0016}

Note the kind: openai-compatible. DeepSeek mirrors the OpenAI route layout, so the existing OpenClaw client works unchanged.

Step 2. Point your agent at the provider

# ~/.openclaw/agents/github-triage.yaml
name: core-bot
provider: deepseek
persona: |
  You are a helpful project assistant…
tools:
  - browser
  - github
memory:
  type: sqlite

That’s it. Fire up the gateway:

openclaw gateway start

Visit http://localhost:4040; you should see “DeepSeek” in the model dropdown.

Choosing Between DeepSeek-Chat, DeepSeek-Coder, and Local Quantizations

DeepSeek ships multiple checkpoints. The two most relevant for agents:

  • deepseek-chat-67B – general multilingual chat with a 16,384-token context. Roughly GPT-3.5 quality on reasoning, slightly better on Mandarin nuance.
  • deepseek-coder-33B/6.7B – trained on code. Performs like Claude 3 Haiku on LeetCode-style problems, and is cheaper than GPT-3.5 Turbo.

If you only handle natural-language tasks (email drafting, meeting summaries), use deepseek-chat. For code manipulation—merging PRs, generating unit tests—go coder.
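If one agent handles both kinds of traffic, a tiny router in front of the provider can pick the checkpoint per request. This is a sketch, not an OpenClaw API—the pickModel helper and its keyword heuristic are my own:

```javascript
// Crude task classifier: route code-smelling prompts to deepseek-coder,
// everything else to deepseek-chat.
const CODE_HINTS = /\b(diff|patch|unit test|stack trace|refactor|function|regex)\b|```/i;

function pickModel(prompt) {
  return CODE_HINTS.test(prompt) ? "deepseek-coder" : "deepseek-chat";
}

console.log(pickModel("Summarize this meeting transcript"));   // deepseek-chat
console.log(pickModel("Write a unit test for parseConfig()")); // deepseek-coder
```

In practice you’d tune the regex (or ask the cheap chat model to classify) based on your own traffic; the point is that the split costs one string test per request.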

Running locally with vLLM

You can also download the open-weights checkpoint from Hugging Face (deepseek-ai/deepseek-llm-67b-instruct) and serve it with vLLM:

pip install vllm torch --extra-index-url https://download.pytorch.org/whl/cu121

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
python -m vllm.entrypoints.openai.api_server \
  --model deepseek-ai/deepseek-llm-67b-instruct \
  --tensor-parallel-size 8 \
  --port 8001

Then swap baseUrl in the provider YAML to http://localhost:8001/v1. If you only have consumer GPUs, the 6.7B Q4_K_M quant under llama.cpp fits comfortably in 16 GB of RAM; expect around 25 tokens/s.
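To see why 16 GB is plenty of headroom: Q4_K_M stores roughly 4.85 bits per weight on average (treat that figure as an approximation of llama.cpp’s quantization scheme), so the 6.7B weights alone are about 4 GB:

```javascript
// Back-of-the-envelope memory estimate for a quantized model.
function quantSizeGB(params, bitsPerWeight) {
  return (params * bitsPerWeight) / 8 / 1e9; // bits → bytes → GB (decimal)
}

const weights = quantSizeGB(6.7e9, 4.85); // ≈ 4.1 GB for a 6.7B Q4_K_M
// KV cache and runtime overhead add a few more GB depending on context length.
console.log(`weights ≈ ${weights.toFixed(1)} GB`);
```

The same function says the 67B model at the same quant needs ~40 GB for weights alone, which is why the big checkpoint stays on the API or a multi-GPU vLLM box.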

Performance Benchmarks vs GPT and Claude

Here’s what I measured on 10 GitHub issue triage prompts (single-line question + repo README) using the remote APIs:

| Model              | First token (ms) | Tokens/s | Cost / 1M tokens (in / out) | Accuracy*   |
|--------------------|------------------|----------|-----------------------------|-------------|
| GPT-4o             | 1530             | 28       | $5 / $15                    | 9/10        |
| Claude 3 Sonnet    | 900              | 34       | $3 / $15                    | 8/10        |
| DeepSeek-Chat 67B  | 760              | 31       | $0.45 / $0.90               | 7/10        |
| DeepSeek-Coder 33B | 820              | 29       | $0.80 / $1.60               | 8/10 (code) |

*Accuracy is rough human-scored correctness.

In practice, DeepSeek’s incorrect answers were edge cases about Git submodules; the agent retried with a tool call anyway, so the user experience held up.

Tool-Call Limitations and Workarounds

OpenClaw relies on function-calling (the JSON schema system first popularized by OpenAI) to decide when to run browser.search() or github.getIssue(). DeepSeek implemented a subset of that protocol.

  • No automatic JSON-mode flag. Keep the serialized schema under roughly 8,000 characters or the model starts hallucinating keys; I embed the schema in every system-prompt chunk.
  • Lower reliability at emitting well-formed "name" + "arguments" payloads: roughly 70%, versus roughly 95% for GPT-4.
  • No parallel function calls. Tools must be invoked sequentially.

My best workaround is aggressive retryWithSchema() middleware:

// ~/.openclaw/plugins/retryWithSchema.mjs

// Naive validity check; swap in real schema validation if you have it.
const isJson = (s) => { try { JSON.parse(s); return true; } catch { return false; } };

export async function onChatCompletion({ call, schema }) {
  const res = await call();
  if (!isJson(res)) {
    // Retry once, pinning the schema and dropping temperature to 0.
    return await call({ forcedSchema: schema, temperature: 0 });
  }
  return res;
}

Hook this in ~/.openclaw/config.yaml:

plugins:
  - ./plugins/retryWithSchema.mjs

Until DeepSeek exposes a response_format param like OpenAI, this is the safest path.
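One concrete way to respect the ~8,000-character schema budget from the first bullet above is to shrink the serialized tool schemas before embedding them in the system prompt. compactSchema is my own helper, not an OpenClaw API:

```javascript
// Serialize a JSON-schema tool list and hard-cap it to fit the prompt budget.
// Drops `description` fields first, since they are the bulkiest optional part.
function compactSchema(tools, maxChars = 8000) {
  let s = JSON.stringify(tools);
  if (s.length <= maxChars) return s;
  const stripped = tools.map(({ description, ...rest }) => rest);
  s = JSON.stringify(stripped);
  // Last resort: truncate. The model degrades, but keys stay mostly intact.
  return s.length <= maxChars ? s : s.slice(0, maxChars);
}

const tools = [{
  name: "github.getIssue",
  description: "x".repeat(9000), // deliberately oversized for the demo
  parameters: { id: "number" },
}];
console.log(compactSchema(tools).length <= 8000); // true
```

Trimming descriptions costs some call accuracy on ambiguous tools, so keep the most-confused tool names verbose and strip the rest.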

Cost Breakdown and Real-World Savings

My weekly bot stats (exported via ClawCloud’s usage CSV):

  • 222 conversations, 42,310 input tokens, 39,877 output.
  • With GPT-3.5-Turbo (Jan ’24 pricing: $0.0005/1K input, $0.0015/1K output) that’s (42.31K × $0.0005) + (39.88K × $0.0015) ≈ $0.081.
  • With DeepSeek-Chat: (42.31K × $0.00045) + (39.88K × $0.0009) ≈ $0.055 → roughly 32% saved.
  • But the real win is burst loads. I ran a one-off 1M-token code-refactor job that would be about $20 on GPT-4 Turbo; DeepSeek cost $1.35.

If you’re moving lots of text or doing background summarization, DeepSeek’s unit economics matter.
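For your own workloads, the arithmetic generalizes to a two-line calculator (rates are per million tokens; the GPT-3.5 figures are OpenAI’s Jan ’24 list prices):

```javascript
// Cost in USD for a workload, given token counts and per-1M-token rates.
function usd(tokensIn, tokensOut, rateIn, rateOut) {
  return (tokensIn / 1e6) * rateIn + (tokensOut / 1e6) * rateOut;
}

const week = { in: 42310, out: 39877 }; // my weekly bot stats from above
const gpt35 = usd(week.in, week.out, 0.50, 1.50);    // ≈ $0.081
const deepseek = usd(week.in, week.out, 0.45, 0.90); // ≈ $0.055
console.log(`saved ≈ ${(100 * (1 - deepseek / gpt35)).toFixed(0)}%`);
```

Note how the savings concentrate on output tokens ($0.90 vs $1.50 per million), which is exactly where summarization-heavy agents spend.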

Putting It All Together: Spin Up an Agent in 90 Seconds

  1. Install OpenClaw and export keys:

npm i -g openclaw@latest
export DEEPSEEK_API_KEY="sk-live-..."
openclaw login   # optional if you use ClawCloud

  2. Drop the provider file from above.
  3. Create simple.yaml for a Telegram bot:

name: tlg-helper
provider: deepseek
# Persona (Chinese): "You are an assistant fluent in both Chinese and English."
persona: |
  你是一个中文和英文都很流利的助手。
connectors:
  telegram:
    botToken: ${env.TELEGRAM_BOT_TOKEN}
tools:
  - shell
  - browser
  - translator

  4. Launch:

openclaw daemon start --agent simple.yaml

Ping the bot in Telegram. Watch the logs. Average answer cost is ¥0.0003.

Where to Go Next

DeepSeek pushes updates weekly. Keep an eye on the MoE repo; the mixture-of-experts variant promises Claude-2-level reasoning at GPT-3.5 pricing once they open the API. Meanwhile, perf-test the 6.7B quant if you’re GPU-poor, and share failures in #models-deepseek on the OpenClaw Discord—the community is building better tool-call prompt templates every day.

The bottom line: OpenClaw + DeepSeek is not magic, but it’s cheap, surprisingly solid, and 100% doable on hardware you already own. Fire it up, measure, and keep patches flowing.