OpenClaw security checklist before going always-on in production

You can spin up an OpenClaw agent on ClawCloud in under a minute, but leaving it unattended for weeks is a different game. The minute you flip the toggle from “hobby” to “always-on”, every mis-scoped permission, forgotten webhook, or runaway tool integration becomes potential pager noise—or worse, an incident. The following checklist is what I actually walk through before telling Slack that “the bot is live”. Steal it, fork it, PR it—just don’t skip it.

1. Access & permissions audit

OpenClaw inherits permissions from two places: the runtime user on the box (or container) and the auth tokens you wire into each skill. Both can drift quietly until you get burned. I start here.

1.1 List current UNIX capabilities

If you run the daemon under systemd, verify it’s sandboxed. Minimal happy path:

[Service]
User=openclaw
Group=openclaw
CapabilityBoundingSet=
NoNewPrivileges=yes
PrivateDevices=yes
ProtectHome=yes
ProtectSystem=strict
ReadWritePaths=/var/lib/openclaw

1.2 Check OpenClaw’s own permission store

OpenClaw >=0.32.0 ships a perms subcommand:

# lists every skill + requested scope
npx openclaw perms list --format table

# hash the key-sorted JSON; compare the digest with the known-good value kept in git
npx openclaw perms list --format json > perms.json
jq -S . perms.json | sha256sum

If anything new shows up, I force a manual review. Community tip: store the expected hash in CI and fail the deploy if it changes.
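
Here is roughly what that CI gate could look like in Python. It is a minimal sketch, assuming perms.json is the JSON written by the command above; canonical_hash and perms_changed are hypothetical helper names. The digest is computed over Python's own key-sorted serialization, so it will not equal the jq -S | sha256sum value. Pick one method and use it for both the stored and the fresh hash.

```python
import hashlib
import json


def canonical_hash(perms) -> str:
    """SHA-256 over a key-sorted, whitespace-free JSON serialization."""
    blob = json.dumps(perms, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def perms_changed(perms, expected_hash: str) -> bool:
    """True when the current permission set no longer matches the reviewed one."""
    return canonical_hash(perms) != expected_hash


# Key order does not affect the digest, so a re-serialized file stays "unchanged".
reviewed = {"skill": "github", "scopes": ["read"]}
current = {"scopes": ["read"], "skill": "github"}
print(perms_changed(current, canonical_hash(reviewed)))  # False
```

In CI, load perms.json with json.load, pass in the hash you committed, and exit non-zero when perms_changed returns True.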

1.3 Rotate tokens older than 90 days

I keep a one-liner in cron that lists tokens older than 90 days (7,776,000 seconds). It assumes .created is a Unix timestamp:

jq -r '.[] | select(.created < (now-7776000)) | .name' ~/.openclaw/tokens.json

Anything returned goes on the rotation queue. GitHub users reported stale Notion keys causing silent 403s—avoid that noise.
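
If you would rather run the same check from a script, the logic is tiny. This sketch assumes what the jq filter implies: a list of token objects with a name and a created field in epoch seconds. stale_tokens is a hypothetical helper, not part of OpenClaw.

```python
import time

NINETY_DAYS = 90 * 24 * 60 * 60  # 7,776,000 seconds, the same constant as the jq filter


def stale_tokens(tokens, now=None, max_age=NINETY_DAYS):
    """Return the names of tokens whose `created` timestamp is older than max_age."""
    now = time.time() if now is None else now
    return [t["name"] for t in tokens if t["created"] < now - max_age]


# Made-up sample data; `now` is pinned so the result is reproducible.
tokens = [
    {"name": "notion", "created": 1_700_000_000},
    {"name": "github", "created": 1_709_000_000},
]
print(stale_tokens(tokens, now=1_710_000_000))  # ['notion']
```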

2. Skill review and scope minimization

It’s tempting to grant calendar.read+write just so the agent can suggest meeting times. Don’t. OpenClaw follows the “first skill wins” rule: a skill higher in the manifest can satisfy a call even if another lower-privileged skill could have done it. So trim aggressively.

Remove stub skills you used during prototyping (e.g., openai-gpt-35-debug).
Prefer read-only variants (github.read) unless write is truly required.
Double-check any shell skill—community issues #2312 and #2450 show how easy it is to forget cwd confinement.
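
To see why manifest order matters, here is a toy model of the "first skill wins" rule. It is not OpenClaw's resolver; the manifest shape, the skill names, and the resolve function are all made up for illustration.

```python
from typing import Optional

# Hypothetical manifest: an ordered list of (skill name, scopes it was granted).
MANIFEST = [
    ("calendar-admin", {"calendar.read", "calendar.write"}),
    ("calendar-viewer", {"calendar.read"}),
]


def resolve(manifest, needed_scope: str) -> Optional[str]:
    """Return the first skill in manifest order that holds the needed scope."""
    for name, scopes in manifest:
        if needed_scope in scopes:
            return name
    return None


# A read-only call lands on the read+write skill because it is listed first.
print(resolve(MANIFEST, "calendar.read"))      # calendar-admin
# With the over-privileged skill removed, the read-only skill serves the call.
print(resolve(MANIFEST[1:], "calendar.read"))  # calendar-viewer
```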

A quick grep catches accidental wildcards:

grep -n 'scope: .*\*' openclaw.yml

3. Enable approval workflows and rate limits

Since 0.29.0, OpenClaw ships built-in approvals for destructive calls. I consider them mandatory in prod.

3.1 Approval config snippet

approvals:
  # block until a human OKs file deletions
  - match: "fs.rm"
    require: human
  # auto-approve low-risk fetches but rate-limit
  - match: "http.get"
    maxPerMinute: 60

Store the approvers list in LDAP or GitHub teams. Nothing kills trust like asking the intern at 3 a.m. to approve a DROP DATABASE.
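
To make the semantics of the 3.1 snippet concrete, here is a toy evaluator for rules of that shape. It is a sketch under assumptions, not OpenClaw's implementation: the Gate class, exact-string matching, the sliding 60-second window, and letting unmatched calls through are all guesses at reasonable behavior.

```python
from collections import deque

RULES = [
    {"match": "fs.rm", "require": "human"},
    {"match": "http.get", "maxPerMinute": 60},
]


class Gate:
    """Toy evaluator: the first rule whose `match` equals the call name decides."""

    def __init__(self, rules):
        self.rules = rules
        self.recent = {}  # call name -> timestamps of calls allowed in the last 60 s

    def decide(self, call: str, now: float) -> str:
        for rule in self.rules:
            if rule["match"] != call:
                continue
            if rule.get("require") == "human":
                return "needs-approval"
            limit = rule.get("maxPerMinute")
            if limit is not None:
                window = self.recent.setdefault(call, deque())
                while window and now - window[0] >= 60:
                    window.popleft()  # forget calls older than one minute
                if len(window) >= limit:
                    return "rate-limited"
                window.append(now)
            return "allow"
        return "allow"  # assumption: calls with no matching rule pass through


gate = Gate(RULES)
print(gate.decide("fs.rm", now=0.0))     # needs-approval
print(gate.decide("http.get", now=0.0))  # allow
```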

3.2 Dry-run approvals during staging

Flip mode: audit to simulate approvals without blocking. I usually run staging for 24 h and inspect openclaw-approvals.log for surprises.

4. Monitoring, alerting, and log retention

If you can’t see it, you can’t fix it. I pipe everything to Loki + Grafana, but any stack works as long as you have the basics:

Daemon health probe (HTTP GET /healthz on port 9090)
Skill latency histogram
External API error rate
Queue depth for scheduled tasks
Host CPU/GPU and memory
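
The health probe is the easiest of these to script yourself. A minimal sketch, assuming /healthz answers 200 when the daemon is fine (the text does not say what it returns otherwise); is_healthy and its opener parameter are hypothetical, with the latter there only so the function can be exercised without a live daemon.

```python
import urllib.request


def is_healthy(url: str, timeout: float = 2.0, opener=urllib.request.urlopen) -> bool:
    """Return True only if the endpoint answers with HTTP 200 within the timeout."""
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (non-2xx), refused connections and timeouts all land here.
        return False
```

Point it at the daemon with is_healthy("http://10.0.3.42:9090/healthz") and alert when it returns False a few times in a row rather than on a single miss.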

4.1 Scrape config for Prometheus

scrape_configs:
  - job_name: 'openclaw'
    static_configs:
      - targets: ['10.0.3.42:9090']

4.2 Useful alert rules

groups:
  - name: openclaw
    rules:
      - alert: OpenClawAPIErrorsHigh
        expr: rate(openclaw_skill_errors_total[5m]) > 5
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "OpenClaw skill errors >5/s for 10m"
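
It helps to know what that threshold means in raw counts. The sketch below computes a plain per-second average between the first and last sample of a window. It is deliberately simpler than Prometheus's rate(), which also handles counter resets and extrapolates to the window edges; per_second_rate and the sample numbers are made up for illustration.

```python
def per_second_rate(samples):
    """Average per-second increase between the first and last (timestamp, value) sample."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need at least two samples with increasing timestamps")
    return (v1 - v0) / (t1 - t0)


# 1,800 errors over a 5-minute window is 6 errors per second, above the threshold of 5.
window = [(0, 10_000), (150, 10_900), (300, 11_800)]
rate = per_second_rate(window)
print(rate, rate > 5)  # 6.0 True
```

Put differently, the alert only fires after roughly 1,500 errors per 5-minute window, sustained for 10 minutes. If that is too lax for your traffic, lower the threshold.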

Set log retention to at least 14 days; you’ll want that history when debugging a transient Slack ban.

5. Automated backups and rollback drills

OpenClaw persists memory in ~/.openclaw/memory.sqlite (as of 0.31.2). If you lose it, the agent becomes amnesic and might loop. I back it up hourly.

5.1 Simple systemd timer

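# openclaw-backup.service
[Unit]
Description=Backup OpenClaw memory

[Service]
Type=oneshot
# rsync cannot write to s3:// URLs and systemd does not expand ~; adjust both paths to your host
ExecStart=/usr/bin/aws s3 sync --delete /home/openclaw/.openclaw/ s3://ops-backups/openclaw/

# openclaw-backup.timer (enable this unit, not the service)
[Timer]
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target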
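
One caveat with file-level copies: if the agent is writing to memory.sqlite while the copy runs, the backup can capture a half-written database. A safer pattern is to take a snapshot with SQLite's online backup API first and upload the snapshot instead. A minimal sketch using Python's standard sqlite3 module; snapshot and the paths you pass it are your own choices, not anything OpenClaw ships.

```python
import sqlite3


def snapshot(src_path: str, dest_path: str) -> None:
    """Copy a live SQLite database to dest_path using SQLite's online backup API."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)  # consistent copy even if another process is writing
    finally:
        dest.close()
        src.close()
```

Run the snapshot as the first step of the backup job, then sync the snapshot file rather than the live database.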

Don’t stop at data. Keep versioned container images. Roll back by removing the current container and starting the last known-good tag:

docker rm -f claw
docker run --rm -d --name claw \
  -e NODE_ENV=production \
  ghcr.io/openclaw/openclaw:0.30.5

I rehearse rollback once per sprint; muscle memory matters.

6. Network perimeter and firewall rules

By default, the gateway exposes port 3000 on all interfaces. That’s fine on localhost, deadly on a public node.

6.1 Lock down inbound traffic

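# keep loopback working, then drop port-3000 traffic from anything
# outside the internal 10.0.0.0/8 range (where the LB lives)
iptables -A INPUT -i lo -p tcp --dport 3000 -j ACCEPT
iptables -A INPUT -p tcp --dport 3000 ! -s 10.0.0.0/8 -j DROP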
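
If you are unsure which clients survive the 10.0.0.0/8 source test, Python's ipaddress module gives a quick answer. This only models the source match of that one DROP rule, not the rest of your chain; passes_source_filter is a hypothetical helper and the sample addresses are arbitrary.

```python
import ipaddress

INTERNAL = ipaddress.ip_network("10.0.0.0/8")


def passes_source_filter(source_ip: str) -> bool:
    """True if `! -s 10.0.0.0/8 -j DROP` would NOT drop a packet from this source."""
    return ipaddress.ip_address(source_ip) in INTERNAL


for ip in ("10.0.3.42", "192.168.1.10", "203.0.113.7", "127.0.0.1"):
    print(ip, passes_source_filter(ip))
```

Only the first address is inside the range. Other private ranges such as 192.168.0.0/16, and loopback, fall outside 10.0.0.0/8.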

6.2 Egress filters for shell & browser skills

If your agent pinch-hits for customer support, it has no business ssh-ing into staging. Egress policies catch mis-bindings:

# deny ssh from the container
iptables -A OUTPUT -p tcp --dport 22 -j REJECT

ClawCloud users: the portal now includes a point-and-click ACL editor (rolled out last week, see changelog 2024-05-27).

7. Sensitive data hygiene

The agent’s memory is a tempting place for secrets to leak. Rules I follow:

Add secret scrubbing middleware (openclaw-mw-redact@1.2.1).
Disable memory writes on public chat connectors: memory.write=false.
Pass the agent only the environment variables it needs (env -i PATH="$PATH" NODE_ENV=production npx openclaw).
Rotate OpenAI keys separately from OpenClaw tokens; different blast radius.
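
For a feel of what scrubbing involves, here is a small regex-based redactor. It is an illustration only and says nothing about how openclaw-mw-redact works internally; the two patterns (the AWS access key ID format and an OpenAI-style sk- prefix) are examples, and a real deployment needs a broader, maintained list.

```python
import re

# Illustrative patterns only; extend for the providers you actually use.
PATTERNS = [
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),     # AWS access key ID format
    re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),  # OpenAI-style secret key prefix
]


def redact(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace anything that looks like a known secret format before it is stored."""
    for pattern in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


# AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
print(redact("use AKIAIOSFODNN7EXAMPLE to list buckets"))
# use [REDACTED] to list buckets
```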

Run the git-secrets pre-commit hook; two users this month pasted AWS creds into a skill prompt and only caught it via CI.

8. The emergency kill switch

Something will go wrong, whether it is a jailbroken model or a runaway loop. Have a one-step stop.

8.1 Local deployments

Systemd:

systemctl disable --now openclaw.service

8.2 ClawCloud hosted

Every agent gets a UUID. Hitting

POST https://api.claw.cloud/v1/agents/{uuid}/shutdown

with a valid bearer token kills the container in <5 s. I keep a saved curl command in 1Password.
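
If you want the same call from a script, here is a sketch using only the standard library. The endpoint and the bearer token come from the description above; the empty request body, the timeout, and the function names are assumptions, since the response format is not documented here.

```python
import urllib.request

API_BASE = "https://api.claw.cloud/v1"


def build_shutdown_request(agent_uuid: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST that shuts an agent down."""
    return urllib.request.Request(
        url=f"{API_BASE}/agents/{agent_uuid}/shutdown",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


def shutdown(agent_uuid: str, token: str, timeout: float = 5.0) -> int:
    """Send the shutdown request and return the HTTP status code."""
    request = build_shutdown_request(agent_uuid, token)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status
```

Splitting build from send means you can check the URL and headers in a unit test without touching a real agent.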

Bonus: wire a Slack slash command to that endpoint. Requires clawctl@0.6.0:

clawctl bind --team devops --command /panic --agent my-agent-prod

9. Final 60-second sanity check

Node version: node -v returns 22.x (runtime bugs vanish).
openclaw --version matches the tagged git commit.
All secrets live in the vault, not env files in the repo.
Approvals show zero pending after dry-run.
Grafana dashboard lights are green post-deploy.

If everything above checks out, flip the agent to always-on, close the tab, and go get coffee—without dreading the phone buzzing.

Next step: Add this checklist to your CI pipeline. A shell script that fails on any missing item is worth more than documentation nobody reads.
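
As a starting point, here is a sketch of such a gate in Python rather than shell. Only the Node version check is implemented; run_checks, node_major, and main are hypothetical names, and the remaining items (version tag, vault, approvals, dashboards) depend on your own tooling.

```python
import re
import subprocess
import sys


def node_major(version_output: str) -> int:
    """Parse the major version from `node -v` output such as 'v22.3.0'."""
    match = re.match(r"v?(\d+)\.", version_output.strip())
    if not match:
        raise ValueError(f"unrecognized version string: {version_output!r}")
    return int(match.group(1))


def run_checks(checks) -> list:
    """Run (name, callable) pairs; return the names of checks that failed or raised."""
    failed = []
    for name, check in checks:
        try:
            ok = bool(check())
        except Exception:
            ok = False  # a check that cannot run counts as a failure
        if not ok:
            failed.append(name)
    return failed


def node_is_22() -> bool:
    out = subprocess.run(["node", "-v"], capture_output=True, text=True, check=True)
    return node_major(out.stdout) == 22


def main() -> int:
    failed = run_checks([("node 22.x", node_is_22)])  # append the remaining items here
    for name in failed:
        print(f"FAIL: {name}", file=sys.stderr)
    return 1 if failed else 0
```

Call sys.exit(main()) from the CI entry point so any failed item blocks the deploy.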