You can spin up an OpenClaw agent on ClawCloud in under a minute, but leaving it unattended for weeks is a different game. The minute you flip the toggle from “hobby” to “always-on”, every mis-scoped permission, forgotten webhook, or runaway tool integration becomes potential pager noise—or worse, an incident. The following checklist is what I actually walk through before telling Slack that “the bot is live”. Steal it, fork it, PR it—just don’t skip it.

1. Access & permissions audit

OpenClaw inherits permissions from two places: the runtime user on the box (or container) and the auth tokens you wire into each skill. Both can drift quietly until you get burned. I start here.

1.1 List current UNIX capabilities

If you run the daemon under systemd, verify it’s sandboxed. Minimal happy path:

    [Service]
    User=openclaw
    Group=openclaw
    CapabilityBoundingSet=
    NoNewPrivileges=yes
    PrivateDevices=yes
    ProtectHome=yes
    ProtectSystem=strict
    ReadWritePaths=/var/lib/openclaw

1.2 Check OpenClaw’s own permission store

OpenClaw >=0.32.0 ships a perms subcommand:

    # lists every skill + requested scope
    npx openclaw perms list --format table

    # diff against a gold file in git
    npx openclaw perms list --format json > perms.json
    jq -S . perms.json | sha256sum

If anything new shows up, I force a manual review. Community tip: store the expected hash in CI and fail the deploy if it changes.
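Here is a rough Python sketch of that CI gate. It assumes the JSON from `perms list --format json` is passed in as a string and that the reviewed digest is committed to git; the function names are mine, not part of OpenClaw. It hashes Python's own canonical serialization, so the digest will not equal the `jq -S | sha256sum` value. Pick one method and use it on both sides.

```python
import hashlib
import json


def perms_digest(perms_json: str) -> str:
    """Return a SHA-256 digest that ignores key order and whitespace.

    List order is preserved, so reordering entries in an array still
    changes the digest.
    """
    data = json.loads(perms_json)
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def perms_unchanged(perms_json: str, expected_digest: str) -> bool:
    """True when the current permission set matches the reviewed digest."""
    return perms_digest(perms_json) == expected_digest.strip().lower()
```

In CI, exit non-zero whenever `perms_unchanged` returns False so the deploy stops until someone reviews the new scope.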

1.3 Rotate tokens older than 90 days

I keep a one-liner in cron that dumps token ages:

jq -r '.[] | select(.created < (now-7776000)) | .name' ~/.openclaw/tokens.json

Anything returned goes on the rotation queue. GitHub users reported stale Notion keys causing silent 403s—avoid that noise.
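If you would rather unit-test the age check than trust a one-liner, the same filter fits in a few lines of Python. This sketch assumes what the jq expression implies: each token entry has a `name` and a `created` field in Unix epoch seconds. The function name is illustrative.

```python
import time
from typing import Iterable, Optional

MAX_AGE_SECONDS = 90 * 24 * 60 * 60  # 7,776,000, the same constant as the jq filter


def stale_tokens(tokens: Iterable[dict], now: Optional[float] = None) -> list:
    """Return the names of tokens created more than 90 days before `now`."""
    cutoff = (time.time() if now is None else now) - MAX_AGE_SECONDS
    return [t["name"] for t in tokens if t["created"] < cutoff]
```

Passing `now` explicitly keeps the function deterministic in tests; in cron you would leave it out.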

2. Skill review and scope minimization

It’s tempting to grant calendar.read+write just so the agent can suggest meeting times. Don’t. OpenClaw follows the “first skill wins” rule: a skill higher in the manifest can satisfy a call even if another lower-privileged skill could have done it. So trim aggressively.

  • Remove stub skills you used during prototyping (e.g., openai-gpt-35-debug).
  • Prefer read-only variants (github.read) unless write is truly required.
  • Double-check any shell skill—community issues #2312 and #2450 show how easy it is to forget cwd confinement.
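To see why ordering matters, here is a toy model of the "first skill wins" rule as described above. It is not OpenClaw's resolver, and the manifest structure and skill names are made up for illustration.

```python
from typing import Optional

# Hypothetical manifest: order matters, earlier entries are tried first.
MANIFEST = [
    {"name": "calendar-admin", "scopes": {"calendar.read", "calendar.write"}},
    {"name": "calendar-viewer", "scopes": {"calendar.read"}},
]


def resolve(call_scope: str, manifest: list) -> Optional[str]:
    """Return the first skill whose scopes cover the call, or None."""
    for skill in manifest:
        if call_scope in skill["scopes"]:
            return skill["name"]
    return None
```

With both skills present, a plain `calendar.read` call resolves to `calendar-admin`, the write-capable one. Remove that entry and the same call falls through to `calendar-viewer`.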

A quick grep catches accidental wildcards:

grep -R "scope: .*\*" openclaw.yml
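The grep also flags commented-out lines. If that gets noisy, a slightly stricter check is easy to sketch in Python. It assumes scopes appear as `scope: <value>` lines in the manifest, as the grep pattern implies; the function name is hypothetical.

```python
import re

# Same idea as the grep: a "scope:" key whose value contains "*".
WILDCARD_SCOPE = re.compile(r"scope:.*\*")


def wildcard_scope_lines(manifest_text: str) -> list:
    """Return (line_number, line) pairs for scope lines that contain '*'."""
    hits = []
    for number, line in enumerate(manifest_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # skip commented-out lines, which the grep would flag
        if WILDCARD_SCOPE.search(stripped):
            hits.append((number, stripped))
    return hits
```

An empty result means no wildcard scopes; anything else is worth a manual look before deploy.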

3. Enable approval workflows and rate limits

Since 0.29.0, OpenClaw ships built-in approvals for destructive calls. I consider them mandatory in prod.

3.1 Approval config snippet

    approvals:
      # block until a human OKs file deletions
      - match: "fs.rm"
        require: human
      # auto-approve low-risk fetches but rate-limit
      - match: "http.get"
        maxPerMinute: 60

Store the approvers list in LDAP or GitHub teams. Nothing kills trust like asking the intern at 3 a.m. to approve a DROP DATABASE.
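For intuition about what a `maxPerMinute: 60` cap means in practice, here is a minimal rolling-window limiter in Python. I don't know how OpenClaw implements its limit internally, so treat this as one plausible interpretation (at most N calls in any 60-second window), not the actual mechanism.

```python
from collections import deque


class PerMinuteLimiter:
    """Allow at most `max_per_minute` calls in any rolling 60-second window."""

    def __init__(self, max_per_minute: int):
        self.max_per_minute = max_per_minute
        self._timestamps = deque()

    def allow(self, now: float) -> bool:
        # Drop calls that have aged out of the 60-second window.
        while self._timestamps and now - self._timestamps[0] >= 60.0:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_per_minute:
            return False
        self._timestamps.append(now)
        return True
```

A fixed-window counter is simpler but lets bursts through at window boundaries, which is why the sketch uses a rolling window.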

3.2 Dry-run approvals during staging

Flip mode: audit to simulate approvals without blocking. I usually run staging for 24 h and inspect openclaw-approvals.log for surprises.

4. Monitoring, alerting, and log retention

If you can’t see it, you can’t fix it. I pipe everything to Loki + Grafana, but any stack works as long as you have the basics:

  1. Daemon health probe (HTTP 9090 /healthz)
  2. Skill latency histogram
  3. External API error rate
  4. Queue depth for scheduled tasks
  5. Host CPU/GPU and memory

4.1 Scrape config for Prometheus

    - job_name: 'openclaw'
      static_configs:
        - targets: ['10.0.3.42:9090']

4.2 Useful alert rules

    groups:
      - name: openclaw
        rules:
          - alert: OpenClawAPIErrorsHigh
            expr: rate(openclaw_skill_errors_total[5m]) > 5
            for: 10m
            labels:
              severity: page
            annotations:
              summary: "OpenClaw skill errors >5/s for 10m"
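When tuning the threshold, remember that `rate()` is per second: more than 5 over a 5-minute window means more than 1,500 errors in that window. The Python sketch below approximates the calculation from raw counter samples. It is a simplification (Prometheus also extrapolates to the window boundaries), and the function name is mine.

```python
def per_second_rate(samples: list) -> float:
    """Approximate a counter's per-second increase over a window.

    `samples` is a time-ordered list of (unix_seconds, counter_value) pairs.
    A drop in the counter is treated as a restart from zero.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0
```

For example, a counter that climbs from 100 to 1,900 over 300 seconds averages 6 errors per second, which would trip the rule if sustained for 10 minutes.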

Set log retention to at least 14 days; you’ll want that history when debugging a transient Slack ban.

5. Automated backups and rollback drills

OpenClaw persists memory in ~/.openclaw/memory.sqlite (as of 0.31.2). If you lose it, the agent becomes amnesic and might loop. I back it up hourly.

5.1 Simple systemd timer

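    # openclaw-backup.service (user unit; %h expands to the user's home directory)
    [Unit]
    Description=Backup OpenClaw memory

    [Service]
    Type=oneshot
    # rsync cannot write to s3:// URLs, so this assumes the AWS CLI at /usr/bin/aws
    ExecStart=/usr/bin/aws s3 sync --delete %h/.openclaw/ s3://ops-backups/openclaw/

    # openclaw-backup.timer (runs the service above every hour)
    [Timer]
    OnCalendar=hourly

    [Install]
    WantedBy=timers.target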
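One caveat: copying a SQLite file while the daemon is writing to it can produce an inconsistent snapshot. Assuming memory.sqlite is an ordinary SQLite database, as the filename suggests, a safer pattern is to take a snapshot with SQLite's online backup API and upload that instead. A minimal Python sketch, with a helper name of my own choosing:

```python
import sqlite3


def snapshot_sqlite(source_path: str, dest_path: str) -> None:
    """Write a consistent copy of a SQLite database, even while it is in use."""
    src = sqlite3.connect(source_path)
    dst = sqlite3.connect(dest_path)
    try:
        src.backup(dst)  # SQLite online backup API (Python 3.7+)
    finally:
        dst.close()
        src.close()
```

Run it just before the upload step and sync the snapshot file rather than the live database.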

Don’t stop at data. Keep versioned container images. Roll back with:

    docker run --rm -d --name claw \
      -e NODE_ENV=production \
      ghcr.io/openclaw/openclaw:0.30.5

I rehearse rollback once per sprint; muscle memory matters.

6. Network perimeter and firewall rules

By default, the gateway exposes port 3000 on all interfaces. That’s fine on localhost, deadly on a public node.

6.1 Lock down inbound traffic

    # allow only internal LB
    iptables -A INPUT -p tcp --dport 3000 ! -s 10.0.0.0/8 -j DROP
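Firewall rules with a negated match are easy to misread, so I like to write down what I expect the rule to do and test that. The Python sketch below mirrors the single rule above; it models only this rule, not the rest of the chain, and the function name is illustrative.

```python
import ipaddress

INTERNAL_NET = ipaddress.ip_network("10.0.0.0/8")  # same range as the iptables rule
GATEWAY_PORT = 3000


def dropped_by_rule(source_ip: str, dest_port: int) -> bool:
    """True when the rule would drop the packet: port 3000, source outside 10.0.0.0/8."""
    if dest_port != GATEWAY_PORT:
        return False  # the rule does not match other ports
    return ipaddress.ip_address(source_ip) not in INTERNAL_NET
```

A public address such as 203.0.113.7 hitting port 3000 is dropped, while 10.0.3.42 passes through to the remaining rules.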

6.2 Egress filters for shell & browser skills

If your agent pinch-hits for customer support, it has no business ssh-ing into staging. Egress policies catch mis-bindings:

    # deny ssh from the container
    iptables -A OUTPUT -p tcp --dport 22 -j REJECT

ClawCloud users: the portal now includes a point-and-click ACL editor (rolled out last week, see changelog 2024-05-27).

7. Sensitive data hygiene

The agent’s memory is a tempting place for secrets to leak. Rules I follow:

  • Add secret scrubbing middleware (openclaw-mw-redact@1.2.1).
  • Disable memory writes on public chat connectors: memory.write=false.
  • Scope environment variables to the minimal project (env -i NODE_ENV=production npx openclaw).
  • Rotate OpenAI keys separately from OpenClaw tokens; different blast radius.

Run a git-secrets pre-commit hook; two users this month pasted AWS creds into a skill prompt and only caught it via CI.
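To make the scrubbing idea concrete, here is a minimal Python redaction pass for AWS access key IDs. It is not the openclaw-mw-redact API, which I am not reproducing here. It only shows the kind of pattern matching such middleware might do, and it covers a single credential format.

```python
import re

# AWS access key IDs are 20 characters: a 4-letter prefix such as AKIA or ASIA,
# followed by 16 uppercase letters or digits.
AWS_ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def redact(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace anything that looks like an AWS access key ID."""
    return AWS_ACCESS_KEY_ID.sub(placeholder, text)
```

Secret access keys have no fixed prefix, so a regex alone will not catch them reliably; treat this as a first line of defense, not a guarantee.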

8. The emergency kill switch

Something will go wrong: a jailbroken LLM or a runaway loop. Have a one-step stop.

8.1 Local deployments

Systemd:

systemctl stop openclaw.service && systemctl disable openclaw.service

8.2 ClawCloud hosted

Every agent gets a UUID. Hitting

POST https://api.claw.cloud/v1/agents/{uuid}/shutdown

with a valid bearer token kills the container in <5 s. I keep a saved curl command in 1Password.
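If you prefer a script over a saved curl command, the same call can be sketched with Python's standard library. The URL and bearer-token auth come from the endpoint above. The function names, the timeout, and the assumption that a plain POST with no body is enough are mine.

```python
import urllib.request

API_BASE = "https://api.claw.cloud/v1"  # base URL taken from the endpoint above


def build_shutdown_request(agent_uuid: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST that stops a hosted agent."""
    return urllib.request.Request(
        url=f"{API_BASE}/agents/{agent_uuid}/shutdown",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


def shutdown_agent(agent_uuid: str, token: str, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    request = build_shutdown_request(agent_uuid, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Keeping request construction separate from sending means you can test the URL and headers without touching a live agent.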

Bonus: wire a Slack slash command to that endpoint. Requires clawctl@0.6.0:

clawctl bind --team devops --command /panic --agent my-agent-prod

9. Final 60-second sanity reboot

  • Node version: node -v returns 22.x (runtime bugs vanish).
  • openclaw --version matches the tagged git commit.
  • All secrets live in the vault, not env files in the repo.
  • Approvals show zero pending after dry-run.
  • Grafana dashboard lights are green post-deploy.

If everything above checks out, flip the agent to always-on, close the tab, and go get coffee—without dreading the phone buzzing.

Next step: Add this checklist to your CI pipeline. A shell script that fails on any missing item is worth more than documentation nobody reads.
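A starting point for that script, sketched in Python rather than shell so the checks are easy to unit-test. Only the Node version check from section 9 is wired up; the other items would be added as further entries in the dictionary. All function names here are illustrative.

```python
import re
import subprocess
import sys
from typing import Optional


def node_major(version_output: str) -> Optional[int]:
    """Parse the major version from `node -v` output such as 'v22.3.0'."""
    match = re.match(r"v(\d+)\.", version_output.strip())
    return int(match.group(1)) if match else None


def run_checks(checks: dict) -> list:
    """Run each named check and return the names of those that failed."""
    failed = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False  # a check that crashes counts as a failure
        if not ok:
            failed.append(name)
    return failed


def main() -> int:
    """Return 0 when every check passes, 1 otherwise."""

    def node_is_22() -> bool:
        out = subprocess.run(["node", "-v"], capture_output=True, text=True).stdout
        return node_major(out) == 22

    failed = run_checks({"node -v returns 22.x": node_is_22})
    for name in failed:
        print(f"FAILED: {name}", file=sys.stderr)
    return 1 if failed else 0
```

Call `sys.exit(main())` from your CI entry point so a failed item blocks the deploy.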