You can spin up an OpenClaw agent on ClawCloud in under a minute, but leaving it unattended for weeks is a different game. The minute you flip the toggle from “hobby” to “always-on”, every mis-scoped permission, forgotten webhook, or runaway tool integration becomes potential pager noise—or worse, an incident. The following checklist is what I actually walk through before telling Slack that “the bot is live”. Steal it, fork it, PR it—just don’t skip it.
1. Access & permissions audit
OpenClaw inherits permissions from two places: the runtime user on the box (or container) and the auth tokens you wire into each skill. Both can drift quietly until you get burned. I start here.
1.1 List current UNIX capabilities
If you run the daemon under systemd, verify it’s sandboxed. Minimal happy path:
[Service]
User=openclaw
Group=openclaw
CapabilityBoundingSet=
NoNewPrivileges=yes
PrivateDevices=yes
ProtectHome=yes
ProtectSystem=strict
ReadWritePaths=/var/lib/openclaw
1.2 Check OpenClaw’s own permission store
OpenClaw >=0.32.0 ships a perms sub-command:
# lists every skill + requested scope
npx openclaw perms list --format table
# hash a key-sorted dump and compare it with the gold hash stored in git
npx openclaw perms list --format json > perms.json
jq -S . perms.json | sha256sum
If anything new shows up, I force a manual review. Community tip: store the expected hash in CI and fail the deploy if it changes.
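The CI gate from that tip could look roughly like this in Python. It is a minimal sketch, not something OpenClaw ships: perms.json is the file produced by the command above, while the function names and the canonical JSON form are my own assumptions. The digest here will not equal the jq -S | sha256sum value because the serialization differs, so pin whichever one your pipeline actually generates.

```python
import hashlib
import json


def canonical_digest(perms) -> str:
    """Hash a permissions dump independent of key order and whitespace."""
    canonical = json.dumps(perms, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def perms_unchanged(path: str, expected_digest: str) -> bool:
    """Return True when the dump at `path` still matches the pinned digest."""
    with open(path, encoding="utf-8") as fh:
        return canonical_digest(json.load(fh)) == expected_digest
```

In a pipeline, call perms_unchanged("perms.json", pinned) and exit non-zero when it returns False, so a new scope cannot ship without someone updating the pinned value in a reviewed commit.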
1.3 Rotate tokens older than 90 days
I keep a one-liner in cron that lists every token older than 90 days (7,776,000 seconds):
jq -r '.[] | select(.created < (now-7776000)) | .name' ~/.openclaw/tokens.json
Anything returned goes on the rotation queue. GitHub users reported stale Notion keys causing silent 403s—avoid that noise.
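If you would rather run the same check from a script, here is a rough Python equivalent. It assumes what the jq filter implies: tokens.json is a JSON array whose entries carry a name and a created field in Unix epoch seconds. The function names are hypothetical.

```python
import json
import time

MAX_AGE_SECONDS = 90 * 24 * 60 * 60  # 7,776,000 s, the same cutoff as the jq filter


def stale_tokens(tokens, now=None, max_age=MAX_AGE_SECONDS):
    """Return the names of tokens created more than `max_age` seconds ago.

    Assumes each entry has `name` and `created` (Unix epoch seconds).
    """
    now = time.time() if now is None else now
    return [t["name"] for t in tokens if t["created"] < now - max_age]


def load_tokens(path):
    """Read the token list from a JSON file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

Passing now explicitly keeps the function easy to test; in cron you would call stale_tokens(load_tokens(path)) and alert on a non-empty result.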
2. Skill review and scope minimization
It’s tempting to grant calendar.read+write just so the agent can suggest meeting times. Don’t. OpenClaw follows the “first skill wins” rule: a skill higher in the manifest can satisfy a call even if another lower-privileged skill could have done it. So trim aggressively.
- Remove stub skills you used during prototyping (e.g., openai-gpt-35-debug).
- Prefer read-only variants (github.read) unless write is truly required.
- Double-check any shell skill: community issues #2312 and #2450 show how easy it is to forget cwd confinement.
A quick grep catches accidental wildcards:
grep -R "scope: .*\*" openclaw.yml
3. Enable approval workflows and rate limits
Since 0.29.0, OpenClaw ships built-in approvals for destructive calls. I consider them mandatory in prod.
3.1 Approval config snippet
approvals:
  # block until a human OKs file deletions
  - match: "fs.rm"
    require: human
  # auto-approve low-risk fetches but rate-limit
  - match: "http.get"
    maxPerMinute: 60
Store the approvers list in LDAP or GitHub teams. Nothing kills trust like asking the intern at 3 a.m. to approve a DROP DATABASE.
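To make a setting like maxPerMinute: 60 concrete, here is one common way such a cap can be enforced: a sliding 60-second window. I don't know how OpenClaw implements it internally, so treat this as an illustration of the semantics rather than the actual mechanism; the class name is hypothetical.

```python
import time
from collections import deque


class PerMinuteLimiter:
    """Sliding-window limiter: at most `max_per_minute` calls in any 60 s span."""

    def __init__(self, max_per_minute, clock=time.monotonic):
        self.max_per_minute = max_per_minute
        self.clock = clock
        self._stamps = deque()

    def allow(self):
        """Record and permit the call if the window has room, else refuse it."""
        now = self.clock()
        # Drop timestamps that have aged out of the 60-second window.
        while self._stamps and now - self._stamps[0] >= 60:
            self._stamps.popleft()
        if len(self._stamps) < self.max_per_minute:
            self._stamps.append(now)
            return True
        return False
```

The practical takeaway is that a burst of 60 calls in the first second uses up the whole budget for the following minute, which is worth remembering when picking the number.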
3.2 Dry-run approvals during staging
Flip mode: audit to simulate approvals without blocking. I usually run staging for 24 h and inspect openclaw-approvals.log for surprises.
4. Monitoring, alerting, and log retention
If you can’t see it, you can’t fix it. I pipe everything to Loki + Grafana, but any stack works as long as you have the basics:
- Daemon health probe (HTTP 9090 /healthz)
- Skill latency histogram
- External API error rate
- Queue depth for scheduled tasks
- Host CPU/GPU and memory
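The first item on that list is the cheapest to automate. A minimal probe, assuming the /healthz endpoint on port 9090 mentioned above answers with HTTP 200 when the daemon is fine (the response body is not inspected, since I can't vouch for its format):

```python
import urllib.error
import urllib.request


def is_healthy(base_url, timeout=2.0):
    """Return True if GET <base_url>/healthz answers with HTTP 200."""
    url = base_url.rstrip("/") + "/healthz"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a 4xx/5xx response.
        return False
```

Something like is_healthy("http://10.0.3.42:9090") fits in a cron job or a container health check if you don't have a full monitoring stack yet.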
4.1 Scrape config for Prometheus
- job_name: 'openclaw'
  static_configs:
    - targets: ['10.0.3.42:9090']
4.2 Useful alert rules
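# Prometheus 2.x rule-file syntax; the 1.x ALERT/IF form is no longer accepted
groups:
  - name: openclaw
    rules:
      - alert: OpenClawAPIErrorsHigh
        expr: rate(openclaw_skill_errors_total[5m]) > 5
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "OpenClaw skill errors >5/s for 10m"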
Set log retention to at least 14 days; you’ll want that history when debugging a transient Slack ban.
5. Automated backups and rollback drills
OpenClaw persists memory in ~/.openclaw/memory.sqlite (as of 0.31.2). If you lose it, the agent becomes amnesic and might loop. I back it up hourly.
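One caveat with any file-level copy: copying a SQLite file while another process is writing to it can capture a half-finished transaction. A safer pattern is to take a snapshot through SQLite's online backup API first and ship the snapshot. A minimal sketch using Python's standard sqlite3 module (the function name and paths are placeholders, and I am assuming memory.sqlite is an ordinary SQLite database):

```python
import pathlib
import sqlite3


def snapshot_sqlite(src_path, dest_path):
    """Copy a live SQLite database to dest_path via the online backup API."""
    # Open the source read-only so a wrong path fails loudly instead of
    # silently creating an empty database.
    src_uri = pathlib.Path(src_path).resolve().as_uri() + "?mode=ro"
    src = sqlite3.connect(src_uri, uri=True)
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)  # consistent copy even while a writer is active
    finally:
        dest.close()
        src.close()
```

Upload the snapshot file rather than the live one, and the restore drill becomes a plain file copy.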
5.1 Simple systemd timer
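# openclaw-backup.service
[Unit]
Description=Backup OpenClaw memory
[Service]
Type=oneshot
User=openclaw
# rsync cannot target s3:// URLs, so this assumes the AWS CLI is installed; $$ keeps systemd from expanding $HOME itself
ExecStart=/bin/sh -c 'aws s3 sync --delete "$$HOME/.openclaw/" s3://ops-backups/openclaw/'
# openclaw-backup.timer
[Timer]
OnCalendar=hourly
Persistent=true
[Install]
WantedBy=timers.target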
Don’t stop at data. Keep versioned container images. Roll back with:
docker run --rm -d --name claw \
-e NODE_ENV=production \
ghcr.io/openclaw/openclaw:0.30.5
I rehearse rollback once per sprint; muscle memory matters.
6. Network perimeter and firewall rules
By default, the gateway exposes port 3000 on all interfaces. That’s fine on localhost, deadly on a public node.
6.1 Lock down inbound traffic
# drop anything that does not come from the internal 10.0.0.0/8 range (where the LB lives)
iptables -A INPUT -p tcp --dport 3000 ! -s 10.0.0.0/8 -j DROP
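A firewall rule you have not tested is a guess. From a machine outside 10.0.0.0/8, a quick TCP connect attempt tells you whether port 3000 is really closed. A minimal sketch (the function name is mine; with a DROP rule the attempt times out rather than being refused, so keep the timeout short):

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Run it against the node's public address after every firewall change; the result you want from the outside is False.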
6.2 Egress filters for shell & browser skills
If your agent pinch-hits for customer support, it has no business ssh-ing into staging. Egress policies catch mis-bindings:
# deny ssh from the container
iptables -A OUTPUT -p tcp --dport 22 -j REJECT
ClawCloud users: the portal now includes a point-and-click ACL editor (rolled out last week, see changelog 2024-05-27).
7. Sensitive data hygiene
The agent’s memory is a tempting place for secrets to leak. Rules I follow:
- Add secret scrubbing middleware (openclaw-mw-redact@1.2.1).
- Disable memory writes on public chat connectors: memory.write=false.
- Scope environment variables to the minimal project (env -i NODE_ENV=production npx openclaw).
- Rotate OpenAI keys separately from OpenClaw tokens; different blast radius.
Run a git-secrets pre-commit hook; two users this month pasted AWS creds into a skill prompt and only caught it via CI.
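For a feel of what scrubbing means in practice, here is a deliberately tiny redaction pass. It is not how openclaw-mw-redact works (I have not read its source); it only shows the idea of masking secret-shaped strings before they reach memory or logs, using the well-known shape of AWS access key IDs as the single pattern.

```python
import re

# AWS access key IDs are 20 characters: a prefix such as AKIA or ASIA
# followed by 16 uppercase letters or digits.
AWS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def redact(text, placeholder="[REDACTED]"):
    """Replace anything that looks like an AWS access key ID."""
    return AWS_KEY_ID.sub(placeholder, text)
```

A real scrubber needs many more patterns (secret keys, bearer tokens, private key blocks), which is a good reason to use a maintained tool instead of growing this one.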
8. The emergency kill switch
Something will go wrong, whether it is an LLM jailbreak or a runaway loop. Have a one-step stop.
8.1 Local deployments
Systemd:
systemctl stop openclaw.service && systemctl disable openclaw.service
8.2 ClawCloud hosted
Every agent gets a UUID. Hitting
POST https://api.claw.cloud/v1/agents/{uuid}/shutdown
with a valid bearer token kills the container in <5 s. I keep a saved curl command in 1Password.
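If you prefer a script to a saved curl command, the same call fits in a few lines of Python. The URL and bearer-token auth come from the description above; the function names are mine, and since I can't vouch for the response body, the sketch only returns the HTTP status.

```python
import urllib.request

API_BASE = "https://api.claw.cloud/v1"  # taken from the endpoint above


def build_shutdown_request(agent_uuid, token):
    """Build (but do not send) the POST request that stops a hosted agent."""
    return urllib.request.Request(
        f"{API_BASE}/agents/{agent_uuid}/shutdown",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


def shutdown_agent(agent_uuid, token, timeout=10.0):
    """Send the shutdown request and return the HTTP status code."""
    request = build_shutdown_request(agent_uuid, token)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status
```

Keeping request construction separate from sending means you can unit-test the kill switch without actually killing anything.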
Bonus: wire a Slack slash command to that endpoint. Requires clawctl@0.6.0:
clawctl bind --team devops --command /panic --agent my-agent-prod
9. Final 60-second sanity reboot
- Node version: node -v returns 22.x (runtime bugs vanish).
- openclaw --version matches the tagged git commit.
- All secrets live in the vault, not env files in the repo.
- Approvals show zero pending after dry-run.
- Grafana dashboard lights are green post-deploy.
If everything above checks out, flip the agent to always-on, close the tab, and go get coffee—without dreading the phone buzzing.
Next step: Add this checklist to your CI pipeline. A shell script that fails on any missing item is worth more than documentation nobody reads.
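Here is a starting point for that gate, written in Python rather than shell so the individual checks are easy to unit-test. It is a rough sketch: run_checks and node_is_22 are hypothetical names, and only the Node version check from the list above is implemented; the rest are yours to add.

```python
import re
import subprocess


def node_major(version_output):
    """Extract the major version from `node -v` output such as 'v22.3.0'."""
    match = re.match(r"v(\d+)\.", version_output.strip())
    if match is None:
        raise ValueError(f"unexpected node version string: {version_output!r}")
    return int(match.group(1))


def node_is_22():
    """Check that the node binary on PATH reports a 22.x version."""
    out = subprocess.run(["node", "-v"], capture_output=True, text=True, check=True)
    return node_major(out.stdout) == 22


def run_checks(checks):
    """Run (name, callable) pairs and return the names of checks that failed."""
    failed = []
    for name, check in checks:
        try:
            ok = bool(check())
        except Exception:
            ok = False  # a crashing check counts as a failure
        if not ok:
            failed.append(name)
    return failed
```

In CI, call run_checks([("node 22.x", node_is_22)]), print whatever comes back, and exit non-zero if the list is not empty.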