Customer email is still the most abused notification channel in 2024. If you are running a five-person SaaS, every "How do I reset my password?" steals flow from shipping the next feature. This guide shows exactly how I wired OpenClaw 3.2.1 into our shared inbox and Slack to handle the repetitive 70 % of tickets, surface the 30 % that need a human, and keep a paper trail good enough for auditors.
Why automate first-line support with OpenClaw?
OpenClaw is a Node.js agent framework that speaks email, Slack, Discord, SMS, and 800+ other APIs via Composio. Because the same agent can run locally or on ClawCloud, you can prototype on your laptop and migrate to prod later. What finally pushed me over the edge:
- We had 300 tickets/month, but 210 were account or billing FAQs. Easy pattern matching.
- Our SLA is four hours. Nights and weekends broke that every week.
- Budget for a full-time support rep didn’t exist yet.
The numbers convinced me to give the robot the first shot at replying. The trick is building guardrails so it never tells a customer “Try turning it off and on” when their card is being double-charged. The rest of this post covers the full workflow, including how we caught bad answers before they shipped.
Prerequisites and architecture overview
Minimal stack:
- Node 22.2+ (the current LTS; OpenClaw dropped 16/18 support last month)
- OpenClaw 3.2.1
- PostgreSQL (for persistent memory & ticket store). SQLite works, but you’ll outgrow it quickly.
- IMAP/SMTP access to your support@ inbox or Google Workspace API creds.
- A Slack app with channels:history and chat:write scopes.
- Optional: ClawCloud account if you prefer not to self-host.
High-level flow:
- Daemon polls email/Slack every 30 s.
- New message triggers an "ingest" tool that stores the raw payload, sender, thread id.
- Classification skill tags the ticket: [faq], [billing], [bug], [unknown].
- If confidence > 0.8 and the category is in the "safe" list, the drafting skill writes a response using our template library + retrieval from docs.
- Draft is pushed to Slack #support-triage for human approval. Two button reactions: ✅ to send, 🛑 to cancel.
- If nobody reacts in 15 min, agent self-sends (we tuned this later).
- Anything classified [unknown] or with confidence < 0.8 creates an escalation task in Linear and notifies on-call.
Connecting OpenClaw to your support inbox and Slack
Install and scaffold
For local dev I use pnpm because of its smaller node_modules footprint, but npm works fine.
$ pnpm add -g openclaw@3.2.1
$ claw init support-bot && cd support-bot
The wizard asks for runtime (select "Email/Slack template"), default memory backend, and whether you want example skills. Say yes—it generates classify.ts, draft.ts, and metrics.ts.
Email adapter config
# config/adapters/email.yaml
adapter: imap
imap:
host: imap.gmail.com
port: 993
user: support@acme.io
passEnv: SUPPORT_EMAIL_PASS
smtp:
host: smtp.gmail.com
port: 587
user: support@acme.io
passEnv: SUPPORT_SMTP_PASS
pollInterval: 30s
I store creds in 1Password → dotenv. The agent substitutes SUPPORT_EMAIL_PASS at runtime.
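The passEnv indirection is simple to reason about: the config names an environment variable, and the value is looked up at runtime. A minimal sketch of that resolution (the resolveSecret helper is my own illustration, not part of OpenClaw's API):

```typescript
// Hypothetical helper mirroring the passEnv indirection: given the name
// of an environment variable, return its value or fail loudly so a
// misconfigured daemon never silently polls with an empty password.
function resolveSecret(
  passEnv: string,
  env: Record<string, string | undefined>,
): string {
  const value = env[passEnv];
  if (!value) throw new Error(`Missing secret: ${passEnv} is not set`);
  return value;
}

// Usage: resolveSecret("SUPPORT_EMAIL_PASS", process.env)
```

Failing at startup rather than at first send is the point: a bad dotenv file surfaces immediately instead of as a mysterious IMAP auth error half an hour later.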
Slack adapter config
# config/adapters/slack.yaml
signingSecretEnv: SLACK_SIGNING_SECRET
botTokenEnv: SLACK_BOT_TOKEN
channels:
- C01TRIAGE # #support-triage
- C01INBOX # #support-inbox (read-only mirror)
pollInterval: 10s
Give the bot chat:write, channels:history, and reactions:write. In testing, missing reactions:write silently broke approval flow, so double-check.
Routing incoming messages to an "AI first responder" skill
OpenClaw calls discrete pieces of work "skills". The default pipeline is:
[adapter] → ingest → classify → (draft | escalate) → deliver
Everything is just JS/TS, so routing is a switch statement in src/pipeline.ts:
export async function pipeline(ctx: TicketCtx) {
const { classification, confidence } = await classify(ctx);
if (confidence > 0.8 && ["faq", "billing"].includes(classification)) {
return draft(ctx, classification);
}
return escalate(ctx, classification, confidence);
}
Classification uses OpenAI gpt-4o-mini-32k by default. We tried Anthropic Sonnet and Mistral Medium; accuracy was similar but Anthropic lagged ~1.2 s and cost 20 % more. Set the provider with:
export const llm = new OpenClaw.LLM({
provider: 'openai',
model: 'gpt-4o-mini-32k',
temp: 0.1
});
Drafting safe responses with retrieval and templates
My first naive prompt looked like this:
"Draft a polite reply to the message using the docs below: {{docs}}"
It hallucinated a pricing tier we killed in 2022. Lesson: retrieval without template discipline is risky.
Template library
I switched to explicit templates:
# templates/password-reset.md
Hi {{name}},
You can reset your password at {{reset_url}}. If you no longer have access to the email on file, reply here and we’ll help manually.
— Acme Support
Then the drafting skill picks a template based on the classification tag:
const tpl = templates[ctx.classification];
return tpl.render({
name: ctx.senderName,
reset_url: 'https://acme.io/reset'
});
If no template matches, we drop to human escalation.
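The lookup-with-fallback pattern is just a map plus a null return; here is a self-contained sketch (the Template shape and {{key}} substitution are my assumptions about what a minimal renderer needs, not OpenClaw's actual template API):

```typescript
// Hypothetical minimal template: render() substitutes {{key}} placeholders.
type Template = { render(vars: Record<string, string>): string };

function makeTemplate(body: string): Template {
  return {
    render: (vars) =>
      body.replace(/\{\{(\w+)\}\}/g, (_, key) => vars[key] ?? `{{${key}}}`),
  };
}

const templates: Record<string, Template> = {
  faq: makeTemplate(
    "Hi {{name}},\nYou can reset your password at {{reset_url}}.",
  ),
};

// Returns the rendered draft, or null to signal "escalate to a human".
function draftFromTemplate(
  classification: string,
  vars: Record<string, string>,
): string | null {
  const tpl = templates[classification];
  return tpl ? tpl.render(vars) : null;
}
```

Returning null (rather than throwing) keeps the "no template → human" rule visible at the call site.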
Hybrid LLM + template approach
Edge cases like “I reset my password but never got the email” need dynamic text. The pattern that worked:
- Use template boilerplate for greeting, sign-off, legal watermarks.
- Let the LLM fill the problem-specific paragraph only.
- Wrap the LLM output in <answer>...</answer> tags and validate length < 700 chars to stay concise.
Example draft prompt:
You are Acme Support. Answer between the <answer> tags only. Max 700 characters.
<context>
{{retrieved_docs}}
</context>
<message>
{{customer_message}}
</message>
<answer>
This removed 90 % of hallucinations in our sample of 100 tickets.
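The extraction-and-validation step before sending can be a small guard; this is a sketch of the pattern, not a built-in OpenClaw validator:

```typescript
// Pull the text between <answer> tags and enforce the 700-character cap.
// Returning null when the output is malformed or too long lets the
// caller treat it as an escalation signal rather than sending junk.
function extractAnswer(raw: string, maxChars = 700): string | null {
  const match = raw.match(/<answer>([\s\S]*?)<\/answer>/);
  if (!match) return null; // model ignored the format entirely
  const answer = match[1].trim();
  if (answer.length === 0 || answer.length > maxChars) return null;
  return answer;
}
```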
Ticket classification and escalation rules
Configuring categories
# config/classifier.yaml
categories:
faq:
keywords: [reset, password, login, pricing, invoice]
billing:
keywords: [charge, refund, invoice, VAT]
bug:
keywords: [error, crash, 500, cannot]
unknown: {}
minimumConfidence: 0.8
The built-in classifier first checks keyword heuristics before hitting the LLM. That saved us ~40 % tokens.
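A keyword-first pass that only falls through to the LLM on a miss might look like the sketch below. The category table mirrors classifier.yaml; treating "unknown" as "spend LLM tokens" is my reading of the behavior, not OpenClaw's documented internals:

```typescript
// Keyword table mirroring classifier.yaml.
const categories: Record<string, string[]> = {
  faq: ["reset", "password", "login", "pricing", "invoice"],
  billing: ["charge", "refund", "invoice", "vat"],
  bug: ["error", "crash", "500", "cannot"],
};

// Cheap heuristic pass: return the first category whose keyword appears
// in the message. "unknown" means the LLM classifier still runs.
function keywordClassify(message: string): string {
  const text = message.toLowerCase();
  for (const [category, keywords] of Object.entries(categories)) {
    if (keywords.some((kw) => text.includes(kw))) return category;
  }
  return "unknown";
}
```

Note that overlapping keywords (invoice appears in both faq and billing) resolve to whichever category is listed first, so order the table deliberately.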
Escalation targets
- bug → Linear project "SUP-BUG" with priority = P2.
- unknown or confidence < 0.8 → Slack #support-escalated.
- Anything with the phrase “legal” or “GDPR” gets force-escalated regardless of confidence.
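Combining those three rules into one gate keeps the override in front of everything else. A sketch, assuming the rules above (the function and its signature are mine, not OpenClaw's):

```typescript
// Phrases that always go to a human, regardless of classifier confidence.
const FORCE_ESCALATE = [/\blegal\b/i, /\bgdpr\b/i];

// True when the ticket must skip auto-drafting and go to a human.
function mustEscalate(
  message: string,
  classification: string,
  confidence: number,
): boolean {
  if (FORCE_ESCALATE.some((rx) => rx.test(message))) return true;
  return classification === "unknown" || confidence < 0.8;
}
```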
Escalation payload:
await linearClient.createIssue({
teamId: 'SUP',
title: `[${classification}] ${ctx.subject}`,
description: ctx.fullBody,
labels: ['from-openclaw']
});
Don’t over-optimize the taxonomy early. We started with six categories, pruned to four after two weeks when "product-feedback" answers always needed a human anyway.
Logging, metrics, and feedback loops
Structured logs
OpenClaw emits pino logs. I pipe them into Loki via promtail:
$ NODE_ENV=prod LOG_FORMAT=loki claw daemon 2>&1 | promtail
Critical fields:
- ticket_id
- classification
- confidence
- draft_time_ms
- approval: human | auto | rejected
Prometheus metrics
# HELP openclaw_tickets_total Total tickets handled
# TYPE openclaw_tickets_total counter
openclaw_tickets_total{classification="faq"} 97
I added a custom collector in metrics.ts:
new Gauge({ name: 'openclaw_ai_savings_hours', help: 'Estimated hours saved' })
  .set(totalAutoSends * 0.083); // ~5 min per FAQ ticket
Feedback loop
Every morning the support lead skims a Linear dashboard:
- Auto-sent yesterday: 32
- Human approved: 11
- Rejected: 2 → adds "bad-response" label → feeds fine-tuning job weekly
We store rejected answer + actual human reply back into Postgres. A cron skill trains a small 40-epoch LoRA over the weekend. FWIW, fine-tuning improved phrasing but not classification; we left classification on vanilla GPT.
Guardrails: stopping embarrassing hallucinations before they ship
This section took the most iteration. Things that worked:
Moderation endpoint
OpenAI’s free moderation endpoint runs before every send. We block if the categories include harassment, self-harm, or sexual. Hits are rare (0.1 %) but worth the call.
Regex sanity checks
We run a simple regex gate:
const blacklist = [/\$\d{1,4}/, /discount/i, /lawyer/i];
if (blacklist.some(rx => rx.test(draft))) {
return escalate(ctx, 'unsafe-content');
}
Stops the agent from randomly promising refunds.
Human in the loop window
I set approvalWindow = 900s. During weekdays somebody always clicks ✅ within five minutes. Nights are auto-send. We track misfires; two bad night sends in the first month, both minor wording issues.
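Under the hood this is just a race between a human reaction and a timer. A sketch of the logic, with a hypothetical waitForReaction hook standing in for the real Slack reaction listener:

```typescript
type Verdict = "approved" | "rejected" | "auto-sent";

// Race a human decision against the approval window. If nobody reacts
// before the window closes, fall through to auto-send.
async function awaitApproval(
  waitForReaction: () => Promise<"approved" | "rejected">,
  windowMs: number,
): Promise<Verdict> {
  const timeout = new Promise<"auto-sent">((resolve) =>
    setTimeout(() => resolve("auto-sent"), windowMs),
  );
  return Promise.race([waitForReaction(), timeout]);
}
```

With approvalWindow = 900s you would call this with windowMs = 900_000; the tests below use millisecond windows only to keep them fast.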
Self-critique chain
Inspired by a GitHub issue (#2849): run a second, cheaper model (GPT-3.5-turbo-0125) to grade the first draft:
"You are a QA agent. Score the answer 1-5 on correctness. If <4, suggest a fix <200 chars."
If score <4, we attach suggested fix into the Slack message so the human can one-click apply. Average extra latency: 1.1 s.
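Parsing the grader's reply defensively matters more than the prompt itself; an unparseable grade should count as a failure, not a pass. A sketch, assuming our "Score: N" convention (this format is ours, not an OpenClaw API):

```typescript
// Parse the QA model's reply and decide whether to attach a suggested
// fix for the human. An unparseable grade is treated conservatively
// as "needs a fix" rather than silently passing.
function needsFix(qaReply: string, threshold = 4): boolean {
  const match = qaReply.match(/\b([1-5])\b/);
  if (!match) return true;
  return Number(match[1]) < threshold;
}
```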
Deploying to ClawCloud vs self-hosting
I ran on a $6 Hetzner VPS for a week, then moved to ClawCloud. Reasons:
- Zero SSL/IMAP headaches. Just link Google Workspace and Slack in the UI.
- Automatic scaling: weekdays we need four concurrent skills; weekends it idles at one.
- Built-in metrics dashboard identical to my Grafana setup.
- Cost about the same once you price in my time.
Migration is literally:
$ claw push --project support-bot --env prod
Env vars copy over; ClawCloud provisions a Postgres 15 instance and object storage for logs. Cold start P95 was 1.6 s on the free tier.
Next steps: iterate with real customer feedback
Automation gets you breathing room, not perfection. After four weeks:
- Overall first-reply time dropped from 3 h 17 m → 11 m.
- 75 % of tickets were auto-sent or sent from a human-approved template. The team now answers the tricky 25 % faster.
- CSAT stayed flat at 4.7/5 (customers didn’t notice or care that a bot wrote half the replies).
- We’re adding localization next—OpenClaw’s locale routing looks straightforward.
If you try this setup, start tiny: one or two FAQ templates, tight confidence threshold, and a scary blacklist. Ship, measure, and expand. Feel free to ping me on the OpenClaw Discord (#customer-support-automation) if you hit snags.