OpenClaw + ElevenLabs Voice Integration for Phone Calls: Full Setup

If you landed here you probably typed something like “OpenClaw ElevenLabs voice integration setup for phone calls” into your search bar. Same. I wanted my agents to literally ring me up and talk. The docs were scattered and the community threads half-finished, so I sat down and wired the whole thing end-to-end. This guide covers every step that finally worked for me, with no marketing fluff.

Why Let OpenClaw Call You At All?

Chat-style notifications are fine until you’re biking home with no signal, or your Terraform deploy hung and the pager pings at 03:00. A voice call breaks through Do Not Disturb, delivers context faster than thumb-scrolling, and lets you respond hands-free. Combine OpenClaw’s multi-tool agents with ElevenLabs’ scary-good text-to-speech (TTS) and your code can literally talk to you like a human.

Incident response: hear the stack trace while you open the laptop.
Logistics: warehouse status phoned in to ops leads.
Custom meditations: my actual use case (more in Step 7).

Requirements & Versions That Actually Matter

Skip these and you’ll lose an evening on weird TLS errors.

OpenClaw v0.28.3+ (npm install -g openclaw@latest). Earlier builds hard-code a deprecated Twilio param.
Node 22+ (check with node -v). If Homebrew on your Apple silicon Mac still has you on 20.x, upgrade, or the ElevenLabs client crashes on fetch API polyfills.
ElevenLabs account (the free tier works; see Step 1 for the quota).
Phone gateway: Twilio, Vonage, or Signalwire. I’m using Twilio because it’s what the upstream examples bake in. Replace the webhooks if you’re on something else.
macOS 14 or iOS 17 if you want the optional voice wake, which lets you end a call by saying a hotword.

Step 1 – Generate an ElevenLabs API Key

Log in to elevenlabs.io.
Navigate to Account → API Keys.
Click Generate new key. Name it openclaw-prod so you remember.
Copy the key; they never show it again.

The free tier gives 10,000 characters per month at the time of writing. Short prototyping prompts fit comfortably under that.

Step 2 – Wire ElevenLabs Into OpenClaw

OpenClaw reads a .env by default. Put the secrets there:

OPENCLAW_PROVIDER=elevenlabs
ELEVENLABS_API_KEY=<the-key-you-copied>
TWILIO_ACCOUNT_SID=ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TWILIO_AUTH_TOKEN=your_twilio_token
TWILIO_FROM_NUMBER=+15551234567  # your Twilio number
MY_PERSONAL_NUMBER=+15559876543  # the phone the bot should call
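Before restarting anything, a quick check of those values can save a confusing first call. The sketch below is my own helper, not part of OpenClaw: it uses the variable names from the .env above, and the rules it applies (E.164 phone numbers, an account SID starting with AC) are assumptions about what Twilio will accept. It also catches the case where a .env parser keeps the trailing inline comment as part of the value.

```javascript
// Variable names come from the .env above; the checks themselves are illustrative.
const REQUIRED = [
  'ELEVENLABS_API_KEY',
  'TWILIO_ACCOUNT_SID',
  'TWILIO_AUTH_TOKEN',
  'TWILIO_FROM_NUMBER',
  'MY_PERSONAL_NUMBER',
];

// E.164: a plus sign, then up to 15 digits, with a non-zero first digit.
const E164 = /^\+[1-9]\d{1,14}$/;

function checkEnv(env) {
  const problems = [];
  for (const name of REQUIRED) {
    if (!env[name]) problems.push(`${name} is missing`);
  }
  for (const name of ['TWILIO_FROM_NUMBER', 'MY_PERSONAL_NUMBER']) {
    if (env[name] && !E164.test(env[name])) {
      problems.push(`${name} is not in E.164 format`);
    }
  }
  if (env.TWILIO_ACCOUNT_SID && !env.TWILIO_ACCOUNT_SID.startsWith('AC')) {
    problems.push('TWILIO_ACCOUNT_SID should start with "AC"');
  }
  return problems;
}

// Usage: print anything suspicious before restarting the daemon.
for (const problem of checkEnv(process.env)) console.warn(problem);
```

An empty result means the values at least look plausible; it does not prove the credentials are valid.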

Restart the daemon so it re-hydrates the environment:

openclaw daemon restart

Confirm the plugin loaded:

openclaw ctl plugins | grep eleven

You should see:

✓ elevenlabs-tts v0.6.1 [loaded]

Step 3 – Pick and Test a Voice

ElevenLabs ships 70+ premade voices plus cloning. The voice ID, not the human-readable name, goes into OpenClaw.

Open the Voice Lab tab on ElevenLabs.
Click a voice, open DevTools, peek at the network call /voices/<id>. Copy that id.
Add to .env:

ELEVENLABS_VOICE_ID=aT5Sj3cffcRJD8eH5xCZ

Quick sanity test without even involving Twilio:

npx elevenlabs-tts "Hello from OpenClaw on $(hostname)" \
  --voice "$ELEVENLABS_VOICE_ID" \
  --api-key "$ELEVENLABS_API_KEY" \
  --out hello.wav && afplay hello.wav

If your Mac speaks, the key + voice combo is valid.
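If you would rather skip the CLI, the same check can be sketched against the ElevenLabs REST endpoint directly. This assumes Node's built-in fetch, the public /v1/text-to-speech/<id> route with an xi-api-key header, and the default MP3 output; the function names are mine, not part of any plugin.

```javascript
// Build the HTTP request for the ElevenLabs text-to-speech endpoint.
function buildTtsRequest(voiceId, apiKey, text) {
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}`,
    options: {
      method: 'POST',
      headers: {
        'xi-api-key': apiKey,
        'Content-Type': 'application/json',
        Accept: 'audio/mpeg',
      },
      body: JSON.stringify({ text }),
    },
  };
}

// Send the request and write the returned audio to disk.
async function synthesize(voiceId, apiKey, text, outFile) {
  const { url, options } = buildTtsRequest(voiceId, apiKey, text);
  const res = await fetch(url, options);
  if (!res.ok) {
    throw new Error(`ElevenLabs returned ${res.status}: ${await res.text()}`);
  }
  const { writeFile } = await import('node:fs/promises');
  await writeFile(outFile, Buffer.from(await res.arrayBuffer()));
  return outFile;
}
```

Calling synthesize(process.env.ELEVENLABS_VOICE_ID, process.env.ELEVENLABS_API_KEY, 'Hello', 'hello.mp3') should leave a playable file behind; a thrown error carries the status code and the response body, which usually says whether the key or the voice is at fault.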

Step 4 – Set Up the Phone Call Flow

4.1 Create a Twilio Function

OpenClaw hits a URL to start the call. The function’s entire job: TTS the text we pass and bridge the call.

// /functions/voice-proxy.js
exports.handler = function (context, event, callback) {
  // Twilio is a global inside the Functions runtime, so no require is needed.
  const twiml = new Twilio.twiml.VoiceResponse();
  // Placeholder voice: Polly's Joanna reads whatever arrives in the `s` parameter.
  twiml.say({ voice: 'Polly.Joanna', language: 'en-US' }, event.s || 'No text provided');
  return callback(null, twiml);
};

You’ll overwrite Polly with ElevenLabs later; Twilio Functions just needs something to deploy.
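For a sense of what that swap looks like, here is a minimal sketch that builds the TwiML by hand: a Play verb pointing at hosted audio replaces the Say verb. The audio URL in the usage note is hypothetical, and inside a Twilio Function you would more likely call twiml.play(url) on the VoiceResponse than build the XML yourself.

```javascript
// Escape the five XML special characters so a URL containing "&" stays valid TwiML.
function escapeXml(s) {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// TwiML that plays a hosted audio file instead of synthesizing with <Say>.
function playTwiml(audioUrl) {
  return `<?xml version="1.0" encoding="UTF-8"?><Response><Play>${escapeXml(audioUrl)}</Play></Response>`;
}
```

For example, playTwiml('https://example.com/tts?text=hi&v=1') produces a Response whose Play element carries the URL with the ampersand escaped as &amp;.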

4.2 Expose a Public Endpoint from OpenClaw

The ElevenLabs plugin registers /tts. We map a call trigger onto it:

# routes.yaml
- path: /call-me
  method: POST
  script: ./scripts/call-me.js

// scripts/call-me.js
export default async function (req, res, ctx) {
  // Expects a JSON body like {"text": "..."}.
  const { text } = await req.json();
  const twilio = ctx.twilio();
  // Twilio requests this URL once the call connects; the plugin's /tts route handles it.
  await twilio.calls.create({
    url: `${ctx.baseUrl}/tts?text=${encodeURIComponent(text)}`,
    to: process.env.MY_PERSONAL_NUMBER,
    from: process.env.TWILIO_FROM_NUMBER
  });
  res.end('calling');
}

Hot reload and curl it:

curl -XPOST https://agent.claw.cloud/call-me \
  -d '{"text":"It works, but you deserve real coffee."}' \
  -H 'Content-Type: application/json'

If your phone rings and the voice speaks, you’re 80% done.
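Since the text rides along in a query string, it seems worth rejecting empty input and capping the length before handing the URL to Twilio. A minimal sketch follows; buildTtsUrl and the 1000-character cap are my own assumptions, not anything OpenClaw ships.

```javascript
const MAX_CHARS = 1000; // hypothetical cap; tune it to your quota and URL length limits

// Validate the text and build the /tts URL that Twilio will fetch.
function buildTtsUrl(baseUrl, text, maxChars = MAX_CHARS) {
  if (typeof text !== 'string' || text.trim() === '') {
    throw new TypeError('text must be a non-empty string');
  }
  const clipped = text.trim().slice(0, maxChars);
  // Note: a leading slash resolves against the origin, so any path prefix in baseUrl is dropped.
  const url = new URL('/tts', baseUrl);
  url.searchParams.set('text', clipped);
  return url.toString();
}
```

In call-me.js the url field could then become buildTtsUrl(ctx.baseUrl, text), with a 400 response when it throws.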

Step 5 – Enable Voice Wake on macOS/iOS

Starting a call is only half of it; ending one hands-free matters too. Apple ships a voice wake API starting macOS 14/iOS 17 that apps can subscribe to. The upstream OpenClaw desktop wrapper listens for it; make sure you’re on the right build.

Upgrade to macOS 14.3 (Settings → General → Software Update).
In Accessibility → Voice Control toggle Enable voice control.
Add a custom command: When I say “Claw stop”, perform action Press key → Esc. (Twilio hangs up on Esc in the wrapper.)
Sync iCloud so the command migrates to your iPhone.

Now when OpenClaw babbles for too long you mumble “Claw stop” and the call ends. Latency ~700 ms on my M2 Air.

Step 6 – Scheduling and Triggering Calls from Inside OpenClaw

Two patterns I use:

Cron-style in `gateway.yaml`

jobs:
  morning-brief:
    schedule: "0 7 * * *"   # every day at 07:00
    action: script:./scripts/call-me.js
    env:
      TEXT: "Good morning. GitHub issues: $(open issues). Calendar: $(next event)."
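If the schedule string looks cryptic, this small sketch names its parts. It assumes gateway.yaml takes standard five-field cron syntax, which matches the "every day at 07:00" comment above; I have not checked which time zone the daemon applies.

```javascript
const FIELDS = ['minute', 'hour', 'dayOfMonth', 'month', 'dayOfWeek'];

// Split a five-field cron expression into named parts.
function describeCron(expr) {
  const parts = expr.trim().split(/\s+/);
  if (parts.length !== FIELDS.length) {
    throw new Error(`expected 5 fields, got ${parts.length}`);
  }
  return Object.fromEntries(FIELDS.map((name, i) => [name, parts[i]]));
}
```

describeCron('0 7 * * *') gives minute 0 and hour 7, with every other field left as a wildcard, so the job fires once a day.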

Event-driven via GitHub webhook

# .github/workflows/notify.yml
on:
  pull_request:
    types: [opened]
jobs:
  notify:
    runs-on: ubuntu-latest
    steps:
      - name: Call reviewer
        run: |
          curl -XPOST https://agent.claw.cloud/call-me \
            -H 'Content-Type: application/json' \
            -d '{"text":"PR#${{ github.event.pull_request.number }} is ready for review."}'

Step 7 – Custom-Generated Meditations with Ambient Audio

This is where ElevenLabs shines. Human-like voices + background layers = meditation on demand.

7.1 Find an ambient loop

I grabbed a 10-min rain track from freesound.org (CC0). Save it as rain.wav.

7.2 Mix TTS with background

// scripts/meditate.js
import { join } from 'node:path';
import ffmpeg from 'fluent-ffmpeg';
import { tts } from 'openclaw-elevenlabs';

export default async function (req, res, ctx) {
  const script = `Breathe in… Breathe out… Remember you’re still on call rotation.`;
  const ttsFile = '/tmp/voice.wav';
  await tts(script, ttsFile, { voice: process.env.ELEVENLABS_VOICE_ID });

  // Mix the rain at 30% volume under the voice, which starts 1.5 s in.
  const out = '/tmp/meditation.wav';
  await new Promise((ok, err) => {
    ffmpeg()
      .input(join(process.cwd(), 'rain.wav'))
      .input(ttsFile)
      .complexFilter([
        '[0:a]volume=0.3[a0];[1:a]adelay=1500|1500,volume=1.0[a1];[a0][a1]amix=inputs=2:duration=longest'
      ])
      // 16-bit PCM: Twilio's <Play> expects standard PCM WAV rather than Opus.
      .audioCodec('pcm_s16le')
      // Attach the handlers before save() starts the encode.
      .on('end', ok)
      .on('error', err)
      .save(out);
  });

  // Assumption: OpenClaw serves /tmp/meditation.wav at /static/meditation.wav.
  // The twiml parameter takes TwiML markup, not a bare URL, so wrap the file in <Play>.
  await ctx.twilio().calls.create({
    twiml: `<Response><Play>${ctx.baseUrl}/static/meditation.wav</Play></Response>`,
    to: process.env.MY_PERSONAL_NUMBER,
    from: process.env.TWILIO_FROM_NUMBER
  });

  res.end('zen dialed');
}

I schedule that one every day at 22:00. The call starts with soft rain, my cloned voice guides me through four minutes of breathing, then fades out. Way less jarring than a generic iOS alarm.
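The filter string in meditate.js is dense, so here is a sketch that assembles an equivalent one from named knobs. buildMixFilter is a hypothetical helper of mine; the filter names (volume, adelay, amix) are standard ffmpeg audio filters, and the defaults mirror the values used above.

```javascript
// Assemble the ffmpeg filter graph: input 0 is the ambient bed, input 1 is the voice.
function buildMixFilter({
  bgVolume = 0.3,       // ambient track gain
  voiceVolume = 1,      // voice track gain
  delayMs = 1500,       // silence before the voice starts, applied to both stereo channels
  duration = 'longest', // amix stops with the 'longest', 'shortest', or 'first' input
} = {}) {
  return [
    `[0:a]volume=${bgVolume}[a0]`,
    `[1:a]adelay=${delayMs}|${delayMs},volume=${voiceVolume}[a1]`,
    `[a0][a1]amix=inputs=2:duration=${duration}`,
  ].join(';');
}
```

Passing the result to .complexFilter([buildMixFilter()]) should behave like the hard-coded string, and buildMixFilter({ bgVolume: 0.15, delayMs: 3000 }) gives quieter rain and a longer lead-in.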

Step 8 – Troubleshooting & Common Gotchas

White noise / garbled audio: Twilio routes through its US1 region by default. If you’re calling EU numbers, route through Ireland instead: TWILIO_REGION=ie1 together with TWILIO_EDGE=dublin (ie1 is a region name; dublin is the matching edge).
TTS delay > 5 s: ElevenLabs’ streaming API only works if you hit /v1/text-to-speech/<id>/stream. By default the OpenClaw plugin pre-generates to a temp file; set ELEVENLABS_STREAM=1 to enable chunked streams (v0.6.0+).
Calls drop at 15 min: Twilio enforces call-length limits silently; set the timeLimit parameter (in seconds) on calls.create, or loop the <Play>.
Voice wake flaky on Intel Macs: Apple restricts the on-device model to AVX2. Use your iPhone as the listener and forward the audio via Continuity.
“Missing fetch” error: You installed the plugin locally inside an older React Native project. Disable the polyfill: export UNDICI_NO_GLOBAL=1.
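For the call-length item, one way to make the limit explicit is to set it yourself on every call. A minimal sketch, assuming the Twilio helper's calls.create accepts timeLimit in seconds; buildCallParams and its 10-minute default are my own choices.

```javascript
// Build the options object for calls.create with an explicit call-length limit.
function buildCallParams({ to, from, url, maxMinutes = 10 }) {
  return {
    to,
    from,
    url,
    // Twilio's timeLimit is expressed in seconds.
    timeLimit: Math.round(maxMinutes * 60),
  };
}
```

In the scripts above this would slot in as ctx.twilio().calls.create(buildCallParams({ to, from, url, maxMinutes: 20 })), so a dropped call is a decision you made rather than a surprise.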

What to Do Next

You now have an agent that can call, speak in a natural voice, and shut up when you tell it. Whether you wire it into on-call rotations, personal reminders, or nightly meditations, the pattern is the same: POST some text, stream it through ElevenLabs, ship the audio down a phone line. Fork the scripts above, push to your ClawCloud workspace, and let your infrastructure literally speak for itself.