Yes, you can kick off a TypeScript refactor while you are on the train, answer a couple of clarifying questions from the agent, and merge the pull request before you get home. This tutorial shows how I glued OpenClaw (v0.38.2) to OpenAI’s Codex models and now run serious coding tasks from nothing more than my phone’s Telegram client.
What “phone-first development” looks like in practice
The label sounds gimmicky until you have tried it. The flow looks like this:
- You message your agent: `convert the /lib/date utils to dayjs and push a PR`.
- The agent calls Codex to generate code, runs the unit tests locally in a Docker sandbox, opens a branch, commits, and pushes.
- You get streaming updates back in Telegram: test pass/fail, lint output, link to the live preview, link to the GitHub PR.
- If the agent gets stuck you can drop into an interactive shell from the phone (via ClawCloud’s web console) to poke around.
No laptop, no SSH client, no public Wi-Fi hijinks. Just the messaging app you already use.
Prerequisites
I am assuming:
- Node.js 22.4+ (`node -v` should print at least 22).
- Docker or Podman installed locally if you want isolated build containers.
- An OpenAI account with Codex access (I used `gpt-4o-preview`, the code-centric one).
- A ClawCloud account (free tier is fine) or a box with a public IPv4 if you self-host.
- A GitHub personal access token with `repo` and `workflow` scopes.
- A phone with WhatsApp, Telegram, or Slack – I will show Telegram because the bot setup is two clicks.
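Before installing anything, a thirty-second preflight saves debugging later. The script below is my own helper, not part of OpenClaw; `OPENAI_API_KEY` and `TELEGRAM_BOT_TOKEN` are the environment variable names this walkthrough uses later, so adjust them if your setup differs.

```javascript
// preflight.mjs – quick environment sanity check (my own helper, not part of OpenClaw)

// Returns true when a `node -v` style string meets the 22.4+ requirement.
export function meetsMinimum(versionString, minMajor = 22, minMinor = 4) {
  const [major, minor = 0] = versionString
    .replace(/^v/, "")
    .split(".")
    .map(Number);
  return major > minMajor || (major === minMajor && minor >= minMinor);
}

// Env vars used later in this tutorial: OPENAI_API_KEY is read by the Codex
// executor, TELEGRAM_BOT_TOKEN matches the placeholder in gateway.yaml.
const required = ["OPENAI_API_KEY", "TELEGRAM_BOT_TOKEN"];

console.log(`node ${process.version}: ${meetsMinimum(process.version) ? "ok" : "too old"}`);
for (const name of required) {
  console.log(`${name}: ${process.env[name] ? "set" : "MISSING"}`);
}
```

Run it with `node preflight.mjs` before you start; a `MISSING` line now is much cheaper than a cryptic 401 mid-task.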
Installing OpenClaw gateway and daemon
If you do not care about infrastructure, skip this and create an agent on ClawCloud – the cloud onboarding hides everything behind a wizard. For on-prem folks, the steps are still one-liner-ish.
1. Bootstrap a new project directory
mkdir ~/openclaw-codex-demo && cd ~/openclaw-codex-demo
npm init -y
2. Install the core packages
npm install openclaw@0.38.2 openclaw-daemon@0.38.2 --save
The package split is historical (gateway = UI + HTTP API, daemon = background job runner). They share the same ~/.clawrc config.
3. Generate a default config
npx openclaw init --name "codex-agent" --port 3100
This writes gateway.yaml in the working directory. Key parts:
# gateway.yaml (excerpt)
agent:
name: codex-agent
model: gpt-4-turbo # we will override per-tool later
transports:
telegram:
enabled: true
token: "TELEGRAM_BOT_TOKEN"
4. Start the services
# Terminal 1 – gateway (web UI + REST)
node ./node_modules/.bin/openclaw --config gateway.yaml
# Terminal 2 – daemon (job scheduler)
node ./node_modules/.bin/openclaw-daemon --config gateway.yaml
Browse to http://localhost:3100; you should see the claw logo and an empty chat window.
Wiring up OpenAI Codex as a tool
OpenClaw treats outside services as “tools”. Under the hood it is just a JSON schema plus an executor that the LLM can invoke autonomously. The community maintains 800-ish adapters via Composio; Codex is not there by default because of the pricing implications, so we add it manually.
1. Install the helper
npm install openclaw-tool-codex@0.5.1 --save
2. Create a tool manifest
# tools/codex.yaml
name: codex
schema:
input:
type: object
properties:
prompt:
type: string
temperature:
type: number
required: [prompt]
executor: node:codex-exec.mjs
3. Write the executor
// tools/codex-exec.mjs
import OpenAI from "openai";
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
export default async function({ prompt, temperature = 0 }) {
const { choices } = await openai.chat.completions.create({
model: "gpt-4o-preview",
messages: [
{ role: "system", content: "You are a meticulous software engineer." },
{ role: "user", content: prompt }
],
temperature
});
return choices[0].message.content;
}
With both files under ./tools, OpenClaw discovers the new tool automatically on restart.
4. Expose GitHub actions and shell
Codex alone can write code, but something has to run tests and push branches. I enabled two more existing tools:
# gateway.yaml (excerpt)
tools:
- ./tools/codex.yaml
- composio/github@1.12.0
- builtin/shell
The builtin/shell tool is blacklisted in ClawCloud for obvious reasons unless you toggle “allow shell” in the dashboard. On self-host you own the machine, so knock yourself out – just consider sandboxing.
Connecting the agent to Telegram
1. Create the bot
Talk to @BotFather and run /newbot. Copy the token.
2. Paste token in config
transports:
telegram:
enabled: true
token: "618812345:AAE..."
3. Webhook vs polling
If you are on ClawCloud the platform handles HTTPS certs and webhooks automatically. For localhost I ran ngrok:
ngrok http 3100
Copy the HTTPS URL, then hit:
curl -X POST "https://api.telegram.org/bot<TOKEN>/setWebhook" \
  -d url=https://abcd1234.ngrok-free.app/telegram/webhook
Message /ping to the bot – the gateway should respond pong.
Delegating a real coding task from the phone
Now the fun part. Below is the exact transcript (timestamps removed) from my test project.
You ➜ convert src/utils/date.js to TypeScript, replace Moment.js with Day.js, update tests.
Agent ➜ Received. Drafting plan…
Agent ➜ Plan:
1. Fork branch date-dayjs
2. Modify files
3. Run yarn test
4. Commit & push
5. Open PR
Agent ➜ Executing step 1 (github.createBranch)…
Agent ➜ ↳ 201 Created: date-dayjs
Agent ➜ Executing step 2 (codex)…
Agent ➜ ↳ Codex diff summary:
• Added 102 lines
• Removed 87 lines
Agent ➜ Executing step 3 (shell): yarn test
Agent ➜ Tests failing (2/51)… Re-attempting.
Agent ➜ ↳ Patched test runner with updated date formats.
Agent ➜ Test suite passed.
Agent ➜ Executing step 4 (github.commitAndPush)…
Agent ➜ Branch pushed: https://github.com/psteiner/demo/pull/17
Agent ➜ PR opened. Anything else?
The entire run took 53 seconds, dominated by yarn install in the first container spin-up. Subsequent edits reuse the layer cache.
Monitoring long-running jobs
For heavier builds – think Go cross-compiles or end-to-end Cypress – you want granular updates instead of a single "done" message. Two options:
1. Streaming logs
Add the --stream flag to the shell tool in your prompt:
run e2e tests with shell --stream
The gateway forwards stdout/stderr chunks as they arrive. Mobile Telegram clients collapse the spam behind a "Load more" so it stays usable.
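Why does this stay usable? Telegram hard-caps message text at 4096 characters, so stdout has to be packed into message-sized batches somewhere before it is sent. OpenClaw does this for you; if you are wiring up your own transport, a greedy batcher is enough. The sketch below is my own illustration, not OpenClaw's actual code:

```javascript
// Batch streamed log chunks into Telegram-sized messages.
// 4096 is Telegram's hard limit for message text. This is my own sketch
// of the idea, not how OpenClaw implements it internally.
const TELEGRAM_MAX = 4096;

export function batchChunks(chunks, limit = TELEGRAM_MAX) {
  const messages = [];
  let current = "";
  for (const chunk of chunks) {
    // Hard-split oversized chunks, then pack the pieces greedily.
    for (let i = 0; i < chunk.length; i += limit) {
      const piece = chunk.slice(i, i + limit);
      if (current.length + piece.length > limit) {
        messages.push(current);
        current = "";
      }
      current += piece;
    }
  }
  if (current) messages.push(current);
  return messages;
}
```

Each returned string is one `sendMessage` call; nothing is lost, it is just re-framed.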
2. Progress events
The daemon emits Server-Sent Events at /api/events. OpenClaw’s web UI subscribes already, but you can curl it directly:
curl -N http://localhost:3100/api/events | jq -R 'fromjson? | select(.type == "progress")'
I keep that running in a tmux pane on the build server; it is easier to read than Jenkins green/red boxes.
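If you would rather consume those events from your own Node scripts than through jq, the parsing is trivial. The sketch below is mine and assumes each event arrives as one JSON object per line, optionally wrapped in an SSE `data:` prefix – treat the wire format as unstable rather than a documented contract:

```javascript
// Filter an event stream down to "progress" events.
// My own helper: accepts both bare JSON-per-line output (what the jq
// filter above assumes) and `data: {...}` SSE frames.
export function progressEvents(streamText) {
  return streamText
    .split("\n")
    .map((line) => line.replace(/^data: /, "").trim())
    .filter(Boolean)
    .map((line) => {
      try {
        return JSON.parse(line);
      } catch {
        return null; // ignore partial or non-JSON frames
      }
    })
    .filter((evt) => evt && evt.type === "progress");
}
```

In a live script you would feed this from the `/api/events` response body; Node 18+'s built-in fetch exposes it as a readable stream.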
Community workflow: prototype on phone, refine on desktop, merge
This pattern crystallised in the Discord over the last two months:
- Kick-off on mobile. You spot a bug while walking. You ask the agent to draft a fix branch.
- Agent pushes PR with failing tests. Good enough for later.
- Desk time. You pull the branch locally, inspect, maybe run `git rebase -i`.
- Final polish delegated back. Message the agent: `rebase onto main and update CHANGELOG`.
- Merge. Hit the big green button or tell the agent: `merge when CI passes`.
The sweet spot is prototype speed. Codex is smart but still hallucinates imports. I rarely let it commit to main unattended, but for feature branches it is a time saver.
Hard edges and trade-offs
- Credentials management. A Telegram bot token + GitHub PAT on the same host is scary. Vault it or use ClawCloud’s Secrets UI (encrypt-at-rest, KMS backed).
- Latency. The phone → 4G → Telegram → ClawCloud → Codex path adds a measurable number of hops. A 30-second round trip on a long completion is annoying. WhatsApp’s Business API is faster but costs money.
- Determinism. Codex temperature zero is still non-deterministic when tools upstream change versions. Pin container images and keep lockfiles checked in.
- Cost. GPT-4o-preview is $15/1M input tokens, $60/1M output. Coding tasks are verbose. I hard-capped spend at $20/day via the OpenAI billing API. The agent respects `X-RateLimit-Remaining` and refuses tasks once the cap is crossed.
- Security reviews. Letting an LLM run `shell` is basically `curl | bash` on steroids. In production we restrict it to a Docker container user with no host mounts and only the repo folder bind-mounted.
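A back-of-the-envelope cost check makes that cap concrete. The arithmetic below uses the per-million-token rates quoted above; the cap logic is my own illustration, not the agent's actual enforcement path:

```javascript
// Token-cost estimator at the rates quoted above:
// $15 per 1M input tokens, $60 per 1M output tokens.
const INPUT_USD_PER_M = 15;
const OUTPUT_USD_PER_M = 60;

export function taskCostUSD(inputTokens, outputTokens) {
  return (
    (inputTokens / 1e6) * INPUT_USD_PER_M +
    (outputTokens / 1e6) * OUTPUT_USD_PER_M
  );
}

// My own cap check for illustration – the agent enforces its $20/day
// budget through the billing API, not this function.
export function withinDailyCap(spentUSD, nextTaskUSD, capUSD = 20) {
  return spentUSD + nextTaskUSD <= capUSD;
}
```

At these rates a refactor that burns 200k input and 50k output tokens costs about $6, so the $20 cap buys roughly three such runs a day.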
Next steps: tighten the loop
If you build on this, I recommend:
- Set up the GitHub Checks API so the agent can annotate PRs with inline comments – users are sharing a `checks-toolkit` adapter in #plugins.
- Add cron schedules (`openclaw schedule add --cron "0 3 * * 1" --task "npm audit fix"`) so your dependencies stay green while you sleep.
- Experiment with voice input via WhatsApp audio + Whisper. Typing long prompts on glass is the current UX bottleneck.
Give it a spin. Next weekend, when a teammate pings "the build is red", reply with a single message from your phone and watch the agent do the grindy parts for you.