Yes, you can kick off a TypeScript refactor while you are on the train, answer a couple of clarifying questions from the agent, and merge the pull request before you get home. This tutorial shows how I glued OpenClaw (v0.38.2) to OpenAI’s Codex models and now run serious coding tasks from nothing more than my phone’s Telegram client.

What “phone-first development” looks like in practice

The label sounds gimmicky until you have tried it. The flow looks like this:

  • You message your agent: convert the /lib/date utils to dayjs and push a PR.
  • The agent calls Codex to generate code, runs the unit tests locally in a Docker sandbox, opens a branch, commits, and pushes.
  • You get streaming updates back in Telegram: test pass/fail, lint output, link to the live preview, link to the GitHub PR.
  • If the agent gets stuck you can drop into an interactive shell from the phone (via ClawCloud’s web console) to poke around.

No laptop, no SSH client, no public Wi-Fi hijinks. Just the messaging app you already use.

Prerequisites

I am assuming:

  • Node.js 22.4+ (node -v should print at least 22).
  • Docker or Podman installed locally if you want isolated build containers.
  • An OpenAI account with Codex access (I used gpt-4o-preview, the code-centric one).
  • A ClawCloud account (free tier is fine) or a box with a public IPv4 if you self-host.
  • GitHub personal access token with repo and workflow scopes.
  • A phone with WhatsApp, Telegram, or Slack – I will show Telegram because the bot setup is two clicks.

Installing OpenClaw gateway and daemon

If you do not care about infrastructure, skip this and create an agent on ClawCloud – the cloud onboarding hides everything behind a wizard. For on-prem folks, the steps are still one-liner-ish.

1. Bootstrap a new project directory

```bash
mkdir ~/openclaw-codex-demo && cd ~/openclaw-codex-demo
npm init -y
```

2. Install the core packages

npm install openclaw@0.38.2 openclaw-daemon@0.38.2 --save

The package split is historical (gateway = UI + HTTP API, daemon = background job runner). They share the same ~/.clawrc config.

3. Generate a default config

npx openclaw init --name "codex-agent" --port 3100

This writes gateway.yaml in the working directory. Key parts:

```yaml
# gateway.yaml (excerpt)
agent:
  name: codex-agent
  model: gpt-4-turbo   # we will override per-tool later
transports:
  telegram:
    enabled: true
    token: "TELEGRAM_BOT_TOKEN"
```

4. Start the services

```bash
# Terminal 1 – gateway (web UI + REST)
node ./node_modules/.bin/openclaw --config gateway.yaml

# Terminal 2 – daemon (job scheduler)
node ./node_modules/.bin/openclaw-daemon --config gateway.yaml
```

Browse to http://localhost:3100; you should see the claw logo and an empty chat window.

Wiring up OpenAI Codex as a tool

OpenClaw treats outside services as “tools”. Under the hood it is just a JSON schema plus an executor that the LLM can invoke autonomously. The community maintains 800-ish adapters via Composio; Codex is not there by default because of the pricing implications, so we add it manually.

1. Install the helper

npm install openclaw-tool-codex@0.5.1 --save

2. Create a tool manifest

```yaml
# tools/codex.yaml
name: codex
schema:
  input:
    type: object
    properties:
      prompt:
        type: string
      temperature:
        type: number
    required: [prompt]
executor: node:codex-exec.mjs
```

3. Write the executor

```js
// tools/codex-exec.mjs
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export default async function ({ prompt, temperature = 0 }) {
  const { choices } = await openai.chat.completions.create({
    model: "gpt-4o-preview",
    messages: [
      { role: "system", content: "You are a meticulous software engineer." },
      { role: "user", content: prompt },
    ],
    temperature,
  });
  return choices[0].message.content;
}
```

Drop both files into ./tools; on restart, OpenClaw discovers the tool automatically.
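The input schema in tools/codex.yaml is what stands between the LLM and the executor: the daemon rejects tool calls whose arguments do not match it. A minimal sketch of that kind of check (an illustration only, not OpenClaw's actual validator, which handles the full JSON Schema spec):

```javascript
// Minimal JSON-Schema-style argument check, mirroring the manifest above.
// Illustration only -- not OpenClaw's real validation code.
const inputSchema = {
  type: "object",
  properties: {
    prompt: { type: "string" },
    temperature: { type: "number" },
  },
  required: ["prompt"],
};

function validateToolCall(schema, args) {
  const errors = [];
  for (const key of schema.required ?? []) {
    if (!(key in args)) errors.push(`missing required property: ${key}`);
  }
  for (const [key, value] of Object.entries(args)) {
    const prop = schema.properties[key];
    if (!prop) continue; // unknown keys pass through
    if (typeof value !== prop.type) {
      errors.push(`${key}: expected ${prop.type}, got ${typeof value}`);
    }
  }
  return errors;
}

console.log(validateToolCall(inputSchema, { prompt: "refactor date.js" })); // → []
console.log(validateToolCall(inputSchema, { temperature: "hot" }));
// → two errors (missing prompt, wrong temperature type)
```

A call that fails validation never reaches the executor, which keeps malformed LLM output from burning Codex tokens.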

4. Expose GitHub actions and shell

Codex alone can write code, but something has to run tests and push branches. I enabled two more existing tools:

```yaml
# gateway.yaml (excerpt)
tools:
  - ./tools/codex.yaml
  - composio/github@1.12.0
  - builtin/shell
```

The builtin/shell tool is blacklisted in ClawCloud for obvious reasons unless you toggle “allow shell” in the dashboard. On self-host you own the machine, so knock yourself out – just consider sandboxing.
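Sandboxing in practice usually means a throwaway container with only the repo visible. A hypothetical sketch of building docker run arguments for that; the flags are my choices, not something OpenClaw ships:

```javascript
// Build argv for running an agent shell command inside a disposable
// container with only the repo folder bind-mounted. Hypothetical sketch.
function dockerArgs(repoDir, command) {
  return [
    "run", "--rm",
    "--user", "1000:1000",          // non-root inside the container
    "--network", "none",            // no outbound network; loosen this for installs
    "-v", `${repoDir}:/workspace`,  // only the repo folder is visible
    "-w", "/workspace",
    "node:22-slim",
    "sh", "-c", command,
  ];
}

console.log(dockerArgs("/srv/repos/demo", "yarn test").join(" "));
```

With layer caching (as noted later for yarn install), the container spin-up cost is paid once per base image, not per command.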

Connecting the agent to Telegram

1. Create the bot

Talk to @BotFather and run /newbot. Copy the token.

2. Paste token in config

```yaml
transports:
  telegram:
    enabled: true
    token: "618812345:AAE..."
```

3. Webhook vs polling

If you are on ClawCloud the platform handles HTTPS certs and webhooks automatically. For localhost I ran ngrok:

ngrok http 3100

Copy the HTTPS URL, then hit:

```bash
curl -X POST https://api.telegram.org/bot<TOKEN>/setWebhook \
  -d url=https://abcd1234.ngrok-free.app/telegram/webhook
```

Message /ping to the bot – the gateway should respond pong.
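Under the hood, the webhook handler receives standard Telegram update objects. A toy sketch of routing one (the update shape follows the Bot API; the routing logic is illustrative, not OpenClaw's code):

```javascript
// Shape of a Telegram webhook update (simplified) and one way a handler
// might route it. Illustration only -- not OpenClaw's transport code.
const update = {
  update_id: 726182,
  message: {
    message_id: 41,
    chat: { id: 987654321, type: "private" },
    text: "/ping",
  },
};

function routeUpdate(update) {
  const msg = update.message;
  if (!msg?.text) return null; // ignore stickers, photos, etc.
  if (msg.text === "/ping") {
    // reply via the Bot API's sendMessage method
    return { method: "sendMessage", chat_id: msg.chat.id, text: "pong" };
  }
  // anything else becomes an agent task
  return { method: "sendMessage", chat_id: msg.chat.id, text: `queued: ${msg.text}` };
}

console.log(routeUpdate(update)); // → { method: "sendMessage", chat_id: 987654321, text: "pong" }
```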

Delegating a real coding task from the phone

Now the fun part. Below is the exact transcript (timestamps removed) from my test project.

```
You   ➜ convert src/utils/date.js to TypeScript, replace Moment.js with Day.js, update tests.
Agent ➜ Received. Drafting plan…
Agent ➜ Plan:
          1. Fork branch date-dayjs
          2. Modify files
          3. Run yarn test
          4. Commit & push
          5. Open PR
Agent ➜ Executing step 1 (github.createBranch)…
Agent ➜ ↳ 201 Created: date-dayjs
Agent ➜ Executing step 2 (codex)…
Agent ➜ ↳ Codex diff summary: • Added 102 lines • Removed 87 lines
Agent ➜ Executing step 3 (shell): yarn test
Agent ➜ Tests failing (2/51)… Re-attempting.
Agent ➜ ↳ Patched test runner with updated date formats.
Agent ➜ Test suite passed.
Agent ➜ Executing step 4 (github.commitAndPush)…
Agent ➜ Branch pushed: https://github.com/psteiner/demo/pull/17
Agent ➜ PR opened. Anything else?
```

The entire run took 53 seconds, dominated by yarn install in the first container spin-up. Subsequent edits reuse the layer cache.

Monitoring long-running jobs

For heavier builds – think Go cross-compiles or end-to-end Cypress – you want granular updates instead of a single "done" message. Two options:

1. Streaming logs

Add the --stream flag to the shell tool in your prompt:

run e2e tests with shell --stream

The gateway forwards stdout/stderr chunks as they arrive. Mobile Telegram clients collapse the spam behind a "Load more" so it stays usable.
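One detail worth knowing: the Bot API caps a message at 4096 characters, so streamed chunks have to be batched before forwarding. A sketch of that buffering (the flush policy is my simplification, not necessarily what the gateway does):

```javascript
// Batch streamed log chunks into messages under Telegram's 4096-char limit.
// Simplified sketch; a real forwarder would also flush on a timer.
const TELEGRAM_MAX = 4096;

function batchChunks(chunks, max = TELEGRAM_MAX) {
  const messages = [];
  let current = "";
  for (const chunk of chunks) {
    if (current.length + chunk.length > max && current) {
      messages.push(current); // flush before the limit would be exceeded
      current = "";
    }
    current += chunk;
  }
  if (current) messages.push(current);
  return messages;
}

const logs = ["PASS date.test.ts\n", "x".repeat(4000), "y".repeat(200)];
console.log(batchChunks(logs).length); // → 2
```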

2. Progress events

The daemon emits Server-Sent Events at /api/events. OpenClaw’s web UI subscribes already, but you can curl it directly:

curl -N http://localhost:3100/api/events | jq -R 'fromjson? | select(.type == "progress")'
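If you would rather consume the stream from Node than from curl, filtering SSE frames for progress events takes only a few lines (a sketch that mirrors the jq filter above; real SSE parsing should also handle multi-line data fields and comments):

```javascript
// Parse raw Server-Sent-Events text into JSON payloads and keep only
// "progress" events -- the Node equivalent of the curl | jq filter above.
function parseProgressEvents(sseText) {
  return sseText
    .split("\n")
    .filter((line) => line.startsWith("data: "))
    .map((line) => {
      try { return JSON.parse(line.slice(6)); } catch { return null; }
    })
    .filter((evt) => evt?.type === "progress");
}

// Hypothetical sample of what the daemon might emit.
const sample = [
  'data: {"type":"progress","step":3,"message":"yarn test"}',
  "",
  'data: {"type":"log","line":"PASS src/utils/date.test.ts"}',
  "",
  'data: {"type":"progress","step":4,"message":"git push"}',
].join("\n");

console.log(parseProgressEvents(sample).map((e) => e.step)); // → [ 3, 4 ]
```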

I keep that running in a tmux pane on the build server; it is easier to read than Jenkins green/red boxes.

Community workflow: prototype on phone, refine on desktop, merge

This pattern crystallised in the Discord over the last two months:

  1. Kick-off on mobile. You spot a bug while walking. You ask the agent to draft a fix branch.
  2. Agent pushes PR with failing tests. Good enough for later.
  3. Desk time. You pull the branch locally, inspect, maybe run git rebase -i.
  4. Final polish delegated back. Message the agent: rebase onto main and update CHANGELOG.
  5. Merge. Hit the big green button or tell the agent: merge when CI passes.

The sweet spot is prototype speed. Codex is smart but still hallucinates imports. I rarely let it commit to main unattended, but for feature branches it is a time saver.

Hard edges and trade-offs

  • Credentials management. A Telegram bot token + GitHub PAT on the same host is scary. Vault it or use ClawCloud’s Secrets UI (encrypt-at-rest, KMS backed).
  • Latency. The path phone → 4G → Telegram → ClawCloud → Codex adds a hop at every step, and it shows: a 30-second round trip on a simple request is annoying. WhatsApp’s Business API is faster but costs money.
  • Determinism. Codex temperature zero is still non-deterministic when tools upstream change versions. Pin container images and keep lockfiles checked in.
  • Cost. GPT-4o-preview is $15/1M input tokens, $60/1M output. Coding tasks are verbose. I hard-capped spend at $20/day via the OpenAI billing API; the agent respects X-RateLimit-Remaining and refuses new tasks once the cap is crossed.
  • Security reviews. Letting an LLM run shell is basically curl | bash on steroids. In production we restrict to a Docker container user with no host mounts and only the repo folder bind-mounted.
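The daily cap in the cost bullet is plain arithmetic over token counts. A sketch at the rates quoted above (illustrative; real accounting should come from the billing API, and the rates may change):

```javascript
// Rough daily-spend check at the per-token rates quoted above
// ($15 per 1M input tokens, $60 per 1M output). Illustration only.
const RATES = { input: 15 / 1_000_000, output: 60 / 1_000_000 };

function costUsd(usage) {
  return usage.inputTokens * RATES.input + usage.outputTokens * RATES.output;
}

function underDailyCap(usageLog, capUsd = 20) {
  const spent = usageLog.reduce((sum, u) => sum + costUsd(u), 0);
  return { spent, allowed: spent < capUsd };
}

// Hypothetical day of agent runs.
const today = [
  { inputTokens: 400_000, outputTokens: 120_000 }, // morning refactor
  { inputTokens: 250_000, outputTokens: 90_000 },  // test-fix loop
];
console.log(underDailyCap(today)); // → spent ≈ 22.35, allowed: false
```

Output tokens dominate the bill on coding tasks, since diffs and explanations are far longer than the prompt.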

Next steps: tighten the loop

If you build on this, I recommend:

  • Set up GitHub Checks API so the agent can annotate PRs with inline comments – users are sharing a checks-toolkit adapter in #plugins.
  • Add cron schedules (openclaw schedule add --cron "0 3 * * 1" --task "npm audit fix") so your dependencies stay green while you sleep.
  • Experiment with voice input via WhatsApp audio + Whisper. Typing long prompts on glass is the current UX bottleneck.

Give it a spin. Next weekend, when a teammate pings "the build is red", reply with a single message from your phone and watch the agent do the grindy parts for you.