Yes, you can kick off a TypeScript refactor while you are on the train, answer a couple of clarifying questions from the agent, and merge the pull request before you get home. This tutorial shows how I glued OpenClaw (v0.38.2) to OpenAI’s Codex models and now run serious coding tasks from nothing more than my phone’s Telegram client.
What “phone-first development” looks like in practice
The label sounds gimmicky until you have tried it. The flow looks like this:
- You message your agent: `convert the /lib/date utils to dayjs and push a PR`.
- The agent calls Codex to generate code, runs the unit tests locally in a Docker sandbox, opens a branch, commits, and pushes.
- You get streaming updates back in Telegram: test pass/fail, lint output, link to the live preview, link to the GitHub PR.
- If the agent gets stuck you can drop into an interactive shell from the phone (via ClawCloud’s web console) to poke around.
No laptop, no SSH client, no public Wi-Fi hijinks. Just the messaging app you already use.
Prerequisites
I am assuming:
- Node.js 22.4+ (`node -v` should print at least 22).
- Docker or Podman installed locally if you want isolated build containers.
- An OpenAI account with Codex access (I used `gpt-4o-preview`, the code-centric one).
- A ClawCloud account (free tier is fine) or a box with a public IPv4 if you self-host.
- A GitHub personal access token with `repo` and `workflow` scopes.
- A phone with WhatsApp, Telegram, or Slack – I will show Telegram because the bot setup is two clicks.
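Before installing anything, a thirty-second preflight saves debugging later. The script below is my own helper, not part of OpenClaw; `OPENAI_API_KEY` and `TELEGRAM_BOT_TOKEN` are the environment variable names this walkthrough uses later, so adjust them if your setup differs.

```javascript
// preflight.mjs – quick environment sanity check (my own helper, not part of OpenClaw)

// Returns true when a `node -v` style string meets the 22.4+ requirement.
export function meetsMinimum(versionString, minMajor = 22, minMinor = 4) {
  const [major, minor = 0] = versionString
    .replace(/^v/, "")
    .split(".")
    .map(Number);
  return major > minMajor || (major === minMajor && minor >= minMinor);
}

// Env vars used later in this tutorial: OPENAI_API_KEY is read by the Codex
// executor, TELEGRAM_BOT_TOKEN matches the placeholder in gateway.yaml.
const required = ["OPENAI_API_KEY", "TELEGRAM_BOT_TOKEN"];

console.log(`node ${process.version}: ${meetsMinimum(process.version) ? "ok" : "too old"}`);
for (const name of required) {
  console.log(`${name}: ${process.env[name] ? "set" : "MISSING"}`);
}
```

Run it with `node preflight.mjs` before you start; a `MISSING` line now is much cheaper than a cryptic 401 mid-task.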
Installing OpenClaw gateway and daemon
If you do not care about infrastructure, skip this and create an agent on ClawCloud – the cloud onboarding hides everything behind a wizard. For on-prem folks, the steps are still one-liner-ish.
1. Bootstrap a new project directory
mkdir ~/openclaw-codex-demo && cd ~/openclaw-codex-demo
npm init -y
2. Install the core packages
npm install openclaw@0.38.2 openclaw-daemon@0.38.2 --save
The package split is historical (gateway = UI + HTTP API, daemon = background job runner). They share the same ~/.clawrc config.
3. Generate a default config
npx openclaw init --name "codex-agent" --port 3100
This writes gateway.yaml in the working directory. Key parts:
# gateway.yaml (excerpt)
agent:
name: codex-agent
model: gpt-4-turbo # we will override per-tool later
transports:
telegram:
enabled: true
token: "TELEGRAM_BOT_TOKEN"
4. Start the services
# Terminal 1 – gateway (web UI + REST)
node ./node_modules/.bin/openclaw --config gateway.yaml
# Terminal 2 – daemon (job scheduler)
node ./node_modules/.bin/openclaw-daemon --config gateway.yaml
Browse to http://localhost:3100; you should see the claw logo and an empty chat window.
Wiring up OpenAI Codex as a tool
OpenClaw treats outside services as “tools”. Under the hood it is just a JSON schema plus an executor that the LLM can invoke autonomously. The community maintains 800-ish adapters via Composio; Codex is not there by default because of the pricing implications, so we add it manually.
1. Install the helper
npm install openclaw-tool-codex@0.5.1 --save
2. Create a tool manifest
# tools/codex.yaml
name: codex
schema:
input:
type: object
properties:
prompt:
type: string
temperature:
type: number
required: [prompt]
executor: node:codex-exec.mjs
3. Write the executor
// tools/codex-exec.mjs
import OpenAI from "openai";
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
export default async function({ prompt, temperature = 0 }) {
const { choices } = await openai.chat.completions.create({
model: "gpt-4o-preview",
messages: [
{ role: "system", content: "You are a meticulous software engineer." },
{ role: "user", content: prompt }
],
temperature
});
return choices[0].message.content;
}
With both files under ./tools, OpenClaw discovers the new tool automatically on restart.
4. Expose GitHub actions and shell
Codex alone can write code, but something has to run tests and push branches. I enabled two more existing tools:
# gateway.yaml (excerpt)
tools:
- ./tools/codex.yaml
- composio/github@1.12.0
- builtin/shell
The builtin/shell tool is blacklisted in ClawCloud for obvious reasons unless you toggle “allow shell” in the dashboard. On self-host you own the machine, so knock yourself out – just consider sandboxing.
Connecting the agent to Telegram
1. Create the bot
Talk to @BotFather and run /newbot. Copy the token.
2. Paste token in config
transports:
telegram:
enabled: true
token: "618812345:AAE..."
3. Webhook vs polling
If you are on ClawCloud the platform handles HTTPS certs and webhooks automatically. For localhost I ran ngrok:
ngrok http 3100
Copy the HTTPS URL, then hit:
curl -X POST "https://api.telegram.org/bot<TOKEN>/setWebhook" \
  -d url=https://abcd1234.ngrok-free.app/telegram/webhook
Message /ping to the bot – the gateway should respond pong.
Delegating a real coding task from the phone
Now the fun part. Below is the exact transcript (timestamps removed) from my test project.
You ➜ convert src/utils/date.js to TypeScript, replace Moment.js with Day.js, update tests.
Agent ➜ Received. Drafting plan…
Agent ➜ Plan:
1. Fork branch date-dayjs
2. Modify files
3. Run yarn test
4. Commit & push
5. Open PR
Agent ➜ Executing step 1 (github.createBranch)…
Agent ➜ ↳ 201 Created: date-dayjs
Agent ➜ Executing step 2 (codex)…
Agent ➜ ↳ Codex diff summary:
• Added 102 lines
• Removed 87 lines
Agent ➜ Executing step 3 (shell): yarn test
Agent ➜ Tests failing (2/51)… Re-attempting.
Agent ➜ ↳ Patched test runner with updated date formats.
Agent ➜ Test suite passed.
Agent ➜ Executing step 4 (github.commitAndPush)…
Agent ➜ Branch pushed: https://github.com/psteiner/demo/pull/17
Agent ➜ PR opened. Anything else?
The entire run took 53 seconds, dominated by yarn install in the first container spin-up. Subsequent edits reuse the layer cache.
Monitoring long-running jobs
For heavier builds – think Go cross-compiles or end-to-end Cypress – you want granular updates instead of a single "done" message. Two options:
1. Streaming logs
Add the --stream flag to the shell tool in your prompt:
run e2e tests with shell --stream
The gateway forwards stdout/stderr chunks as they arrive. Mobile Telegram clients collapse the spam behind a "Load more" so it stays usable.
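Why does this stay usable? Telegram hard-caps message text at 4096 characters, so stdout has to be packed into message-sized batches somewhere before it is sent. OpenClaw does this for you; if you are wiring up your own transport, a greedy batcher is enough. The sketch below is my own illustration, not OpenClaw's actual code:

```javascript
// Batch streamed log chunks into Telegram-sized messages.
// 4096 is Telegram's hard limit for message text. This is my own sketch
// of the idea, not how OpenClaw implements it internally.
const TELEGRAM_MAX = 4096;

export function batchChunks(chunks, limit = TELEGRAM_MAX) {
  const messages = [];
  let current = "";
  for (const chunk of chunks) {
    // Hard-split oversized chunks, then pack the pieces greedily.
    for (let i = 0; i < chunk.length; i += limit) {
      const piece = chunk.slice(i, i + limit);
      if (current.length + piece.length > limit) {
        messages.push(current);
        current = "";
      }
      current += piece;
    }
  }
  if (current) messages.push(current);
  return messages;
}
```

Each returned string is one `sendMessage` call; nothing is lost, it is just re-framed.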
2. Progress events
The daemon emits Server-Sent Events at /api/events. OpenClaw’s web UI subscribes already, but you can curl it directly:
curl -N http://localhost:3100/api/events | jq -R 'fromjson? | select(.type == "progress")'
I keep that running in a tmux pane on the build server; it is easier to read than Jenkins green/red boxes.
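If you would rather consume those events from your own Node scripts than through jq, the parsing is trivial. The sketch below is mine and assumes each event arrives as one JSON object per line, optionally wrapped in an SSE `data:` prefix – treat the wire format as unstable rather than a documented contract:

```javascript
// Filter an event stream down to "progress" events.
// My own helper: accepts both bare JSON-per-line output (what the jq
// filter above assumes) and `data: {...}` SSE frames.
export function progressEvents(streamText) {
  return streamText
    .split("\n")
    .map((line) => line.replace(/^data: /, "").trim())
    .filter(Boolean)
    .map((line) => {
      try {
        return JSON.parse(line);
      } catch {
        return null; // ignore partial or non-JSON frames
      }
    })
    .filter((evt) => evt && evt.type === "progress");
}
```

In a live script you would feed this from the `/api/events` response body; Node 18+'s built-in fetch exposes it as a readable stream.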
Community workflow: prototype on phone, refine on desktop, merge
This pattern crystallised in the Discord over the last two months:
- Kick-off on mobile. You spot a bug while walking. You ask the agent to draft a fix branch.
- Agent pushes PR with failing tests. Good enough for later.
- Desk time. You pull the branch locally, inspect, maybe run `git rebase -i`.
- Final polish delegated back. Message the agent: `rebase onto main and update CHANGELOG`.
- Merge. Hit the big green button or tell the agent: `merge when CI passes`.
The sweet spot is prototype speed. Codex is smart but still hallucinates imports. I rarely let it commit to main unattended, but for feature branches it is a time saver.
Hard edges and trade-offs
- Credentials management. A Telegram bot token + GitHub PAT on the same host is scary. Vault it or use ClawCloud’s Secrets UI (encrypt-at-rest, KMS backed).
- Latency. The phone → 4G → Telegram → ClawCloud → Codex path adds a measurable number of hops. A 30-second round trip on a long completion is annoying. WhatsApp’s Business API is faster but costs money.
- Determinism. Codex temperature zero is still non-deterministic when tools upstream change versions. Pin container images and keep lockfiles checked in.
- Cost. GPT-4o-preview is $15/1M input tokens, $60/1M output. Coding tasks are verbose. I hard-capped spend at $20/day via the OpenAI billing API. The agent respects `X-RateLimit-Remaining` and refuses tasks once the cap is crossed.
- Security reviews. Letting an LLM run `shell` is basically `curl | bash` on steroids. In production we restrict it to a Docker container user with no host mounts and only the repo folder bind-mounted.
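A back-of-the-envelope cost check makes that cap concrete. The arithmetic below uses the per-million-token rates quoted above; the cap logic is my own illustration, not the agent's actual enforcement path:

```javascript
// Token-cost estimator at the rates quoted above:
// $15 per 1M input tokens, $60 per 1M output tokens.
const INPUT_USD_PER_M = 15;
const OUTPUT_USD_PER_M = 60;

export function taskCostUSD(inputTokens, outputTokens) {
  return (
    (inputTokens / 1e6) * INPUT_USD_PER_M +
    (outputTokens / 1e6) * OUTPUT_USD_PER_M
  );
}

// My own cap check for illustration – the agent enforces its $20/day
// budget through the billing API, not this function.
export function withinDailyCap(spentUSD, nextTaskUSD, capUSD = 20) {
  return spentUSD + nextTaskUSD <= capUSD;
}
```

At these rates a refactor that burns 200k input and 50k output tokens costs about $6, so the $20 cap buys roughly three such runs a day.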
Next steps: tighten the loop
If you build on this, I recommend:
- Set up the GitHub Checks API so the agent can annotate PRs with inline comments – users are sharing a `checks-toolkit` adapter in #plugins.
- Add cron schedules (`openclaw schedule add --cron "0 3 * * 1" --task "npm audit fix"`) so your dependencies stay green while you sleep.
- Experiment with voice input via WhatsApp audio + Whisper. Typing long prompts on glass is the current UX bottleneck.
Give it a spin. Next weekend, when a teammate pings "the build is red", reply with a single message from your phone and watch the agent do the grindy parts for you.