If you skimmed the OpenClaw README and wondered why your brand-new skill doesn’t appear in every conversation, you’ve hit the exact question this post answers: how does OpenClaw decide which skills to inject per turn, and how can you influence that decision? The short version: it builds a candidate list with an in-process retrieval system, ranks the hits, and only the top N skills survive the token budget. Everything else stays out of the context window—good for latency, also good for model quality. Below is the long version with code, numbers, and foot-guns I hit while shipping a dozen skills in production.
Why OpenClaw Avoids Injecting Every Skill
Large language models are greedy about tokens. With GPT-4o-128k we get breathing room, but the bill still shows up on the 1st of the month. Shipping all skills means:
- Ballooned prompts: my toy install with 42 skills consumed 11 430 extra tokens per turn.
- Longer reasoning chains: the model wastes cycles scanning irrelevant tool specs.
- Higher latency: more tokens in means a longer prefill before the first response token streams back.
- Lower accuracy: prompting research (and my own tests) show quality drops when the instruction set is noisy.
The original Clawdbot (pre-rename, commit 5b3c9b7) did naïve full injection. It worked with five hand-rolled skills and a 4k model. Community members opened issues #89, #113, #129 complaining about “verbal diarrhea” and cost. Peter merged selective injection in v0.14.0 and the complaints stopped. That code still powers current master.
Discovery Flow: From User Utterance to Skill Candidate List
Skill discovery happens inside the daemon, not the gateway. Here’s the exact path for v0.39.2 (Node 22 LTS):
- User turn arrives at `/v1/chat`.
- Message embedding via `@openclaw/embeddings`, defaulting to `text-embedding-3-small` (configurable).
- Vector search against the `skills` index in SQLite-FTS or Pinecone, depending on `process.env.CLAW_VEC_STORE`. Default is local SQLite for zero-dependency installs.
- Each skill has an `embedding.json` generated at `npm run claw build`. The embed text is the skill name + description + tags.
- Top-k (k=8) hits returned with cosine similarity.
- An optional re-ranker (OpenAI `rerank-1`) refines the order if `CLAW_RERANK=1`. Off by default because $$$.
- Hard filters: permissions, schedule windows, feature flags per user session.
- Final candidate list forwarded to the prompt builder.
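The retrieval core of that flow (embed the message, score every skill by cosine similarity, keep the top k) can be sketched in a few lines. This is a minimal in-memory illustration, not OpenClaw's actual code; the real store is SQLite-FTS or Pinecone, and `Skill` and `topK` are names I made up:

```typescript
// Minimal cosine top-k over in-memory skill vectors.
// Illustrative only: OpenClaw's real store is SQLite or Pinecone.
interface Skill {
  name: string;
  vector: number[]; // precomputed from embedding.json at build time
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function topK(
  query: number[],
  skills: Skill[],
  k = 8, // matches max_candidates in gateway.toml
): { name: string; score: number }[] {
  return skills
    .map((s) => ({ name: s.name, score: cosine(query, s.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Everything after this (re-ranking, hard filters) only reorders or removes entries from that top-k list; nothing gets added back in.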
Config snippet from my `gateway.toml`:

```toml
[skills]
store = "sqlite"                         # or "pinecone"
vector_model = "text-embedding-3-small"
max_candidates = 8                       # k in top-k search
re_rank = false                          # flip when quality beats cost
```
What the Embedding Actually Looks Like
Run `cat ~/.claw/skills/weather/embedding.json | jq '.text'` and you'll see something like:

```
"Weather → get current, hourly or weekly forecast. tags: weather, forecast, open-meteo, outdoor"
```
This string is what the vector store indexes. If your skill never gets picked, the answer is usually in that text.
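As far as I can tell, that string is just a concatenation of the metadata fields. A sketch of the construction (`buildEmbedText` and `SkillMeta` are my names, not OpenClaw exports):

```typescript
// Build the string the vector store indexes: name + description + tags.
// buildEmbedText is a hypothetical helper, not part of OpenClaw's public API.
interface SkillMeta {
  name: string;
  description: string;
  tags: string[];
}

function buildEmbedText(meta: SkillMeta): string {
  return `${meta.name} → ${meta.description} tags: ${meta.tags.join(", ")}`;
}
```

Feeding in the weather skill's metadata reproduces the `jq` output above, which is why every discoverability fix below comes down to editing one of those three fields.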
Injection Templates: What Makes It Into the Final Prompt
OpenClaw maintains a Jinja-ish template (`templates/prompt.mustache`). The relevant chunk:

```mustache
{{#skills}}
### Skill: {{name}}
{{description}}
#### Parameters
{{#parameters}}
- {{name}}: {{type}} — {{description}}
{{/parameters}}
{{/skills}}
```
Only candidates from the discovery flow populate `{{#skills}}`. The template itself costs ~90 tokens per skill before parameters. In practice:
- 8 skills × 90 ≈ 720 overhead tokens.
- Typical parameter list adds 20-60 tokens.
- So worst case 1 200 tokens injected vs 10 000+ in the naïve model.
The prompt builder also appends an action schema as function-calling JSON when you're on OpenAI's 0613 or 1106 models. Same idea, different syntax.
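For the function-calling path, each surviving skill becomes a tool spec instead of prose. Here is a hand-written illustration of what the weather skill might look like in OpenAI's tool format; the parameter names and descriptions are my invention, not the emitter's actual output:

```typescript
// What a single injected skill might look like as an OpenAI function-calling
// tool spec. Hand-written illustration; field contents are assumptions.
const weatherTool = {
  type: "function",
  function: {
    name: "weather",
    description: "Get current, hourly or weekly forecast.",
    parameters: {
      type: "object",
      properties: {
        location: { type: "string", description: "City name, e.g. Berlin" },
        range: { type: "string", enum: ["current", "hourly", "weekly"] },
      },
      required: ["location"],
    },
  },
};
```

The token math works out similarly either way: only the top-k survivors get a spec, whether mustache-rendered or JSON.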
Performance Numbers From Real Logs
I turned on `OC_CLAW_TRACE=true` for a week. One agent serving a Discord community (~3 500 messages/day). Hardware: 8-core AMD, 32 GB RAM, local Llama-3-70B.
- Median tokens/turn dropped from 3 312 → 1 098.
- P99 latency dropped 27 % (10.9 s → 8.0 s) because less text streamed into the model.
- Average cost on an OpenAI backend (week prior, same traffic) went from $138 → $47. Numbers are public in GitHub issue #247.
- Recall@8 against an exhaustive ground-truth eval: 92 %. We miss a few edge cases, which I'll cover next.
The takeaway: selective injection is not just an aesthetic choice, it materially affects your bill and UX.
Optimizing Your Skills for Discoverability
Now the part you control. The most common complaint in the Discord is “my skill never triggers”. Ninety percent of the time the fix is in metadata.
1. Use a High-Signal Name
The `name` field is weighted 2× in the embed string. A skill called `doStuff` will lose to `yt_download_audio` every time. Use verbs and nouns that match user language.
2. Write Descriptions Like Search Snippets
First 80 characters matter. `text-embedding-3-small` truncates input at token 512, but similarity drops fast after sentence two anyway. Hit the keywords early:
```yaml
# bad
"Fetches things from an external service and returns them in structured JSON."

# good
"Download YouTube video or audio by URL. Supports 4k, MP3, MP4."
```
3. Tag Liberally
Tags are comma-separated words appended verbatim. Users often type “gif”, “meme”, “weather”. If it’s not in your name/description, put it in tags.
4. Provide Multiple Examples
Each skill may include an optional `examples` array:
```yaml
examples:
  - "Show me Paris weather"
  - "Will it rain tomorrow?"
```
These examples are embedded and participate in the search with a 0.8 coefficient. They’re cheap insurance.
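One way to read "participate with a 0.8 coefficient": a candidate's score is the best of its metadata similarity and its best example similarity, with the latter scaled down. This is my reading of the mechanism, not documented behavior, and `scoreSkill` is a hypothetical name:

```typescript
// Score a skill as max(metadata similarity, 0.8 × best example similarity).
// Hypothetical reading of the 0.8 coefficient; not OpenClaw's documented formula.
function scoreSkill(
  metaSim: number,
  exampleSims: number[],
  exampleWeight = 0.8,
): number {
  const bestExample = exampleSims.length ? Math.max(...exampleSims) : -Infinity;
  return Math.max(metaSim, exampleWeight * bestExample);
}
```

Under this reading, an example that matches the query at 0.9 still lifts a skill whose metadata only scores 0.3, which is exactly why examples are cheap insurance.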
5. Build With `npm run claw build`
OpenClaw parses the YAML and emits embeddings on build. Forgetting this step is the hidden “it works on my machine” bug. Put it in CI.
Advanced Tuning: Beyond the Defaults
If you’re shipping to thousands of users, the defaults might not cut it.
Custom Re-Rankers
Flip `CLAW_RERANK=1` to use OpenAI's `rerank-1` model. Cost is $0.0002/1K tokens, but quality jumps ~3 %. We saw weather vs. calendar ambiguity drop.
Hybrid Search (BM25 + Embeddings)
Set `search_mode = "hybrid"` in `gateway.toml`. Pinecone hybrid scored best in my tests once the skill count passed 500.
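Hybrid search typically fuses a lexical BM25 score with the dense cosine score. A common convex-combination sketch; the alpha parameter and the assumption that BM25 is pre-normalized to [0, 1] are mine, not OpenClaw's documented formula:

```typescript
// Convex combination of a normalized BM25 score and a cosine similarity.
// alpha = 1 is pure lexical search, alpha = 0 is pure dense retrieval.
// Assumes bm25Norm has already been scaled into [0, 1].
function hybridScore(bm25Norm: number, cosineSim: number, alpha = 0.5): number {
  return alpha * bm25Norm + (1 - alpha) * cosineSim;
}
```

The intuition for why this wins at large skill counts: BM25 catches exact keyword hits ("gif", "mp3") that dense vectors smear out, while embeddings catch paraphrases BM25 misses.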
Cache Hot Pairs
Patterns repeat. If the user says "make a github issue", you know `github_create_issue` wins 99 % of the time. We store the mapping in Redis with a 24 h TTL to skip vector search entirely.
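The hot-pair cache is just a TTL'd map in front of the vector search. Sketched here with an in-process `Map` standing in for Redis; the key shape and `resolveSkill` are my invention:

```typescript
// TTL cache in front of vector search. An in-process Map stands in for Redis
// here; in production you'd write the pair with a 24 h expiry instead.
const TTL_MS = 24 * 60 * 60 * 1000;
const hotPairs = new Map<string, { skill: string; expires: number }>();

function resolveSkill(
  message: string,
  vectorSearch: (m: string) => string, // fallback: the full embedding lookup
): string {
  const key = message.toLowerCase().trim();
  const hit = hotPairs.get(key);
  if (hit && hit.expires > Date.now()) {
    return hit.skill; // cache hit: skip the vector search entirely
  }
  const skill = vectorSearch(message); // cache miss: do the real lookup
  hotPairs.set(key, { skill, expires: Date.now() + TTL_MS });
  return skill;
}
```

The normalization on the key matters more than it looks: without it, "Make a GitHub issue" and "make a github issue" each pay for their own vector search.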
Lower the Max Candidate Count
Eight is generous for most real agents. Frontline support agents do fine with 3-4 skills. Fewer skills → cheaper tokens.
Negative Prompting
If a skill is getting picked when it shouldn't, add `negative_examples`. These embed with a -0.5 weight and push the candidate's score down.
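Folding that into scoring: a query that resembles a negative example subtracts from the skill's score at half strength. Again, this is my reading of the mechanism, and the function name is hypothetical:

```typescript
// Penalize a skill whose negative examples match the query: the strongest
// negative similarity contributes at -0.5 weight. Hypothetical reading of
// the negative_examples mechanism, not OpenClaw's documented formula.
function scoreWithNegatives(
  baseScore: number,
  negativeSims: number[],
  negWeight = -0.5,
): number {
  const worst = negativeSims.length ? Math.max(...negativeSims) : 0;
  return baseScore + negWeight * worst;
}
```

So a skill matching the query at 0.9 but also matching one of its own negative examples at 0.8 drops to 0.5, likely below other candidates.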
Debugging Skill Selection With OC_CLAW_TRACE
When things still feel random, turn on tracing. Add:
```bash
export OC_CLAW_TRACE=true
```
You’ll see logs like:
```
[trace] message="weather tomorrow in berlin" → embed(xxx)
[trace] candidates=[weather (0.91), calendar_create_event (0.33), news (0.22)]
[trace] injecting=[weather]
```
If your skill isn’t listed, your embedding text isn’t matching the query. If it’s listed but filtered, check permissions.
Visualizing With claw-viz
The community built `claw-viz`. Point it at your log file and get a Sankey diagram of message → candidate → injected skill. Helpful for PM demos.
Practical Next Step
Run `claw skill init --name yt_download_audio`, write a tight description, then run `npm run claw build`. Send "download this yt song" in chat. If the skill pops, you nailed discoverability. If not, check embeddings, tags, and the trace. That's the real feedback loop used by everyone shipping serious agents on ClawCloud.