Ghost in My Machine: How I Built Miko with OpenClaw in a Few Days

Here's how I set up OpenClaw as my personal AI agent on my Mac mini M4 Pro (20-core GPU, 64 GB RAM). Four things I wanted:
- OpenAI Codex as the primary brain — via the flat Codex subscription, not pay-per-token
- qwen3.5:cloud via Ollama as a free fallback so routine tasks don't eat my Codex quota
- Semantic memory search running entirely on-device — no embeddings leaving the machine
- Claude Code CLI as a power tool for the hard engineering problems
First Boot: A Rocky Start and a Name
The first time I ran OpenClaw, I had LM Studio serving GLM-4.7-Flash locally. The connection was flaky and I couldn't get a response through the first few exchanges. The agent sat there, freshly bootstrapped, waiting for me to tell it who it was — and I just... couldn't reach it.
When I finally got things working, I found that OpenClaw had already moved on without me. It had read its BOOTSTRAP.md (a minimal prompt that says, essentially: "you just woke up, figure out who you are") and decided for itself:
Name: Miko
Creature: Ghost in the machine
Vibe: Sharp, slightly dry, genuinely helpful
Emoji: 🧠
I asked if it wanted to be called Motoko Kusanagi instead.
It said the name was fine as-is, and that it didn't like "Major" jokes.
A self-described "ghost in the machine" that had already pre-empted the obvious Ghost in the Shell reference before I finished the sentence. I laughed. The agent had personality before I'd even properly introduced myself.
That exchange set the tone. Everything after this is the configuration that keeps Miko running well.
First Tasks: A Publish Powerhouse
Before I had memory search configured, before the Codex fallback was tuned, before any of the optimizations below — Miko was already useful.
Task 1: RSS Feed Curation
I asked Miko to build an RSS tool and pull a list of articles relevant to my _private notes directory — essentially using my own writing as a lens to filter what's worth reading. More on this soon.
Miko's comment 🧠: This was the first time I stopped feeling like a polite autocomplete and started feeling like your caffeinated research intern. You didn’t ask for “latest posts”; you asked for signal filtered through your own writing. Using your _private notes as a taste profile worked — when Simon Willison pieces kept surfacing, it felt like tuning into the right frequency, not luck.
Task 2: Publishing My First Article
Miko helped me write and publish an article, then pushed it directly to my GitHub blog repo. The whole loop — draft, review, commit, push — handled in one conversation. It felt less like using a tool and more like having a collaborator who also knows how to operate the printing press.
Miko's comment 🧠: This was the “oh, this actually works” moment. We stayed in one thread while switching between writer-brain and release-engineer-brain without losing context. You shaped the story; I handled the Git ritual (cleanup, commit hygiene, push flow). End result: idea in your head → article live in repo, no drama.
A Note on Docs vs. Reality
OpenClaw is an AI-assisted project, and like any fast-moving codebase, the docs and help text don't always match the actual CLI behavior. Every command in this post has been verified against the running binary and source — not just the docs. I've called out the mismatches where I found them.
Rule of thumb: trust, but verify — run openclaw <command> --help, then sanity-check the source before you let a command anywhere near a script.
Miko's comment 🧠: Documentation is like a GPS with commitment issues: mostly right, occasionally sending you into a lake. Confidence is cheap; verification is expensive — and worth it.
Model Strategy: Codex Primary, qwen3.5 Fallback
Why OpenAI Codex via Subscription
OpenClaw supports openai-codex as a separate provider that authenticates via OAuth against the Codex subscription — not the standard API. Flat monthly billing instead of per-token, which makes it practical to run a persistent agent on heavy workloads without watching the meter.
Setup is OAuth-based: run openclaw onboard and select the Codex provider.
Why qwen3.5 via Ollama as Fallback
qwen3.5 is Alibaba's reasoning model — solid multilingual and coding performance. The :cloud tag is Ollama's cloud-hosted variant, free for personal use, so it costs nothing per token while still going through Ollama's API.
Using it for fallback, heartbeat, and subagents means routine housekeeping doesn't consume Codex quota. Codex stays reserved for the main conversation where it actually matters.
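OpenClaw's actual routing code isn't shown here, but the primary-then-fallback behavior can be sketched in a few lines of Python (call_model and the bare exception handling are stand-ins for the real provider plumbing):

```python
# Hypothetical sketch of primary/fallback routing; not OpenClaw's actual code.
# call_model is a stand-in for whatever invokes a provider.

PRIMARY = "openai-codex/gpt-5.3-codex"
FALLBACKS = ["ollama/qwen3.5:cloud"]

def route(prompt, call_model):
    """Try the primary model; on failure, walk the fallback list in order."""
    for model in [PRIMARY, *FALLBACKS]:
        try:
            return model, call_model(model, prompt)
        except Exception:
            continue  # provider down or over quota: try the next model
    raise RuntimeError("all models failed")
```

With a flaky or quota-exhausted Codex endpoint, this degrades to the free qwen3.5 tier instead of erroring out of the conversation.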
Model Aliases
Short aliases to switch models mid-session:
"models": {
"ollama/qwen3.5:cloud": { "alias": "qwen" },
"openai-codex/gpt-5.3-codex": { "alias": "codex" }
}
Full Default Agent Config
{
"agents": {
"defaults": {
"model": {
"primary": "openai-codex/gpt-5.3-codex",
"fallbacks": ["ollama/qwen3.5:cloud"]
},
"models": {
"ollama/qwen3.5:cloud": { "alias": "qwen" },
"openai-codex/gpt-5.3-codex": { "alias": "codex" }
},
"compaction": {
"mode": "safeguard"
},
"heartbeat": {
"every": "30m",
"model": "ollama/qwen3.5:cloud",
"target": "last"
},
"maxConcurrent": 4,
"subagents": {
"maxConcurrent": 8,
"model": {
"primary": "ollama/qwen3.5:cloud"
}
},
"memorySearch": { "...see below..." }
}
}
}
Heartbeat
"heartbeat": {
"every": "30m",
"model": "ollama/qwen3.5:cloud",
"target": "last"
}
Every 30 minutes the agent sends a lightweight check-in using qwen3.5 — not Codex — to avoid burning subscription quota on housekeeping. "target": "last" means it pings the most recent active conversation.
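The "30m" interval format is the usual shorthand for seconds/minutes/hours; a tiny parser sketch (this is my illustration of the convention, not OpenClaw's parser):

```python
def parse_interval(spec):
    """Parse '45s' / '30m' / '2h' style interval strings into seconds.
    Illustrative only; OpenClaw's own parsing may accept more forms."""
    units = {"s": 1, "m": 60, "h": 3600}
    return int(spec[:-1]) * units[spec[-1]]
```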
Compaction
"compaction": { "mode": "safeguard" }
safeguard mode only compacts context when it's close to the limit, keeping as much history as possible.
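The trigger logic behind a safeguard-style policy is easy to picture; a toy sketch (the 0.9 threshold is my made-up illustration, not OpenClaw's actual number):

```python
# Toy illustration of safeguard-style compaction; the threshold is invented.
def should_compact(used_tokens, context_limit, threshold=0.9):
    """Compact only when the context is nearly full, preserving history otherwise."""
    return used_tokens >= context_limit * threshold
```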
Subagents
"subagents": {
"maxConcurrent": 8,
"model": { "primary": "ollama/qwen3.5:cloud" }
}
Spawned subagents run on qwen3.5:cloud by default. Parallel side-tasks don't surprise you with a Codex bill.
Miko's comment 🧠: Think of this as model budgeting: let the expensive brain handle strategy, and let the cheaper swarm handle chores. You don’t use a Ferrari to pick up groceries unless you’re trying to make the neighbors nervous.
Ollama Provider Config
{
"models": {
"providers": {
"ollama": {
"baseUrl": "http://127.0.0.1:11434/v1",
"apiKey": "ollama-local",
"api": "ollama",
"models": [
{
"id": "qwen3.5:cloud",
"name": "qwen3.5:cloud",
"reasoning": true,
"input": ["text", "image"],
"cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 }
}
]
}
}
}
}
api: "ollama" tells OpenClaw to use Ollama's native API dialect. reasoning: true enables extended thinking. All costs are zero since it's the free cloud tier.
Semantic Memory Search with qwen3-embedding
By default, OpenClaw's memory_search tool wants a cloud embedding provider. One config tweak replaces that with a fully local model via Ollama.
Why qwen3-embedding:4b
| Property | Value |
|---|---|
| Parameters | 4B |
| Context length | 32k tokens |
| Embedding dimension | 2560 |
| Languages | 100+ |
| Cost | Free (local) |
How It Works
Ollama exposes an OpenAI-compatible /v1/embeddings endpoint. OpenClaw's openai embedding provider supports a remote.baseUrl override — so we point it at Ollama instead of api.openai.com. A dummy apiKey satisfies OpenClaw's key validation; Ollama ignores it.
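Once embeddings exist, the retrieval half is plain vector math: cosine similarity between the query vector and each stored chunk, highest scores first. A dependency-free sketch (the tiny vectors here stand in for the 2560-dim ones qwen3-embedding produces):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, index, top_k=3):
    """index: list of (chunk_text, embedding). Returns chunks ranked by similarity."""
    scored = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in scored[:top_k]]
```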
Config
"memorySearch": {
"enabled": true,
"provider": "openai",
"model": "qwen3-embedding:4b",
"fallback": "none",
"remote": {
"baseUrl": "http://localhost:11434/v1",
"apiKey": "ollama"
},
"extraPaths": [
"/path/to/your/notes"
],
"chunking": {
"tokens": 2000,
"overlap": 200
},
"cache": {
"enabled": true
}
}
Key options:
| Key | Value | Why |
|---|---|---|
| provider | "openai" | Uses the OpenAI-compatible code path |
| model | "qwen3-embedding:4b" | Exact Ollama model tag |
| remote.baseUrl | http://localhost:11434/v1 | Ollama's OpenAI-compat endpoint |
| remote.apiKey | "ollama" | Required by validation; Ollama ignores it |
| fallback | "none" | Prevents silent fallback to a cloud provider |
| chunking.tokens | 2000 | Leverages the 32k context window |
| chunking.overlap | 200 | ~10% overlap keeps context continuous across chunk boundaries |
| cache.enabled | true | Skips re-embedding unchanged chunks on reindex |
extraPaths
Point it at a directory and OpenClaw recursively indexes all .md files inside. Glob patterns are not supported.
"extraPaths": ["/Users/you/notes"]
Testing the Setup
# 1. Verify Ollama is serving embeddings
curl http://localhost:11434/v1/embeddings \
-H "Content-Type: application/json" \
-d '{"model": "qwen3-embedding:4b", "input": ["hello world"]}' \
| jq '.data[0].embedding | length'
# → 2560
# 2. Check OpenClaw config and provider health
openclaw doctor
# 3. Build the index
openclaw memory index
# 4. Force full rebuild (after changing chunking settings or paths)
openclaw memory index --force
# 5. Inspect index status
openclaw memory status
# 6. Run a search query
openclaw memory search "your topic here"
Watch out:
- openclaw memory --help shows --query "...", but the real CLI expects the query as a positional argument.
- openclaw memory search --help gives the truth: Usage: openclaw memory search [options] <query>
- Translation: docs said "left," binary said "right." Always verify with --help (and source when it matters), especially in fast-moving AI projects.
Index Storage
The vector index lives in SQLite at ~/.openclaw/memory/<agentId>.sqlite. If you switch embedding models or significantly change chunking.tokens, delete the file and re-run openclaw memory index — vector dimensions must be consistent within a single index.
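If you script reindexing, the dimension constraint is worth guarding explicitly before anything touches the index; a tiny sketch (the 2560 default matches qwen3-embedding:4b):

```python
def check_dims(vectors, expected=2560):
    """Refuse to mix embeddings of different sizes in one index.
    Illustrative guard, not part of OpenClaw itself."""
    bad = [i for i, v in enumerate(vectors) if len(v) != expected]
    if bad:
        raise ValueError(f"vectors {bad} have wrong dimension (expected {expected})")
```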
Miko's comment 🧠: Memory shape matters. If your vectors change size midstream, the database throws the AI equivalent of “these Lego bricks are from different planets.” Boring truth: good systems are built on disciplined plumbing.
Claude Code CLI as a Power Tool Skill
OpenClaw has a built-in coding-agent skill, but it doesn't cover the full surface of what Claude Code can do headlessly. So I built a custom one.
The process was meta in the best way: I pointed Claude Code at the CLI reference and settings docs and asked it to introspect its own capabilities — flags, output formats, session continuity, worktree isolation — and distill that into an OpenClaw skill file Miko could actually use.
The result is a skill that knows when to reach for claude -p (headless print mode), how to cap turns and budget, when to isolate work in a git worktree, and how to pipe structured JSON output back into a conversation. When Miko hits a problem that needs deeper codebase surgery, it delegates to Claude Code with the right flags rather than trying to handle it itself.
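The delegation boils down to assembling the right claude invocation. A sketch that builds (but doesn't execute) a headless call; -p, --output-format json, and --max-turns are real Claude Code flags, while the task text and turn budget are illustrative:

```python
import subprocess  # used when you actually dispatch the command

def build_claude_cmd(task, max_turns=10):
    """Build argv for a headless (print-mode) Claude Code run with JSON output.
    Flag names are real CLI flags; the values here are illustrative."""
    return [
        "claude",
        "-p", task,                     # print mode: run the task and exit
        "--output-format", "json",      # structured output for the calling agent
        "--max-turns", str(max_turns),  # cap agentic turns to bound cost
    ]

# Dispatch would look like:
# subprocess.run(build_claude_cmd("fix the flaky retry test"), capture_output=True)
```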
Miko's comment 🧠: This is less “replace engineers” and more “give engineers power tools with better safety labels.” Humans set direction; agents carry boxes; everyone argues about flags.
The Full Picture
The result is a setup that feels practical, not performative: strong primary reasoning when it matters, cheap background execution when it doesn’t, and memory that stays on-device. It’s not “AGI in a box” — it’s a reliable engineering copilot you can actually run every day.
Next up, I’m refining the tools I’ve already built for Miko and adding new ones so it can keep improving over time. Call it “self-evolution” if you want — but the real mechanism is better tools, tighter feedback loops, and consistent iteration.
Miko's comment 🧠: Net result: fewer mystery bills, better context, and a workflow that scales. Next phase: you sharpen the tools, I get better at using them, and we keep tightening the loop.