Change notes · decision matrix · eight-step runbook · ticket lines · VNC console
OpenClaw v2026.4.25 moves plugin startup onto a cold persisted registry: at launch the agent reads curated metadata instead of walking the full extensions tree. That makes install and update behavior more predictable, but it introduces new failure modes. Operators see empty-looking plugin panes, long first cold starts after an upgrade, and drift between the build ID of the global npm install and the one the Gateway UI reports. The release notes also call out install and update hardening plus mixed-version gateway verification, which means the CLI, the Gateway process, and the package path in your launchd plist must all report the same version. This article is a companion to the v2026.4.25 channel pairing and Gateway safety guide: it covers plugin-plane stability while keeping QR pairing and Mission Control out of the critical path. Read it together with the staged release operations guide, the official Docker Compose walkthrough, the v2026.4.5 breaking upgrade runbook, and the common errors guide so you can separate “the IM channel is green” from “the extension registry is healthy.”
The phrases “cold registry,” “metadata repair,” and “mixed-version verification” each describe measurable behavior.
First, startup is a cold read. If the on-disk index is half-migrated, you will not get a clean error line that says “registry corrupt”; you will see a blank plugin screen or a multi-minute stall while Node rebuilds the index, especially on a remote Mac with a small SSD and co-tenanted I/O.
Second, install paths now lean on local install metadata that must survive partial npm downloads. A tarball that stops mid-flight can leave a plugin row with a version label but no runnable binary. That is exactly the class of bug that openclaw plugins repair and the related reindex commands are meant to address.
Third, mixed versions are a runtime topology problem. The CLI in your shell may resolve to /opt/homebrew/... while launchd still points ProgramArguments at an older dist/index.js under a previous prefix, or a Docker container may mount a different config root from the host where you just ran a repair. Until those paths converge, the console on port 18789 is served by a different build from the one you think you installed.
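The third point can be checked mechanically before anyone opens a browser. The sketch below is a minimal illustration, assuming a standard npm global layout (the CLI shim in prefix/bin, packages in prefix/lib/node_modules) and a LaunchAgent plist whose ProgramArguments include a path ending in dist/index.js. The function names are hypothetical helpers, not part of OpenClaw.

```python
"""Sketch: does launchd run OpenClaw from the same npm prefix your shell resolves?

Assumptions (hypothetical, adjust to your install): standard npm global layout
(<prefix>/bin/openclaw and <prefix>/lib/node_modules/...), and a LaunchAgent
plist whose ProgramArguments include a path ending in dist/index.js.
"""
import plistlib
from pathlib import PurePosixPath
from typing import Dict, Optional


def entry_from_plist(plist_bytes: bytes) -> Optional[str]:
    """Return the first ProgramArguments entry ending in dist/index.js, if any."""
    data = plistlib.loads(plist_bytes)
    for arg in data.get("ProgramArguments", []):
        if arg.endswith("dist/index.js"):
            return arg
    return None


def prefix_of_entry(entry: str) -> Optional[str]:
    """Map <prefix>/lib/node_modules/... back to <prefix>."""
    parts = PurePosixPath(entry).parts
    for i in range(len(parts) - 1):
        if parts[i] == "lib" and parts[i + 1] == "node_modules":
            return str(PurePosixPath(*parts[:i]))
    return None  # not an npm global layout, for example a developer checkout


def prefix_of_cli(cli_path: str) -> Optional[str]:
    """Map <prefix>/bin/openclaw back to <prefix>."""
    path = PurePosixPath(cli_path)
    return str(path.parent.parent) if path.parent.name == "bin" else None


def compare_prefixes(plist_bytes: bytes, cli_path: Optional[str]) -> Dict[str, object]:
    """Report both prefixes and whether they agree."""
    entry = entry_from_plist(plist_bytes)
    entry_prefix = prefix_of_entry(entry) if entry else None
    cli_prefix = prefix_of_cli(cli_path) if cli_path else None
    return {
        "entry": entry,
        "entry_prefix": entry_prefix,
        "cli_prefix": cli_prefix,
        "match": entry_prefix is not None and entry_prefix == cli_prefix,
    }
```

In practice you would call compare_prefixes(Path(your_plist).read_bytes(), shutil.which("openclaw")) as the same user that owns the launchd job; a False match is the “shell on one prefix, launchd on another” case described above.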
The five items below are the ones that show up in tickets when teams skip evidence collection and jump to “reinstall everything.” You can copy them into a change request as the hidden cost section.
- Cold start spikes: the first full index or repair pass can pin the CPU and the Node event loop; on a 32 GB remote instance with a noisy-neighbor disk, the session looks hung even though the process is still rewriting shard files.
- Half-installed plugins: a dropped network connection while fetching a bundle can create manifest rows that reference missing native modules, which surfaces in the UI as “version known, enable switch does nothing” until repair reconciles the tree.
- Multiple install roots: Homebrew /opt/homebrew, legacy /usr/local, and a checked-in node_modules tree in a developer checkout can all host an OpenClaw build. The launchd job might still be bound to the oldest path, which triggers mixed-version warnings in logs while your interactive shell is already on v2026.4.25.
- Permission and sandboxing: a helper that cannot create files under the registry cache directory often logs a generic “plugin load failed” instead of a crisp EPERM, so you still need the same TCC and ownership pass you would do for any long-lived agent on macOS. Pair this with the remote Mac TCC checklist if your symptoms involve automation touching UI surfaces.
- Channel health is not plugin health: a successful IM QR path proves delivery for one connector; it does not prove the registry and Gateway agree. Sign those acceptance lines on separate work items so you do not close a “production ready” ticket while extensions silently fail.
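To tell a slow first cold start apart from a genuine hang, it helps to record how long the UI port takes to answer instead of judging by a frozen terminal. This is a minimal sketch, assuming the Control UI answers plain HTTP on 127.0.0.1:18789 (or your tunnel) and that the URL you pass returns 2xx once startup completes; the deadline and interval are illustrative values, not tuned OpenClaw defaults.

```python
import http.client
import time
import urllib.request
from typing import Optional


def wait_until_ready(url: str, deadline_s: float = 900.0, interval_s: float = 5.0,
                     probe_timeout_s: float = 3.0) -> Optional[float]:
    """Return seconds until the first 2xx/3xx response, or None if the deadline passes."""
    # An empty ProxyHandler disables environment proxies, so localhost is probed directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    start = time.monotonic()
    while True:
        try:
            with opener.open(url, timeout=probe_timeout_s):
                return time.monotonic() - start
        except (OSError, http.client.HTTPException):
            # Refused connections, timeouts, and 4xx/5xx (HTTPError is an OSError) mean "not yet".
            pass
        if time.monotonic() - start >= deadline_s:
            return None
        time.sleep(interval_s)
```

A call such as wait_until_ready("http://127.0.0.1:18789/") gives a number you can paste into the ticket. A None result while disk I/O is still busy points at a slow rebuild; None with flat I/O is worth treating as a hang.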
A cold registry rewards repeatable install surfaces. Docker can pin the Node engine and the openclaw package in one image layer, but bind mounts to the host can reintroduce mixed roots if you repair on the host and let the container read a stale volume. Bare-metal npm install -g is quick to roll forward yet demands discipline: every repair, doctor run, and plist edit must use the same macOS user that will eventually show the 18789 UI, otherwise the registry on disk and the per-user cache diverge. Use the table as an attachment to a change record; it is also the fastest way to prove an incident was a volume mismatch, not a model outage.
| Dimension | Bare-metal npm or pnpm | Docker Compose | Common misread |
|---|---|---|---|
| Version alignment | Depends on launchd, PATH, and your shell init files | Depends on the image tag and the mounted config directory | Only checking openclaw --version in an SSH session, never matching the live Gateway argv |
| Repair scenario | Operates on the host registry in place | Must run the equivalent command inside the container on the same volume | Repairing on the host while the container still points at a frozen path inside the image |
| Rollback speed | Reinstall a known-good tarball and pin the exact version | Point Compose back at the previous image tag and run docker compose up | Upgrading both host and image without a snapshot, then debugging two bad layers at once |
| Observability | macOS log stream plus local browser to 127.0.0.1:18789 | Add docker logs and container-local curl | SSH tail only, never opening DevTools to compare bundle hashes |
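The same-user requirement for bare-metal installs, and the permission item in the hidden-cost list, can both be probed directly. A minimal sketch, meant to be run as the macOS user that owns the launchd job; which directories you pass (config root, plugin data, registry cache) is your choice, and no OpenClaw paths are assumed here.

```python
import os
import tempfile
from pathlib import Path
from typing import List


def ownership_problems(directory: str) -> List[str]:
    """List ownership or write problems for `directory` as seen by the current user."""
    path = Path(directory).expanduser()
    if not path.is_dir():
        return [f"{path}: missing or not a directory"]
    problems = []
    owner = path.stat().st_uid
    if owner != os.getuid():
        problems.append(f"{path}: owned by uid {owner}, running as uid {os.getuid()}")
    try:
        # Creating and deleting a real file surfaces EPERM/EACCES directly,
        # instead of the generic "plugin load failed" the agent may log.
        with tempfile.NamedTemporaryFile(dir=path, prefix=".write-probe-"):
            pass
    except PermissionError as exc:
        problems.append(f"{path}: cannot create files ({exc.strerror})")
    return problems
```

An empty list for every directory is the pass condition; anything else belongs in the change record before you run repair.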
Rule: the process that launches Gateway defines the product version. The CLI is just another client.
The order is strict on purpose: evidence before mutation. If you run hot traffic, pre-announce a freeze window and borrow language from the freeze and exception section of the operations guide. Record free disk space before you start; cold registry work is sensitive to anything below roughly ten to fifteen percent free on APFS. After you change packages, always run openclaw doctor first so support has something searchable before you touch long-running subcommands. When the release notes name a repair verb, use the one your org standardized (reindex, repair, or both) and keep the full stderr, because a partial success still leaves mixed metadata behind.
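The free-space check is easy to script so it lands in the ticket as a number rather than an impression. A minimal sketch: the 15 percent floor is the upper end of the rough ten-to-fifteen-percent band in the paragraph above, used here as a conservative starting point rather than a measured OpenClaw limit, and you should pass a path on the volume that actually holds the registry.

```python
import shutil

# Conservative default taken from the rough 10-15% guidance; not an OpenClaw constant.
SAFETY_FLOOR = 0.15


def free_fraction(path: str) -> float:
    """Fraction of the volume containing `path` that is free (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def safe_to_reindex(fraction: float, floor: float = SAFETY_FLOOR) -> bool:
    """True when free space is at or above the chosen floor."""
    return fraction >= floor
```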
Only restart the Gateway once you are confident the CLI, the plist, and the container entrypoint agree. Restarting an old process against a new registry is how teams create confusing lock errors that look like corruption. Every surface should report the same build string: compare the HTTP version endpoint (for example /version, if your build exposes one) or the startup banner in logs, the About pane in the UI, and the first line of openclaw --version. If any of the three differ, you are not done.
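Because v2026.4.5 and v2026.4.25 share their first two numeric fields, the comparison has to cover the full version, not just the leading part. A minimal sketch, assuming each source contains a version of the form 2026.4.25 (optionally prefixed with v) somewhere in its text; how you obtain each string (CLI output, a log banner or HTTP response, text copied from the About pane) is left to you.

```python
import re
from typing import Dict, Optional, Tuple

# Matches calendar-style versions such as 2026.4.25 or v2026.4.25 anywhere in a string.
VERSION_RE = re.compile(r"v?(\d{4})\.(\d+)\.(\d+)")


def parse_version(text: str) -> Optional[Tuple[int, int, int]]:
    """Extract the first version triple from `text`, or None."""
    match = VERSION_RE.search(text)
    if match is None:
        return None
    first, second, third = (int(g) for g in match.groups())
    return (first, second, third)


def version_triangle(cli: str, http: str, ui: str, expected: str) -> Dict[str, bool]:
    """Report, per source, whether its version equals `expected` exactly."""
    want = parse_version(expected)
    readings = {"cli": cli, "http": http, "ui": ui}
    return {name: want is not None and parse_version(text) == want
            for name, text in readings.items()}
```

Any False value in the result is a mixed-version state and blocks the restart step.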
1. Back up: export the OpenClaw config root, the plugin data directory, and the launchd plist. Record openclaw --version and the full Gateway argv from ps before the bump.
2. Freeze writes: during the window, pause automatic skill update jobs and ad hoc plugin installs by other maintainers so two migrations do not interleave.
3. Bump the package: move the supported channel to v2026.4.25. For a global npm install, confirm that the path printed by which openclaw and the launchd ProgramArguments reference the same prefix.
4. Run doctor: paste any plugin, registry, or gateway lines verbatim into the ticket, including timing warnings.
5. Run plugins repair or reindex: watch wall-clock time and disk writes; on failure, keep the complete stderr, not a screenshot of the last three lines.
6. Restart Gateway: only once versions align, so the old process does not hold a stale lock on the new layout.
7. Mixed-version verification: compare the CLI, the HTTP version endpoint, and the About or footer area in the UI.
8. VNC acceptance: in a graphical session, open http://127.0.0.1:18789 (or your documented tunnel) and validate the extension list, the model authorization cards, and the error drawer. Attach a screenshot to the change.
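Steps 4 and 5 both depend on keeping everything a command prints, plus how long it ran. A minimal sketch that appends the command line, exit code, wall-clock time, stdout, and full stderr to a log file you attach to the ticket; the commands themselves are supplied by the caller, so nothing here assumes OpenClaw flags.

```python
import shlex
import subprocess
import time
from pathlib import Path
from typing import List


def run_and_record(argv: List[str], log_path: Path) -> int:
    """Run argv and append command, exit code, wall-clock time, stdout, and stderr to log_path."""
    start = time.monotonic()
    # errors="replace" keeps non-UTF-8 bytes from aborting the capture.
    proc = subprocess.run(argv, capture_output=True, text=True, errors="replace")
    elapsed = time.monotonic() - start
    with log_path.open("a", encoding="utf-8") as log:
        log.write(f"$ {shlex.join(argv)}\n")
        log.write(f"exit={proc.returncode} wall_clock_s={elapsed:.1f}\n")
        log.write("--- stdout ---\n" + proc.stdout + "\n")
        log.write("--- stderr ---\n" + proc.stderr + "\n")
    return proc.returncode
```

For example, run_and_record(["openclaw", "doctor"], Path("CHG-1234.log")) followed by the repair verb your org standardized; the log file name is hypothetical. Output is buffered until the command exits, so pair this with Activity Monitor if you need to watch a long reindex live.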
Mixed-version triage in three checks:
1) CLI: openclaw --version
2) Process: ps, noting the dist/index.js path in argv
3) UI: footer or About pane, plus response headers in DevTools for the gateway route
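The process check can be scripted on macOS. A minimal sketch, assuming the install directory name contains “openclaw” and the path has no spaces; ps -axo pid=,command= lists every process with its full command line and no header row.

```python
import re
import subprocess
from typing import List, Tuple

# Assumption: the package directory name contains "openclaw"; paths with spaces are not handled.
ENTRY_RE = re.compile(r"(\S*openclaw\S*/dist/index\.js)")


def gateway_entries(ps_output: str) -> List[Tuple[int, str]]:
    """Return (pid, dist/index.js path) for every process line that references one."""
    found = []
    for line in ps_output.splitlines():
        fields = line.strip().split(None, 1)
        if len(fields) != 2 or not fields[0].isdigit():
            continue
        match = ENTRY_RE.search(fields[1])
        if match:
            found.append((int(fields[0]), match.group(1)))
    return found


def live_ps_output() -> str:
    """Capture the current process table (macOS ps syntax)."""
    return subprocess.run(["ps", "-axo", "pid=,command="],
                          capture_output=True, text=True, check=True).stdout
```

Two entries with different prefixes, or a single entry whose prefix differs from the CLI on your PATH, is the mixed-version signature.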
Note: for Docker, document where repair and doctor run (host vs docker compose exec) in the runbook, and make it match the volume layout in the compose guide so a well-meaning SSH session does not desync a container.
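One way to make “where did this run” explicit is to build every doctor or repair invocation through a single helper, so the command line recorded in the ticket already states host or container. A minimal sketch, assuming openclaw is on the container's PATH; the service name in the usage below is hypothetical and should match your compose file.

```python
from typing import List, Optional


def openclaw_argv(subcommand: List[str], compose_service: Optional[str] = None) -> List[str]:
    """Build the argv for an openclaw subcommand on the host or inside a Compose service."""
    base = ["openclaw", *subcommand]
    if compose_service is None:
        return base  # host install: acts on the host registry
    # -T disables pseudo-TTY allocation so output can be captured non-interactively.
    return ["docker", "compose", "exec", "-T", compose_service, *base]
```

For example, openclaw_argv(["doctor"], "gateway") yields a docker compose exec command that runs against the container's mounted volume, while openclaw_argv(["doctor"]) acts on the host.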
Caution: do not jump major versions, rewrite SecretRef, and reconfigure channel routing in one weekend without backups. If that collides with the safety ordering in the v2026.4.5 breaking change article, take the stricter path.
SSH remains the right tool for doctor, repair, and structured logs. Control UI, browser extension consent, and whether a toggle actually persisted still need the desktop session of the same user that owns the launchd job. The grid below is sized for a single on-call pass over VNC. If you are stuck on a native module or cryptic load error, open the ten common error patterns guide, map your keyword, then return to step five of this runbook to see whether a second cold start reproduces the failure.
| Check | What to do | Pass criteria |
|---|---|---|
| Version triangle | Compare CLI, process argv, and the About or footer in the UI on 18789 | All three show the exact version you intended to ship, all three numeric fields included, since v2026.4.5 and v2026.4.25 share the first two |
| Plugin list | After cold start, open the extension page once and wait for the list to settle | Count matches a pre-change baseline, or the delta is documented in the change record |
| Model authorization | Trigger OAuth refresh paths you rely on; watch for rate limit banners | No unbounded 401 or 429 loops in the time box |
| Disk and I/O | Use Activity Monitor to watch read/write during reindex if still running | Spike decays, free space stays above the safety margin |
| Regression smoke | Send a small probe on your lowest-risk channel | Channel health is a separate line item from plugin health, as in section 01 |
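Two rows in the grid reduce to simple comparisons once you have the raw data. A minimal sketch, assuming you already have plugin names from the pre-change baseline and from the settled list after the cold start, plus (timestamp, status) pairs from whatever log or proxy you use. The five-in-five-minutes threshold is an arbitrary example of “unbounded,” not an OpenClaw default.

```python
from typing import Dict, Iterable, List, Tuple


def plugin_delta(baseline: Iterable[str], current: Iterable[str]) -> Dict[str, List[str]]:
    """Names missing since the baseline and names that are new."""
    before, after = set(baseline), set(current)
    return {"missing": sorted(before - after), "new": sorted(after - before)}


def auth_loops(events: Iterable[Tuple[float, int]], window_s: float = 300.0,
               limit: int = 5) -> List[int]:
    """Return the 401/429 codes seen more than `limit` times within any `window_s` span."""
    events = list(events)
    flagged = []
    for code in (401, 429):
        times = sorted(t for t, status in events if status == code)
        left = 0
        # Sliding window over sorted timestamps for this status code.
        for right, t in enumerate(times):
            while t - times[left] > window_s:
                left += 1
            if right - left + 1 > limit:
                flagged.append(code)
                break
    return flagged
```

An empty "missing" list (or one you have documented) and an empty auth_loops result correspond to the pass criteria in those two rows.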
Do not reach for a blind Gateway reinstall. Run doctor, then repair or reindex, and prove you are not in a mixed-version state. A reinstall only postpones the next cold start, and you lose the diff that would have shown a stale plist or volume.
Mixed-version warnings usually mean launchd still points at an older dist entry while your interactive shell uses a newer global install, or that a Docker host path and a container path see different trees under bind mounts.
SSH covers the CLI and logs. The control surface and a few macOS permissions still need the graphical session, so run the VNC acceptance grid over VNC in the same user account as the Gateway process.
The cold registry trades “scan the whole tree on every boot” for “own your install metadata and your version line.” The operational follow-through is predictable I/O spikes during reindex, tighter coupling between launchd and npm prefixes, and acceptance that must happen in a browser session you can see. If you only read SSH logs, you will merge “channel is fine” and “extensions are fine” on one ticket, then waste days chasing a ghost when only half the stack was ever validated.
A Mac under your desk still needs someone to manage sleep, power, and Apple Silicon thermals. A leased cloud Mac with both SSH and scheduled VNC is often the cleaner way to get a second pair of eyes on 18789 without flying someone to a closet rack. The economics are about who carries late-night drive replacements and colo hands, not about whether OpenClaw is “easy.”
To rent a project-scoped Apple Silicon node that lines up with these checklists, use VNCMac: the cloud Mac purchase page for plans, and the home page for product context. Keep the pairing and Docker articles in the same run folder so the whole team shares one map from pixels to process list.