OpenClaw · May 19, 2026 · ~22 min read · Google Meet · Twilio

OpenClaw v2026.5.4
Meet ingress, Twilio number, Gemini voice bridge

Conference paths · matrix · 8-step runbook · VNC acceptance

Video conference with a laptop and a phone: Meet and PSTN through the gateway

Teams that connect OpenClaw to live meetings hit different failures than they do on chat channels: audio must stay duplex, PSTN callers need a stable number, and the assistant must hear the room without API keys sitting in a browser tab. OpenClaw v2026.5.4 ships a coordinated trio: Google Meet ingress (calendar join and room audio capture), a Twilio number (PSTN legs with SecretRef), and a Gemini realtime voice bridge on Gateway that multiplexes sources into the same Live transport family as browser Talk in v2026.4.26. This article covers six pain classes, a transport matrix, an eight-step runbook, four facts for tickets, and a 20-minute VNC grid on leased Apple Silicon. See also public Gateway access and HTTPS reverse proxy, the multichannel rollout order, and, once baselines are in place, the incremental v2026.5.7 update.

01

Pain points: "the bot joined Meet" is not the same as "voice works"

Meeting integrations fail quietly. Gateway logs can show a healthy process while participants hear nothing, or PSTN callers hear the assistant but Meet attendees do not. The six items below are the recurring classes we see on rented macOS nodes where SSH-only operators never open Chromium site settings or macOS microphone privacy lists.

  1. Meet OAuth and domain policy: Workspace admins restrict which OAuth clients may read calendar events or join as automated attendees. Symptoms look like “stuck on consent” with no Gateway error until you correlate Google Admin audit timestamps with your redirect URI list.

  2. Browser capture vs headless fantasy: Meet audio ingest still depends on a supported Chromium profile and honest HTTPS origins. Headless Linux relays cannot close macOS TCC prompts; attempting to fake capture with loopback hacks creates comb filtering and unusable transcripts.

  3. Twilio credential sprawl: Account SID, API keys, and per-number webhooks scattered across env files cause partial success: PSTN rings, but the voice bridge never receives media events because the callback URL still points at last week’s tunnel hostname.

  4. Bridge session collisions: Two bridge owners on the same Meet room or conference name produce echo, duplicated tool calls, and transcripts that disagree with channel archives. This is especially common when multichannel fan-out is enabled before voice baselines are frozen.

  5. Reverse-proxy WebSocket drift: Long-lived duplex audio needs correct Upgrade handling and idle timeouts on the path between browser, Gateway, and upstream Live endpoints. A TLS terminator tuned for REST will drop bridges that chat smoke tests never exercise.

  6. Evidence gaps on shared leases: Compliance reviewers ask for “who clicked Allow on the microphone” aligned with Gateway session ids. SSH text alone cannot answer that; you need a VNC eyewitness plus exported listener tables in the same macOS user that owns launchd.

Treat these pain points as architectural gates, not polish. If you skip them, the hidden cost is week-long tickets that oscillate between “Google quota” and “model too small” while the bridge never muxed PSTN and Meet on one session id.
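The Twilio credential sprawl item above (a callback URL still pointing at an old tunnel hostname) can be caught with a simple comparison before any call is placed. The sketch below is a minimal illustration, not part of OpenClaw or the Twilio SDK: the function name `stale_webhooks`, the example URLs, and the idea of collecting callback URLs into a list are all assumptions.

```python
from urllib.parse import urlsplit

def stale_webhooks(callback_urls, expected_host):
    """Return callback URLs that are not HTTPS or not on the expected proxy host."""
    stale = []
    for url in callback_urls:
        parts = urlsplit(url)
        # hostname is lower-cased by urlsplit, so compare against a lower-case host
        if parts.scheme != "https" or parts.hostname != expected_host.lower():
            stale.append(url)
    return stale

# Hypothetical inventory gathered from env files and the Twilio console
urls = [
    "https://gateway.example.com/twilio/voice/status",
    "https://old-tunnel.example.net/twilio/voice/status",
    "http://gateway.example.com/twilio/voice",
]
for url in stale_webhooks(urls, "gateway.example.com"):
    print("update before testing:", url)
```

Running a check like this against every per-number webhook turns "PSTN rings but no media events" into a pre-flight failure instead of a mid-call one.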

02

Decision matrix: which transport carries the conversation

Use this table on the incident bridge before you change the Gemini SKU. The rows deliberately separate ingress (how audio enters OpenClaw) from reasoning (what the agent does with text).

| Need | Preferred in 5.4 | Do not mix without mux | First VNC check |
| --- | --- | --- | --- |
| Scheduled Meet with screen share | Meet ingress + single bridge session | Parallel browser Talk tab on same room | Chromium mic/site permission for Meet origin |
| PSTN-only participant | Twilio dial-in leg into bridge | Separate Gateway process per caller | Twilio debugger shows in-progress with matching CallSid |
| Desk developer testing voice | Browser Talk (4.26 path) | Meet bot attendee on same machine | One microphone owner; Activity Monitor audio devices |
| Async recap after meeting | Channel transcript + TTS readout | Keeping bridge open indefinitely | Bridge teardown logs; cron job status |
| Public webhook callbacks | HTTPS reverse proxy in front of Gateway | Raw port 18789 on the internet | TLS cert hostname = Twilio webhook URL host |
| IM fan-out during live call | Multichannel after bridge baseline | Enabling all channels before Meet smoke | channels list vs active bridge owner |

The matrix pairs naturally with multichannel guidance: text channels are excellent for command-and-control, but they should not become a second audio owner while a bridge session is live. When you expose Gateway publicly for Twilio webhooks, reuse the same Host header and certificate discipline documented for operator consoles—do not invent a one-off HTTP endpoint on a different subdomain without updating Twilio voice URLs.

One bridge session id per live room—Meet legs, PSTN legs, and Gemini upstream must share it or you are debugging echo, not intelligence.
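The one-owner rule can be made concrete with a small in-memory registry. This is a sketch under stated assumptions, not Gateway's implementation: the class `BridgeRegistry`, its method names, and the `bridge-<room>` id format are hypothetical and exist only to show the invariant.

```python
class BridgeRegistry:
    """Tracks one bridge owner per live room and the legs attached to it."""

    def __init__(self):
        self._owners = {}  # room key -> owner id
        self._legs = {}    # room key -> set of leg ids

    def open(self, room, owner):
        # The first caller becomes the owner; a different owner is rejected.
        current = self._owners.setdefault(room, owner)
        if current != owner:
            raise RuntimeError(f"room {room!r} already owned by {current!r}")
        self._legs.setdefault(room, set())
        return f"bridge-{room}"

    def attach_leg(self, room, leg_id):
        # Legs can only join a room that already has an owner.
        if room not in self._owners:
            raise KeyError(f"no open bridge for room {room!r}")
        self._legs[room].add(leg_id)
        return f"bridge-{room}"

    def legs(self, room):
        return set(self._legs.get(room, ()))
```

A Meet leg and a PSTN leg attached to the same room get the same session id back, while a second owner (for example a browser Talk tab) fails loudly instead of producing echo.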

03

Architecture sketch: how the 5.4 pieces fit together

Think in three planes. Ingress plane: the Meet connector subscribes to calendar events (or explicit Meet URLs), launches a controlled browser context, and forwards room audio frames into Gateway. The Twilio connector accepts inbound PSTN or SIP, normalizes codecs, and attaches as another leg on the same bridge. Bridge plane: Gateway owns session lifecycle, trace ids, SecretRef resolution for Google and Twilio credentials, and back-pressure when upstream Live endpoints throttle. Agent plane: tools, skills, and channel transcripts remain orthogonal; you still want structured commands in Slack or Telegram while voice stays duplex.

Compared with v2026.4.26 browser Talk, Meet ingress adds scheduling and attendee policy: the bot is a participant with organizational consent, not a local tab experiment. Compared with multichannel messaging, voice bridge sessions are time-bounded and sensitive to jitter; do not reuse IM retry policies for audio frames. Gemini realtime voice bridge here means the same Live family transport used for Talk, but fed by muxed PCM or Opus legs rather than a single tab capture—Gateway negotiates upstream tokens so secrets never land in Local Storage.
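The article does not say how Gateway combines legs, so the following is only a minimal sketch of what "muxed PCM legs" can mean. It assumes every leg is already decoded to signed 16-bit PCM at the same sample rate, and mixes by summing samples and clamping. The function name `mix_pcm16` is hypothetical.

```python
from array import array

def mix_pcm16(legs):
    """Mix equal-rate signed 16-bit PCM legs by summing samples and clamping."""
    length = max((len(leg) for leg in legs), default=0)
    mixed = array("h", [0] * length)
    for i in range(length):
        # Shorter legs simply contribute silence past their end.
        total = sum(leg[i] for leg in legs if i < len(leg))
        # Clamp to the int16 range so loud overlaps clip instead of wrapping.
        mixed[i] = max(-32768, min(32767, total))
    return mixed

meet_leg = array("h", [1000, -2000, 30000])
pstn_leg = array("h", [500, 500, 10000, 7])
print(list(mix_pcm16([meet_leg, pstn_leg])))
```

A production mixer would also handle resampling, jitter buffers, and gain control; the point here is only that all legs must land in one mix under one session before reaching the upstream transport.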

On a leased remote Mac, the practical anchor is still one interactive macOS user that owns launchd, Chromium profiles, and microphone TCC entries. Splitting “Gateway on user A, browser on user B” recreates the classic split-brain cache where Meet shows connected while the bridge reads silence.

04

Eight-step runbook: from freeze to production bridge

Run the steps in order. Early steps pin versions and URLs; middle steps validate ingress; final steps attach observability before you enable multichannel fan-out.

  1. Freeze and backup: Record openclaw --version, node absolute path, OPENCLAW_HOME, Gateway listener matrix, lease id, and launchd label. Export current Meet and Twilio config stanzas (redact secrets) into the change ticket.

  2. Upgrade to v2026.5.4 and doctor: Run openclaw doctor; resolve deprecated relay keys from 4.26-era snippets before touching Meet. Keep a rollback tarball of the prior config tree.

  3. Workspace OAuth (VNC mandatory): Complete Google Workspace consent in Chromium as the Gateway user; capture Admin console client id allowlisting if your domain restricts apps.

  4. Twilio SecretRef and webhooks: Store Account SID and auth tokens via SecretRef; point voice status callbacks at your HTTPS reverse-proxy hostname, not an ephemeral tunnel. Validate the TLS chain from outside your VPC.

  5. Declare one bridge profile: Configure Meet ingress and Twilio dial-in to share a bridgeSessionId template per calendar series or conference name. Document the teardown idle timeout (for example 120 seconds after last PSTN hangup).

  6. Lab Meet smoke: Join a test Meet with two human headsets plus one dial-in number. Confirm Gateway logs show a single bridge owner and matching trace ids on Meet and Twilio legs.

  7. Gemini upstream probe: Run a short duplex prompt through the bridge; capture first-byte latency and end-to-end round-trip in Gateway metrics. Compare against browser Talk baselines from 4.26 on the same host.

  8. Enable multichannel fan-out: Only after voice baselines pass, follow the multichannel rollout order so Telegram or Slack commands cannot spawn a second bridge on the same room.

yaml
voiceBridge:
  owner: gateway
  geminiLive:
    region: us-central1
    traceHeader: X-OpenClaw-Bridge-Trace
  meet:
    calendarId: primary
    joinWindowMinutes: 15
  twilio:
    dialInNumber: "+1XXXXXXXXXX"
    statusCallback: "https://gateway.example.com/twilio/voice/status"
  mux:
    bridgeSessionTemplate: "meet-${eventId}"
    maxPstnLegs: 4
    idleTeardownSeconds: 120

Note: Keys are illustrative; your build may expose equivalent settings via openclaw configure sections. Treat the YAML as documentation for reviewers, not as copy-paste material, and check the release notes first.
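To see why a shared template keeps both legs on one session, the sketch below expands the illustrative `meet-${eventId}` value with Python's `string.Template`. It is an assumption that the placeholder syntax behaves like this; the article does not document how Gateway expands the template, and `bridge_session_id` is a hypothetical helper.

```python
from string import Template

def bridge_session_id(template: str, **fields: str) -> str:
    """Expand a session template; a missing placeholder raises KeyError."""
    return Template(template).substitute(**fields)

# Both connectors resolve the same calendar event, so they derive the same id.
meet_leg_session = bridge_session_id("meet-${eventId}", eventId="evt123")
pstn_leg_session = bridge_session_id("meet-${eventId}", eventId="evt123")
print(meet_leg_session == pstn_leg_session)
```

Failing hard on a missing field is deliberate: a template that silently renders without its event id would merge unrelated meetings into one bridge.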

bash
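# Record the installed version for the change ticket
openclaw --version
# Surface deprecated keys before touching Meet or Twilio
openclaw doctor
openclaw gateway status
openclaw secrets audit
# List TCP listeners; grep ships with macOS, and "|| true" keeps scripts alive on no match
lsof -nP -iTCP -sTCP:LISTEN | grep -Ei "openclaw|18789" || true
openclaw channels list

05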

Facts for tickets

  • Fact 1: A successful Meet join banner without a matching bridge.session.open log line is a false green. Treat UI state and Gateway session ids as paired evidence.
  • Fact 2: The Twilio CallSid must appear in the same trace bucket as the Meet eventId within two seconds of mux attach; otherwise PSTN audio is on an orphan leg.
  • Fact 3: Keep at least 25 percent free disk on leased SSDs before enabling simultaneous Meet capture and transcript archives. Short writes during bridge teardown have caused “amnesia” symptoms mistaken for model drift.
  • Fact 4: Reverse-proxy idle timeouts below 120 seconds are a common root cause of mid-meeting drops when only REST health checks stay green; align proxy, Gateway, and Twilio HTTP callbacks on one timeout table.

Warning: Do not file “Gemini quota” as root cause until bridge mux and proxy WebSocket upgrades are ruled out. Quota dashboards are polite liars on duplex paths.
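Fact 2 can be checked mechanically once logs are parsed. The sketch below is one possible reading, with loud assumptions: the record shape (`ts`, `trace`, `field`, `value`), the `attach_times` mapping, and the function `orphan_legs` are all hypothetical, and real Gateway log fields may differ.

```python
def orphan_legs(attach_times, trace_records, window_s=2.0):
    """Return CallSids not seen in a Meet trace bucket within window_s of mux attach.

    attach_times: {CallSid: mux attach time in seconds}
    trace_records: dicts with 'ts', 'trace', 'field' ('eventId' or 'CallSid'), 'value'
    """
    # A trace bucket counts as a Meet bucket if it carries any eventId record.
    meet_traces = {r["trace"] for r in trace_records if r["field"] == "eventId"}
    orphans = []
    for sid, attached_at in attach_times.items():
        seen_in_time = any(
            r["field"] == "CallSid"
            and r["value"] == sid
            and r["trace"] in meet_traces
            and 0 <= r["ts"] - attached_at <= window_s
            for r in trace_records
        )
        if not seen_in_time:
            orphans.append(sid)
    return orphans
```

Any CallSid this returns is worth attaching to the ticket alongside the Twilio debugger entry, before anyone suspects the model.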

06

Twenty-minute VNC acceptance grid

Run SSH automation and the VNC witness in a single pass. The grid below is sized for a single operator on a leased Mac; attach screenshots to the change record.

| Check | VNC (same user as Gateway) | SSH | OK |
| --- | --- | --- | --- |
| Version footer | Gateway UI build matches CLI | openclaw --version | 5.4.x consistent |
| Meet mic consent | Chromium + System Settings microphone | Not substitutable | Paths match binaries |
| Twilio webhook reachability | Optional browser to status URL | curl -I via public hostname | TLS valid; 2xx |
| Bridge trace alignment | Network filter on trace header | Gateway log grep | Single session id |
| Duplex smoke | Hear round-trip within SLA | Metrics snapshot | No one-way audio |
| Teardown | Meet tab closed cleanly | Idle timer fired | No orphan PSTN |

If you plan a subsequent jump to v2026.5.7, archive this grid’s JSON and log excerpts as the voice baseline bundle. Publish-chain fixes in 5.7 do not replace bridge acceptance—they sit on top.
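One way to archive the grid as JSON is sketched below. The schema is an assumption made for illustration: the `GridRow` fields, the `baseline_bundle` function, and the top-level keys are hypothetical, not a format OpenClaw defines.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class GridRow:
    check: str         # row name from the acceptance grid
    vnc_evidence: str  # what the operator saw over VNC (or a screenshot path)
    ssh_evidence: str  # command output or log excerpt reference
    passed: bool

def baseline_bundle(version, rows):
    """Serialize grid results; the bundle passes only if every row passed."""
    bundle = {
        "version": version,
        "passed": all(row.passed for row in rows),
        "rows": [asdict(row) for row in rows],
    }
    return json.dumps(bundle, indent=2, sort_keys=True)

rows = [
    GridRow("Version footer", "Gateway UI build matches CLI", "openclaw --version", True),
    GridRow("Teardown", "Meet tab closed cleanly", "Idle timer fired", True),
]
print(baseline_bundle("v2026.5.4", rows))
```

Keeping VNC and SSH evidence in the same record preserves the pairing the grid is built around.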

For organizations that also run outbound-only agents on Linux, keep Meet and Twilio ingress on the macOS anchor host. Linux remains excellent for webhooks and batch jobs, but it cannot close the microphone and OAuth evidence chain this workflow requires.

07

Operational drills beyond the happy path

Two drills catch production issues that happy-path smokes miss. Drill A—proxy failover: reload Nginx or Caddy during an active bridge and confirm Twilio retries status callbacks without spawning a second bridge session. Drill B—partial PSTN loss: drop one caller leg while Meet stays up; verify mux policy either removes the leg gracefully or marks the session degraded in logs operators actually read.
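The two acceptable outcomes of Drill B can be expressed as a tiny state model. This is a sketch only: `BridgeSession`, its `state` values, and the log message wording are hypothetical and do not reflect Gateway's actual log format.

```python
import logging

log = logging.getLogger("bridge")

class BridgeSession:
    """Minimal model of a bridge session that survives a lost PSTN leg."""

    def __init__(self, session_id, legs):
        self.session_id = session_id
        self.legs = set(legs)
        self.state = "healthy"

    def drop_leg(self, leg_id, clean_hangup):
        self.legs.discard(leg_id)
        if clean_hangup:
            # Graceful removal: the session stays healthy.
            log.info("session=%s leg=%s removed", self.session_id, leg_id)
        else:
            # Unexpected loss: mark the session so operators can see it.
            self.state = "degraded"
            log.warning("session=%s leg=%s lost, session degraded",
                        self.session_id, leg_id)
        return self.state
```

During the drill, the thing to verify is that the real system ends up in one of these two states and says so in the log stream operators watch, rather than silently keeping a dead leg attached.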

Document expected agent behavior when Meet screen-share starts: some teams mute bridge capture to avoid narrating slide text; others want vision tools on shared content. The 5.4 bridge does not remove product policy—you still declare whether screen content becomes model input or stays out of band.

Finally, align retention: voice transcripts may be more sensitive than IM archives. Pair bridge configuration with your existing SecretRef audit cadence and legal hold rules before you invite external dial-in numbers.


FAQ

Frequently asked questions

Yes—that is the 5.4 design point. Declare one bridge owner per live room and mux PSTN legs before the Gemini upstream. Two owners on the same Meet create echo and diverging transcripts.

No. Workspace OAuth, Chromium permissions, and macOS microphone TCC require the same interactive user you get over VNC. SSH remains essential for listener tables and log archives.

4.26 optimizes a local browser tab on Google Live transport. 5.4 adds calendar Meet ingress and Twilio PSTN with explicit bridge session semantics in Gateway.

Validate 5.4 bridge acceptance first if Meet or Twilio is in scope. Apply 5.7 incrementally for publish-chain and channels CLI improvements without skipping voice baselines.

Summary

OpenClaw v2026.5.4 turns meeting audio into a first-class Gateway concern: Meet and Twilio are ingress planes, Gemini Live is the duplex reasoning transport, and your change process still owns secrets, proxy timeouts, and session teardown. Teams that try to run this only over SSH routinely lose weeks to permission drift and false-green Meet UI states that logs never explain.

Owning a physical Mac adds sleep policy, update windows, and hardware depreciation; undersized laptops choke when Meet capture, PSTN mux, and transcript archives coincide. A leased remote Mac with a reviewable GUI session keeps imaging and uptime with the provider while you keep bridge policy and SecretRef inventory—usually with a shorter mean time to recover when a bridge drops mid-call.

Lower CapEx, but the section 6 acceptance still runs under a single macOS user: VNCMac rents cloud Macs. The primary button leads to the cloud Mac purchase page; compare plans on the home page before your next bridge change window.