7-day rolling volume · 28.9T weekly scale · China vs US · Anthropic premium paradox · Five-step routing
If you are picking an LLM for Agents and drowning in benchmark press releases, the signal closest to production is who burns tokens each week and where the bill points. Bottom line: on OpenRouter Rankings' 7-day rolling view, the week of May 18–24, 2026 processed 28.9 trillion tokens globally, and DeepSeek-V4-Flash led at 3.43T (+66% week over week). Chinese models have topped US models for four consecutive weeks, while Anthropic shows a classic premium paradox: roughly 12% of token share but about 46% of dollar revenue. This article covers why bills beat leaderboard screenshots, the methodology, that week's macro totals, a Top 10 snapshot, vendor dual truths, the benchmark–market inversion, five weekly routing steps for Mac developers, and acceptance notes for OpenClaw and Claude Code. Cross-read our June LLM trends deep dive, ds4 local inference on Mac, and renting a Mac for 24/7 Agents when you move from API-only routing to always-on gateways.
MMLU, HumanEval, and SWE-bench answer a narrow question: on a fixed dataset, what is the model’s ceiling? OpenRouter, one of the largest neutral LLM API aggregators—300+ models, 60+ providers, 8M+ users, roughly 100 trillion tokens per month—publishes rankings built from real routed input plus output tokens. Money and compute do not flatter vendors. Developers vote with wallets for models that are fast enough, stable enough, and cheap enough to leave running overnight.
Agent workflows changed the mix. Programming-related traffic on OpenRouter climbed from about 11% in early 2025 to more than 50% by mid-2026—now the single largest use category. That shift reframes the chart: the models topping weekly volume are not always the ones winning a +1 SWE-bench headline. OpenRouter and a16z’s 2025 AI Usage Report (built on 100T tokens of anonymized metadata) noted an uncomfortable pattern—benchmark scores and market share often move in opposite directions. Expensive flagships do not automatically capture Agent batch traffic; extreme value SKUs do.
Benchmarks skew toward ceilings: One-shot runs on static prompts rarely price in dozens of tool rounds and long output chains.
Weekly tokens skew toward real-world demand: Five straight weeks of growth, as seen in late May 2026, signal real demand expansion, not a launch-week spike.
Read two axes: Token share shows who carries traffic; dollar revenue share shows who captures margin. The “king” on each axis is often not the same company.
All figures below come from the public board at openrouter.ai/rankings, using OpenRouter’s official weekly (7-day rolling token throughput) view. Core dimensions include total weekly tokens (input + output), per-model ranks, vendor market share, and the split between token share vs dollar revenue share—the last is where pricing differences reveal a second truth beneath the headline rank.
Snapshot week: May 18–24, 2026 (the latest complete week shown when this article was drafted). If you read this later, pull live numbers; the workflow still applies.
Scale check: Roughly one year earlier, OpenRouter processed on the order of 2.4 trillion tokens per week. At 28.9 trillion for the May snapshot, that is about 12× year-over-year growth—AI usage has moved from experimentation to sustained, billable throughput.
| Metric | Value | Week-over-week |
|---|---|---|
| Global weekly volume | 28.9T tokens | +7.4% (5th consecutive weekly rise) |
| China-origin models | 9.223T tokens | +19.89% |
| US-origin models | 4.93T tokens | +16.27% |
| Geopolitical note | Chinese models have led US models on weekly tokens for four straight weeks | |
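The scale check and the week-over-week column can be reproduced with two lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the inputs are the rounded figures quoted above, so the results are approximate.

```python
def growth_multiple(current: float, previous: float) -> float:
    """Return how many times larger `current` is than `previous`."""
    if previous <= 0:
        raise ValueError("previous must be positive")
    return current / previous


def implied_previous(current: float, pct_change: float) -> float:
    """Back out the prior period's value from a percentage change."""
    if pct_change <= -100:
        raise ValueError("pct_change must be greater than -100")
    return current / (1 + pct_change / 100)


# Figures quoted in the text, in trillions of tokens per week.
WEEKLY_TOKENS_MAY_2026_T = 28.9
WEEKLY_TOKENS_YEAR_EARLIER_T = 2.4  # "on the order of", so approximate
GLOBAL_WOW_PCT = 7.4

yoy = growth_multiple(WEEKLY_TOKENS_MAY_2026_T, WEEKLY_TOKENS_YEAR_EARLIER_T)
prior_week = implied_previous(WEEKLY_TOKENS_MAY_2026_T, GLOBAL_WOW_PCT)

print(f"Year-over-year multiple: {yoy:.1f}x")
print(f"Implied prior-week volume: {prior_week:.1f}T")
```

Running the same two functions against live numbers from the board keeps the comparison honest when the snapshot week is no longer current.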
Pain points when reading the weekly board:
Confusing daily spikes with the weekly roll: The ranking is a 7-day window—do not mix in single-day peaks from your own logs.
Ignoring “everything else”: Beyond China and the US, European open-weight stacks and Stealth models still matter; compare vendor pies on the site, not just this table.
Deciding on months-old data: Hy3 Preview and Owl Alpha can post double-digit weekly deltas; routing policies should refresh weekly, not quarterly.
Equating rank #1 with universal default: Top models are usually “ultra-low price × ultra-high throughput.” They are brilliant for Agent loops—not automatic choices for final legal review or multimodal precision work.
| Rank | Model | Vendor | Weekly tokens | WoW | Role |
|---|---|---|---|---|---|
| 1 | DeepSeek-V4-Flash | DeepSeek (China) | 3.43T | +66% | Default Agent brain; aggressive pricing |
| 2 | Tencent Hy3 Preview | Tencent (China) | 3.07T | +16% | Still growing after free tier ended |
| 3 | Claude Sonnet 4.6 | Anthropic (US) | 1.35T | — | Enterprise coding default; 1M context β |
| 4 | DeepSeek-V3.2 | DeepSeek (China) | 1.31T | — | Low-cost long tail; roleplay still active |
| 5 | Owl Alpha | OpenRouter (stealth) | 1.15T | +29% | Free Agent specialist, ~1M context |
| 6 | Gemini 3 Flash Preview | Google (US) | 1.06T | — | Multimodal; academic and clinical mixes |
| 7 | DeepSeek-V4-Pro | DeepSeek (China) | 1.00T | — | Flagship of the DeepSeek product matrix; hard reasoning |
| 8 | MiniMax M2.7 | MiniMax (China) | 806B | — | Long-context value play |
| 9 | Grok 4.1 Fast | xAI (US) | 721B | — | 2M context; strong on legal workloads |
| 10 | Step 3.5 Flash | StepFun (China) | 673B | — | Batch-friendly speed tier |
Source notes: Ranks 1–2 and 5 weekly tokens plus week-over-week changes come from National Business Daily reporting on OpenRouter data for May 18–24, 2026. Ranks 3–4, 6, and 8–10 volumes cross-check the same-week public OpenRouter leaderboard and industry deep-reads. DeepSeek-V4-Pro at 1.00T is derived from the 5.74T series total minus V4-Flash (3.43T) and V3.2 (1.31T). Kimi K2.6, sixth the prior week, dropped out of the top ten and is omitted here.
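The derivation of the V4-Pro figure is a simple subtraction, sketched here with `Decimal` so the two-decimal values in the source notes do not pick up floating-point noise. The helper name `residual_volume` is hypothetical.

```python
from decimal import Decimal

# Weekly token volumes in trillions, as quoted in the source notes.
SERIES_TOTAL_T = Decimal("5.74")
KNOWN_MODELS_T = {
    "DeepSeek-V4-Flash": Decimal("3.43"),
    "DeepSeek-V3.2": Decimal("1.31"),
}


def residual_volume(series_total: Decimal, known: dict[str, Decimal]) -> Decimal:
    """Volume left over once the known models are subtracted from the series total."""
    residual = series_total - sum(known.values(), Decimal("0"))
    if residual < 0:
        raise ValueError("known volumes exceed the series total")
    return residual


v4_pro_t = residual_volume(SERIES_TOTAL_T, KNOWN_MODELS_T)
print(f"DeepSeek-V4-Pro (derived): {v4_pro_t}T")  # prints 1.00T
```

Because the inputs are rounded to two decimals, the derived value inherits that rounding and should be read as approximate.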
DeepSeek placed V4-Flash, V4-Pro, and V3.2 inside the top seven simultaneously. Combined series volume hit about 5.74 trillion weekly tokens, up 25.9% week over week, marking the second straight week the vendor beat Anthropic and Google on aggregate throughput. Pull quote: Flash carries volume, Pro carries hard jobs, V3.2 catches long-tail routes. That product matrix is eating the Agent wave; it is not a one-model lottery win.
| Period | China model traffic share (approx.) |
|---|---|
| Early 2025 | < 2% |
| February 2026 | First week China exceeded US on tokens |
| May 2026 | ~45%+; fourth consecutive week ahead of US |
Anthropic’s token share sits near 12%, down from roughly 25% a year earlier, yet dollar revenue share remains near 46%. Enterprises still pay list price for Claude Opus-class reasoning on messy repos and compliance-sensitive workflows—but the token firehose of Agent batch jobs has largely moved to Flash-tier APIs. Traffic leadership has shifted to the value camp; margin pools still sit with premium closed models.
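One rough way to quantify the paradox is to divide revenue share by token share, which gives revenue per token relative to the platform average. This is a minimal sketch that assumes both shares are measured over the same platform and period, which the quoted figures do not guarantee; the function name is hypothetical.

```python
def revenue_per_token_index(token_share: float, revenue_share: float) -> float:
    """Revenue share divided by token share; 1.0 means platform-average yield."""
    if not 0 < token_share <= 1 or not 0 <= revenue_share <= 1:
        raise ValueError("shares must be fractions between 0 and 1")
    return revenue_share / token_share


# Approximate shares quoted in the text.
ANTHROPIC_TOKEN_SHARE = 0.12
ANTHROPIC_REVENUE_SHARE = 0.46

anthropic_index = revenue_per_token_index(ANTHROPIC_TOKEN_SHARE, ANTHROPIC_REVENUE_SHARE)
rest_index = revenue_per_token_index(1 - ANTHROPIC_TOKEN_SHARE, 1 - ANTHROPIC_REVENUE_SHARE)

print(f"Anthropic: {anthropic_index:.2f}x platform-average revenue per token")
print(f"Everyone else combined: {rest_index:.2f}x")
print(f"Implied gap: {anthropic_index / rest_index:.1f}x")
```

Under those assumptions, each Anthropic token earns several times the platform average, which is the gap between carrying traffic and capturing margin.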
| Tier | Examples | Weekly board signature | Best fit |
|---|---|---|---|
| High value, low volume | Claude Opus | Few tokens, high revenue | Complex reasoning, regulated workflows |
| Mid value, steady volume | Gemini 3 Flash | Stable multimodal growth | Research, clinical, mixed media |
| Ultra-low cost, high volume | DeepSeek / Hy3 / MiniMax / StepFun | Top-of-chart dominance | Agents, coding, batch automation |
While every +1 on SWE-bench earns a blog post, production routers quietly steer bulk traffic toward stacks priced near $0.10 / $0.40 per million input/output tokens. The mechanism is straightforward:
Unit cost beats ceiling scores: In multi-turn Agents, output tokens dominate the invoice—developers optimize for SLA and $/M, not bragging rights on static evals.
Stability beats one brilliant answer: Tool-call failure rate and p95 latency matter more than an occasional wow moment.
Code is the main battlefield: With programming past 50% of OpenRouter traffic, the chart leaders are models that write, edit, and run tests—not chat generalists.
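To see how unit price turns into an invoice, the sketch below prices a multi-round Agent loop at the $0.10 / $0.40 point quoted above. The workload numbers (50 rounds, 2,000 input and 800 output tokens per round) are hypothetical, and fixed per-round token counts are a simplification: real loops usually re-send a growing context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per million tokens."""
    input_per_m: float
    output_per_m: float


def loop_cost(pricing: Pricing, rounds: int, input_tokens: int, output_tokens: int) -> dict[str, float]:
    """Cost of an Agent loop with fixed per-round token counts (a simplification)."""
    input_cost = rounds * input_tokens / 1_000_000 * pricing.input_per_m
    output_cost = rounds * output_tokens / 1_000_000 * pricing.output_per_m
    return {"input": input_cost, "output": output_cost, "total": input_cost + output_cost}


# Value-tier price point quoted in the text.
value_tier = Pricing(input_per_m=0.10, output_per_m=0.40)

# Hypothetical workload, not measured data.
cost = loop_cost(value_tier, rounds=50, input_tokens=2_000, output_tokens=800)
print({name: round(usd, 4) for name, usd in cost.items()})
```

Swapping in a premium model's list price for `value_tier` makes the per-task difference concrete before it shows up on a monthly bill.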
Citable fact: DeepSeek-V4-Flash posted +66% weekly growth in a week without a fresh “new SOTA benchmark” marketing push. The bill moved first; the press release followed later. That is the honest signal weekly rankings provide.
Investors use aggregator throughput to gauge AI commercialization (platform valuations often trade on usage multiples). Developers use it as a vendor-neutral routing reference. Researchers track geopolitical and architectural shifts—MoE, million-token context, Stealth free tiers. Media narratives about “who is winning AI” increasingly cite token volume, not parameter counts on a slide.
Weekly token data has graduated from a niche metric to a commercial weather report: updated every seven days, free to read, yet rarely wired into individual Mac developer workflows. If you run Claude Code or OpenClaw, treating the board like a stock watchlist—check Monday, adjust routes Tuesday—is cheaper than discovering a model shift only when finance forwards the OpenRouter PDF.
Watch bills, not keynotes: Bookmark Rankings; each Monday log the Top 3 models’ week-over-week deltas and compare them to your own OpenRouter usage—divergence is an early warning.
Route by scenario: Agent and batch loops → DeepSeek-V4-Flash; enterprise hard reasoning → Claude Opus; multimodal mixes → Gemini 3 Flash. Keep Sonnet 4.6 as a balanced production fallback.
Track fast climbers: Hy3 Preview and Owl Alpha’s double-digit weekly gains often preview next quarter’s default “spare brain” before your team formally evaluates them.
Set budgets and downgrade paths: In OpenClaw or Claude Code, configure primary, fallback, and escalation models plus per-task token caps so Opus never accidentally eats a batch job.
Accept on macOS with a GUI: After changing routes, re-run Gateway health checks, OAuth, and Keychain prompts. SSH alone cannot click system authorization dialogs. Budget 20 minutes on a VNC remote Mac for acceptance (see our OpenClaw rental guide).
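The Monday comparison from the first step can be sketched as a small divergence check. Everything here is illustrative: the 0.25 threshold, the function names, and the `mine` usage figures are assumptions, and only the board deltas come from the snapshot table.

```python
def wow_delta(current: float, previous: float) -> float:
    """Week-over-week change as a fraction (0.66 means +66%)."""
    if previous <= 0:
        raise ValueError("previous must be positive")
    return (current - previous) / previous


def divergence_flags(board_wow: dict[str, float],
                     own_usage: dict[str, tuple[float, float]],
                     threshold: float = 0.25) -> dict[str, float]:
    """Return models whose own delta differs from the board's by more than threshold."""
    flags = {}
    for model, board_delta in board_wow.items():
        if model not in own_usage:
            continue  # no local traffic for this model, nothing to compare
        previous, current = own_usage[model]
        gap = wow_delta(current, previous) - board_delta
        if abs(gap) > threshold:
            flags[model] = round(gap, 2)
    return flags


# Board deltas published in the snapshot table.
board = {"DeepSeek-V4-Flash": 0.66, "Tencent Hy3 Preview": 0.16, "Owl Alpha": 0.29}

# Hypothetical own usage: (last week, this week) in millions of tokens.
mine = {"DeepSeek-V4-Flash": (120.0, 126.0), "Tencent Hy3 Preview": (40.0, 46.0)}

print(divergence_flags(board, mine))
```

A negative gap means your traffic is growing more slowly than the market's for that model, which is a prompt to review routes rather than proof that anything is wrong.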
Weekly ops checklist: (1) Rankings URL bookmarked; (2) primary / fallback / escalation model names documented; (3) last week’s total tokens and estimated USD; (4) Agent task failure rate; (5) VNC screenshot of Gateway HTTP 200 self-check. All five mean you turned chart awareness into shipped config—not Slack commentary.
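The primary / fallback / escalation wiring and per-task caps can be documented as a small routing table. This is a sketch under stated assumptions, not the configuration schema of OpenClaw or Claude Code: the scenario keys, token caps, and `pick_model` helper are hypothetical, and the model strings are display names rather than API identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    primary: str
    fallback: str
    escalation: str
    max_tokens_per_task: int


# Scenario names and caps are hypothetical; model choices follow the routing step above.
ROUTES = {
    "agent_batch": Route("DeepSeek-V4-Flash", "Claude Sonnet 4.6", "Claude Opus", 200_000),
    "hard_reasoning": Route("Claude Opus", "Claude Sonnet 4.6", "Claude Opus", 400_000),
    "multimodal": Route("Gemini 3 Flash", "Claude Sonnet 4.6", "Claude Opus", 150_000),
}


def pick_model(scenario: str, tokens_used: int,
               primary_healthy: bool = True, escalate: bool = False) -> str:
    """Choose a model for the next call, refusing to exceed the per-task cap."""
    route = ROUTES[scenario]
    if tokens_used >= route.max_tokens_per_task:
        raise RuntimeError(f"{scenario}: token cap of {route.max_tokens_per_task} reached")
    if escalate:
        return route.escalation  # escalation is always an explicit decision
    return route.primary if primary_healthy else route.fallback


print(pick_model("agent_batch", tokens_used=10_000))
print(pick_model("agent_batch", tokens_used=10_000, primary_healthy=False))
```

Requiring an explicit `escalate=True` is the design choice that keeps a batch job from drifting onto the premium model by accident.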
Second pull quote: In 2026, the market votes with 28.9 trillion tokens per week, not press-release adjectives. The developers who win are not always on the model with the highest benchmark score; they are on the model that survives fifty tool rounds without blowing the sprint budget.
Benchmarks measure ceiling ability on fixed datasets. Weekly token volume shows what developers pay to route at scale. Use both, but let bill data confirm who is actually being called in production.
Claude Opus and Sonnet list prices dwarf Flash-tier APIs. Enterprises still pay for hard reasoning, while Agent batch traffic has migrated to low-cost models—the premium paradox in this article’s title.
DeepSeek, Tencent Hy3, and MiniMax pair aggressive API pricing with licenses that fit Agent and coding workloads. For May 18–24, 2026, China-origin models routed about 9.223T weekly tokens versus 4.93T for US-origin models.
Visit Rankings weekly; set primary and fallback models with budgets in OpenClaw or Claude Code; complete Gateway and OAuth acceptance over VNC on a remote Mac. See section 08 for the five-step checklist.
The May 18–24 snapshot shows the market voting with money: Chinese open-weight and value-tier APIs are reshaping global routing faster than benchmark seasons update. Who gets called at scale matters more than who scores highest on a static eval—and weekly tokens grew roughly 12× in a year, so treating the board as a routine ops input is rational, not obsessive.
For Mac developers, the hidden tax is often not the API rate—it is a closed laptop killing your gateway, Keychain blocking headless SSH, and OAuth flows that need a real screen. You can pick the right model from the weekly chart and still lose a day if OpenClaw never passes acceptance on your local machine.
Before you capitalize hardware or lock in a single-vendor route, validate primary and fallback pairs on a host that stays awake with GUI access when macOS demands it. VNCMac rents physical Mac mini nodes by the month—use the button below for remote Mac pricing, or compare plans on the homepage first.