505B MoE · 512K context · seven components · deploy guide · vs DeepSeek
On June 30, 2026, Huawei delivered on its HDC 2026 promise: openPangu-2.0-Flash weights, inference code, and training operators went live on GitCode. Bottom line: this is the first frontier-scale open LLM trained entirely on non-NVIDIA hardware, and one of the few with a plan to open seven full-stack components, including pre-training code. This guide covers the timeline, mHC/ModAttn architecture, competitor matrix, ModelArts API and GitCode self-host steps, sovereign-AI implications, and Mac-side multi-model routing checks. See also our June OpenRouter rankings for the wider China-model landscape.
Disclaimer: some capability ratings are architecture-based estimates; we will update when independent benchmarks land. Published July 1, 2026.
| Date | Event |
|---|---|
| 2026-06-12 | HDC 2026 — Richard Yu keynotes openPangu 2.0 launch |
| 2026-06-30 | Flash weights, inference code, training operators on GitCode |
| July 2026 (planned) | Pro weights and inference code |
| H2 2026 (planned) | Pre-training code, post-training code, more operators |
Export controls: US restrictions on A100/H100 made “no NVIDIA, no frontier model” a common assumption — 505B MoE on Ascend challenges it.
Open depth: most labs ship weights + inference only; Huawei plans pre/post-training code and Ascend kernels.
News window: Flash went live June 30 — peak interest for developers evaluating sovereign stacks.
HarmonyOS Agent: native engine for HarmonyOS 7 agents; 30B edge model runs offline on Kirin phones.
| Variant | Total | Active | Sparsity | Context | Status |
|---|---|---|---|---|---|
| Pro | 505B | 18B | ~28:1 | 512K | July 2026 |
| Flash | 92B | 6B | ~15:1 | 512K | Live June 30 |
For scale: 512K tokens is roughly eight full novels in one prompt, and Flash activates only 6B parameters per token while drawing on all 92B.
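The sparsity column in the table above can be reproduced with simple division. The sketch below assumes "sparsity" means total parameters divided by parameters active per token; that definition is our reading of the table, not something the source states.

```python
# Parameter counts in billions, taken from the variant table above.
VARIANTS = {
    "Pro": {"total_b": 505, "active_b": 18},
    "Flash": {"total_b": 92, "active_b": 6},
}


def sparsity_ratio(total_b: float, active_b: float) -> float:
    """Return total/active parameters (28.1 reads as roughly 28:1)."""
    return total_b / active_b


for name, counts in VARIANTS.items():
    ratio = sparsity_ratio(**counts)
    # 100 / ratio is the share of weights touched for each token.
    print(f"{name}: ~{ratio:.1f}:1, {100 / ratio:.1f}% of weights active per token")
```

Under that definition the ratios round to the ~28:1 and ~15:1 shown in the table.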
| Component | Status |
|---|---|
| Model architecture | Released |
| Weights | Flash released; Pro planned for July 2026 |
| Technical report | Released |
| Inference + training operators | Released |
| Pre-training code | H2 2026 (planned) |
| Post-training code (SFT/RLHF) | H2 2026 (planned) |
| Ascend training kernels | H2 2026 (planned) |
| Metric | Value |
|---|---|
| Hypernode training efficiency | +30% |
| 512K sequence throughput | +50% |
| Train/inference consistency (MoE) | >99% |
| Ascend single-card vs mainstream OSS | 2× throughput |
| Flash-Int8 (W4A8) | -40% memory, <10% quality loss |
Training ran on Ascend 910B NPUs only, with no A100/H100 in the pipeline. The software stack is CANN (Huawei's counterpart to the CUDA runtime) plus torch_npu: importing torch_npu registers the Ascend backend, so standard PyTorch code can target NPU devices. Deploy via the ModelArts API, GitCode self-host, or HarmonyOS native integration. At the edge, the 30B embedded model offers 50% faster inference and 20% less memory on Kirin silicon.
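As a rough illustration of the torch_npu pattern, the sketch below picks an Ascend device when the stack is present and falls back to CPU otherwise. The helper name `pick_device` is hypothetical, and the fallback policy is our assumption rather than anything from Huawei's repository.

```python
import importlib.util


def pick_device(index: int = 0) -> str:
    """Return 'npu:<index>' when torch and torch_npu are usable, else 'cpu'."""
    has_torch = importlib.util.find_spec("torch") is not None
    has_npu = importlib.util.find_spec("torch_npu") is not None
    if not (has_torch and has_npu):
        return "cpu"
    try:
        import torch
        import torch_npu  # noqa: F401  (the import registers the NPU backend)

        return f"npu:{index}" if torch.npu.is_available() else "cpu"
    except Exception:
        # Assumption: treat a broken CANN/driver install like a missing NPU.
        return "cpu"


device = pick_device()
print(device)
```

On an Ascend host, tensors and models would then be moved with the usual `.to(device)` call; on any other machine the helper returns `cpu`.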
| Model | Total | Active | Context | Hardware | Open depth |
|---|---|---|---|---|---|
| openPangu 2.0 Pro | 505B | 18B | 512K | Ascend | 7 components |
| DeepSeek V4 Pro | 1.6T | ~200B | 128K | NVIDIA | weights + infer |
| Qwen 3.7 Max | ~400B+ | varies | 128K | NVIDIA | partial training |
| Kimi K2.7 | 1T | 32B | 256K | NVIDIA | weights + infer |
DeepSeek wins coding and hard reasoning today. openPangu wins on 512K context (4× most rivals), sovereign deployment without NVIDIA, 2× Ascend throughput, and a planned full training pipeline. Kimi wins MCP-heavy agent tooling. Pick Flash for low-cost local deployment (~96GB memory); pick Pro for long-document RAG once its weights ship in July.
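The recommendations above can be written down as a small routing table. This is a sketch only: the task labels, the `pick_model` helper, and the reading of "K" as 1,000 tokens are all our assumptions, and the picks simply mirror the comparison in this guide.

```python
# Context windows from the comparison table, assuming "K" means 1,000 tokens.
CONTEXT_TOKENS = {
    "openPangu 2.0 Pro": 512_000,
    "openPangu 2.0 Flash": 512_000,
    "DeepSeek V4 Pro": 128_000,
    "Kimi K2.7": 256_000,
}

# Hypothetical task labels mapped to the picks made in this guide.
ROUTES = {
    "coding": "DeepSeek V4 Pro",
    "hard_reasoning": "DeepSeek V4 Pro",
    "mcp_agent_tooling": "Kimi K2.7",
    "local_low_cost": "openPangu 2.0 Flash",
    "long_document_rag": "openPangu 2.0 Pro",
}

LONG_CONTEXT_FALLBACK = "openPangu 2.0 Flash"


def pick_model(task: str, prompt_tokens: int = 0) -> str:
    """Pick a model by task, falling back to a 512K model for long prompts."""
    model = ROUTES.get(task, LONG_CONTEXT_FALLBACK)
    if prompt_tokens > CONTEXT_TOKENS[model]:
        model = LONG_CONTEXT_FALLBACK
    if prompt_tokens > CONTEXT_TOKENS[model]:
        raise ValueError("prompt exceeds every listed context window")
    return model


print(pick_model("coding"))
print(pick_model("coding", prompt_tokens=300_000))
```

Falling back to Flash rather than Pro is a deliberate choice here, because Flash is the variant whose weights are already live.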
1. Sign up for Huawei Cloud
2. ModelArts → AI Gallery → search openPangu 2.0
3. Subscribe and copy API endpoint + token
4. Call Chat Completions (curl below)
5. Set per-model billing caps and audit logs
curl -X POST "https://modelarts.${REGION}.myhuaweicloud.com/v1/infers/openpangu-2-flash/chat/completions" \
-H "Content-Type: application/json" \
-H "X-Auth-Token: ${TOKEN}" \
-d '{"model":"openpangu-2.0-flash","messages":[{"role":"user","content":"Explain MoE in simple terms"}],"max_tokens":1024}'
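For callers who prefer Python, the same request can be sketched with the standard library. The URL, headers, and payload are copied from the curl command above; the helper names `build_chat_request` and `send_chat_request` are hypothetical, and the response is returned as raw parsed JSON because its schema is not documented here.

```python
import json
import urllib.request


def build_chat_request(region: str, token: str, prompt: str,
                       max_tokens: int = 1024) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl command."""
    url = (
        f"https://modelarts.{region}.myhuaweicloud.com"
        "/v1/infers/openpangu-2-flash/chat/completions"
    )
    payload = {
        "model": "openpangu-2.0-flash",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request, timeout: float = 60.0):
    """Send the request and return the parsed JSON body as-is."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call would look like `send_chat_request(build_chat_request(region, token, "Explain MoE in simple terms"))`, with `region` and `token` taken from your own ModelArts subscription.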
python inference.py --model_path ./openPangu-Flash --device npu:0 --context_length 512000 --precision bf16
| Variant | Recommended | Minimum |
|---|---|---|
| Flash | 1× Ascend 910B | ~96GB unified memory |
| Flash-Int8 | Atlas A2 | ~48GB VRAM |
| Pro | 4+ Ascend 910B | multi-card cluster |
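A quick way to sanity-check these minimums is a weights-only estimate: parameter count times bytes per parameter. The sketch below ignores the KV cache (which grows with context length), activations, and runtime overhead, and it assumes all experts stay resident in memory even though only 6B parameters are active per token.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billion: float, precision: str) -> float:
    """Weights-only footprint in GB (1e9 params x bytes / 1e9 bytes per GB)."""
    return params_billion * BYTES_PER_PARAM[precision]


for precision in BYTES_PER_PARAM:
    gb = weight_memory_gb(92, precision)
    print(f"Flash 92B @ {precision}: ~{gb:.0f} GB for weights alone")
```

By this arithmetic Flash needs about 184 GB in bf16, 92 GB at 8 bits, and 46 GB at 4 bits. The ~96GB and ~48GB minimums in the table sit close to the 8-bit and 4-bit figures, which may mean they presuppose quantized weights; that is our inference from the numbers, not a statement from Huawei.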
Under the openPangu License, commercial use is permitted, royalty-free and non-exclusive (see GitCode for terms). Strategically, openPangu backs HarmonyOS 7 agents (>90% complex-task success on framework 2.0). When pre-training code ships in H2 2026, researchers will be able to reproduce a frontier MoE pipeline on Ascend, which is rare at this scale.
Links: GitCode Ascend Tribe · ModelArts · HDC 2026
Was it trained without NVIDIA GPUs? Yes: Ascend 910B only, with no A100/H100 in the training pipeline.
openPangu or DeepSeek? DeepSeek for coding/reasoning; openPangu for 512K docs, sovereign/Ascend deploy, and future full training code.
openPangu 2.0 is not today’s benchmark king, and DeepSeek still leads many coding tasks. It is something else: an NVIDIA-independent, full-stack frontier MoE with 512K context and a credible open roadmap. Flash weights are live now.
Routing openPangu beside Claude or DeepSeek in OpenClaw on macOS often needs GUI OAuth, Keychain, and a host that stays awake. Before you buy hardware, validate primary/fallback pairs on a Mac with real screens. VNCMac rents physical Mac mini nodes monthly for multi-model Agent routing.