Open Source LLM July 1, 2026 ~22 min read openPangu Ascend NPU

Huawei's openPangu 2.0 Is Open-Source
Trained Without a Single NVIDIA GPU

505B MoE · 512K context · seven components · deploy guide · vs DeepSeek

Huawei openPangu 2.0 open-source MoE large language model on Ascend NPU

On June 30, 2026, Huawei shipped on its HDC 2026 promise: openPangu-2.0-Flash weights, inference code, and training operators went live on GitCode. Bottom line: this is the first frontier-scale open LLM trained entirely on non-NVIDIA hardware, and one of the few planning seven full-stack open components including pre-training code. This guide covers the timeline, mHC/ModAttn architecture, competitor matrix, ModelArts API and GitCode self-host steps, sovereign-AI implications, and Mac-side multi-model routing checks. See also our June OpenRouter rankings for the wider China-model landscape.

Disclaimer: some capability ratings are architecture-based estimates; we will update when independent benchmarks land. Published July 1, 2026.

01

Timeline: HDC 2026 to GitCode Release

DateEvent
2026-06-12HDC 2026 — Richard Yu keynotes openPangu 2.0 launch
2026-06-30Flash weights, inference code, training operators on GitCode
July 2026 (planned)Pro weights and inference code
H2 2026 (planned)Pre-training code, post-training code, more operators

Why this release matters

  1. 01

    Export controls: US restrictions on A100/H100 made “no NVIDIA, no frontier model” a common assumption — 505B MoE on Ascend challenges it.

  2. 02

    Open depth: most labs ship weights + inference only; Huawei plans pre/post-training code and Ascend kernels.

  3. 03

    News window: Flash went live June 30 — peak interest for developers evaluating sovereign stacks.

  4. 04

    HarmonyOS Agent: native engine for HarmonyOS 7 agents; 30B edge model runs offline on Kirin phones.

02

Specs and Seven Open Components

VariantTotalActiveSparsityContextStatus
Pro505B18B~28:1512KJuly 2026
Flash92B6B~15:1512KLive June 30

Cite: 512K tokens ≈ eight full novels in one prompt; Flash activates only 6B params per token while drawing on 92B knowledge.

  1. 01

    Model architecture — released

  2. 02

    Weights (Flash live; Pro July) — Flash released

  3. 03

    Technical report — released

  4. 04

    Inference + training operators — released

  5. 05

    Pre-training code — H2 2026

  6. 06

    Post-training (SFT/RLHF) — H2 2026

  7. 07

    Ascend training kernels — H2 2026

03

Architecture and Training Breakthroughs

  • mHC routing: Multi-Head Combinatorial expert routing, less load imbalance
  • Muon optimizer: second-order momentum for large-scale stability
  • ModAttn: modular attention for 512K windows
  • DSA+SWA (Flash): ultra-sparse attention for inference efficiency
MetricValue
Hypernode training efficiency+30%
512K sequence throughput+50%
Train/inference consistency (MoE)>99%
Ascend single-card vs mainstream OSS2× throughput
Flash-Int8 (W4A8)-40% memory, <10% quality loss
04

Ascend Stack and Developer Ecosystem

Training ran on Ascend 910B NPUs only — no A100/H100. Stack: CANN (CUDA-class runtime) + torch_npu; standard PyTorch with import torch_npu switches backend. Deploy via ModelArts API, GitCode self-host, or HarmonyOS native integration. Edge: 30B embedded model — 50% faster inference, 20% less memory on Kirin silicon.

05

vs DeepSeek, Qwen, Kimi — Honest Tradeoffs

ModelTotalActiveContextHardwareOpen depth
openPangu 2.0 Pro505B18B512KAscend7 components
DeepSeek V4 Pro1.6T~200B128KNVIDIAweights + infer
Qwen 3.7 Max~400B+varies128KNVIDIApartial training
Kimi K2.71T32B256KNVIDIAweights + infer

DeepSeek wins coding and hard reasoning today. openPangu wins 512K context (4× most rivals), sovereign deployment without NVIDIA, 2× Ascend throughput, and planned full training pipeline. Kimi wins MCP-heavy agent tooling. Pick Flash for local cost (~96GB); Pro for long-document RAG when weights ship in July.

06

How to Access: ModelArts API and GitCode

  1. 01

    Sign up for Huawei Cloud

  2. 02

    ModelArts → AI Gallery → search openPangu 2.0

  3. 03

    Subscribe and copy API endpoint + token

  4. 04

    Call Chat Completions (curl below)

  5. 05

    Set per-model billing caps and audit logs

ModelArts API
curl -X POST "https://modelarts.${REGION}.myhuaweicloud.com/v1/infers/openpangu-2-flash/chat/completions" \
  -H "Content-Type: application/json" \
  -H "X-Auth-Token: ${TOKEN}" \
  -d '{"model":"openpangu-2.0-flash","messages":[{"role":"user","content":"Explain MoE in simple terms"}],"max_tokens":1024}'
Flash on single Ascend 910B
python inference.py --model_path ./openPangu-Flash --device npu:0 --context_length 512000 --precision bf16
VariantRecommendedMinimum
Flash1× Ascend 910B~96GB unified memory
Flash-Int8Atlas A2~48GB VRAM
Pro4+ Ascend 910Bmulti-card cluster
07

Sovereign AI, License, HarmonyOS Agents

Under openPangu License: commercial use permitted, royalty-free, non-exclusive (see GitCode for terms). Strategically, openPangu backs HarmonyOS 7 agents (>90% complex-task success on framework 2.0). When pre-training code ships H2 2026, researchers can reproduce a frontier MoE pipeline on Ascend — rare at this scale.

Links: GitCode Ascend Tribe · ModelArts · HDC 2026

FAQ

FAQ

Yes — Ascend 910B only, no A100/H100 in the training pipeline.

DeepSeek for coding/reasoning; openPangu for 512K docs, sovereign/Ascend deploy, and future full training code.

Closing

openPangu 2.0 is not today’s benchmark king — DeepSeek still leads many coding tasks. It is something else: a NVIDIA-independent, full-stack frontier MoE with 512K context and a credible open roadmap. Flash weights are live now.

Routing openPangu beside Claude or DeepSeek in OpenClaw on macOS often needs GUI OAuth, Keychain, and a host that stays awake. Before you buy hardware, validate primary/fallback pairs on a Mac with real screens. VNCMac rents physical Mac mini nodes monthly for multi-model Agent routing — pricing, homepage.