From 3ms to 2.5s

9 models · 11 experiments · 10 companies

Yuanbo Yang · UCSD · Hao AI Lab · 2026-05

VLA vs VA: who generates video at inference time?

VLA: no video gen
  • PI: π0.7
  • Generalist: GEN-1
  • Figure: Helix
  • Google: Gemini Robotics
  • NVIDIA: GR00T N1.5

VA / WAM: generates video at inference time
  • 1X: 1XWM
  • Rhoda: FutureVision ($450M)
  • NVIDIA: DreamZero
  • NVIDIA + Stanford: Cosmos Policy

VLA ~5Hz vs WAM ~0.4Hz

Seven systems

System         One-liner                                          Our data
π0.7           4-component pipeline, 300M Action Expert           200ms / 5Hz
GEN-1          Black box; pretrained on 500K hr of wearable data  -
GENE-26.5      20-DoF dexterous hand + sensor glove + sim         -
Cosmos Policy  2B DiT, action = latent frame                      659ms / 1.5Hz
DreamZero      14B video WM → distilled policy                    -
LingBot-VA     Adds video gen → 34x slower                        2518ms / 0.4Hz
Rhoda          "Direct Video Action"; zero technical disclosure   -

Experimental Setup

Hardware: RTX 5880 Ada 48GB × 1 (xdlab23)
Precision: bf16, sdpa (OpenVLA: eager attn)
Input: 224×224 synthetic tensor, batch=1
Statistics: warmup=15, iter=20, median reported
Timing: CUDA events (torch.cuda.Event, enable_timing=True)
Stages: E (vision encode) / C (LLM prefill) / A (action head)
Weights: real for Pi-Zero, Cosmos, OpenVLA-OFT; random for the other 6 (validated Δ<12%, exp07b)

warmup=15 comes from exp07a: over the first 12 iterations, GPU power ramp-up produces a bimodal latency distribution (1.25x). Warmup is still required after nvidia-smi -pm 1.
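The measurement protocol above (15 warmup calls, 20 timed calls, median reported, CUDA events for timing) can be sketched roughly as follows. This is a minimal illustration, not the harness used for these experiments: the function names are hypothetical, and the CPU fallback via time.perf_counter is an assumption added only so the sketch runs on a machine without PyTorch or a GPU.

```python
import statistics
import time

try:
    import torch
    HAS_CUDA = torch.cuda.is_available()
except ImportError:  # assumption: keep the sketch runnable without PyTorch
    torch = None
    HAS_CUDA = False


def time_once_ms(fn):
    """Time a single call of fn and return the elapsed milliseconds."""
    if HAS_CUDA:
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        fn()
        end.record()
        torch.cuda.synchronize()  # wait until both events have completed
        return start.elapsed_time(end)  # elapsed_time reports milliseconds
    t0 = time.perf_counter()
    fn()
    return (time.perf_counter() - t0) * 1e3


def median_latency_ms(fn, warmup=15, iters=20):
    """Discard `warmup` calls, then report the median of `iters` timed calls."""
    for _ in range(warmup):
        fn()
    if HAS_CUDA:
        torch.cuda.synchronize()  # make sure warmup work has drained
    samples = [time_once_ms(fn) for _ in range(iters)]
    return statistics.median(samples)
```

Reporting the median rather than the mean keeps a few slow outliers from the power ramp-up period from dominating the result.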

Latency × complexity

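[Figure: latency (log scale, 1ms to 1000ms) vs. model complexity (Simple to Complex), with reference lines at 10Hz and 5Hz. Points: A, S, O, L, P, F, C, V. Legend: OFT/MLP, Flow/DiT, WAM.]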

Red line = 10Hz. Only OFT crosses it.

OFT: action 165ms → 0.13ms

The bottleneck shifts from the action head to the backbone.

Pi-Zero (flow): A 82% · 200ms · 5Hz
OpenVLA-OFT (7B): C 84% · 109ms · 9Hz
StarVLA-OFT (3B): E 55%, C 45% · 63ms · 16Hz
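To make the breakdown concrete, the stage shares can be turned back into per-stage milliseconds and a dominant stage. The sketch below uses only the shares and totals listed above; the helper names are hypothetical.

```python
def stage_ms(total_ms, shares):
    """Convert fractional stage shares into per-stage milliseconds."""
    return {stage: total_ms * share for stage, share in shares.items()}


def bottleneck(shares):
    """Return the stage with the largest share of end-to-end latency."""
    return max(shares, key=shares.get)


# Shares as listed on the slide (E = vision encode, C = LLM prefill, A = action head).
pi_zero = {"A": 0.82}
openvla_oft = {"C": 0.84}
starvla_oft = {"E": 0.55, "C": 0.45}

print(stage_ms(200, pi_zero))    # action head in ms for Pi-Zero
print(bottleneck(starvla_oft))   # dominant stage for StarVLA-OFT
```

For Pi-Zero, 82% of 200ms is about 164ms, in line with the roughly 165ms action cost quoted in the slide title.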

Five jumps

#  Paradigm              Latency    Hz
1  ACT (single forward)  3ms        300
2  VLM + flow head       74ms       13
3  Action DiT            200-407ms  2.5-5
4  Full WAM              2518ms     0.4
5  OFT                   63-109ms   9-16

Jump 3 is the costliest. Jump 5 cuts iterations in exchange for speed.
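The Hz column follows from the latency column as 1000 / latency_ms at batch=1. A small sketch of that conversion, plus a check against a control-rate budget such as the 10Hz line, is below. The function names are hypothetical, and the slide's Hz figures appear to be rounded.

```python
def control_rate_hz(latency_ms):
    """Single-request control rate implied by an end-to-end latency."""
    return 1000.0 / latency_ms


def meets_budget(latency_ms, target_hz=10.0):
    """True if one inference fits inside a 1/target_hz time budget."""
    return control_rate_hz(latency_ms) >= target_hz


# Latencies taken from the table above.
for name, ms in [("VLM + flow head", 74), ("Full WAM", 2518), ("OFT (3B)", 63)]:
    print(f"{name}: {control_rate_hz(ms):.1f} Hz, 10Hz budget met: {meets_budget(ms)}")
```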

2x params → 4.4x latency

Model           Params  Per-step  Cross-attn tax
OFT MLP         ~2M     0.13ms    -
NitroGen        174M    7.2ms     -
Pi-Zero Expert  300M    16.5ms    +35%
Fast-WAM        350M    32ms      +100%
Cosmos          2B      76.8ms    monolithic

Cross-attention is a hidden tax.
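The headline ratio can be reproduced from the table. The sketch below assumes the "2x params → 4.4x latency" comparison is NitroGen vs. Fast-WAM, since those are the two rows whose ratios match; that pairing is an inference from the numbers and is not stated on the slide. The implied exponent is likewise only an illustration of how superlinear the jump is.

```python
import math

# (parameter count, per-step latency in ms), copied from the table above.
PER_STEP = {
    "NitroGen": (174e6, 7.2),
    "Pi-Zero Expert": (300e6, 16.5),
    "Fast-WAM": (350e6, 32.0),
    "Cosmos": (2e9, 76.8),
}


def scaling(small, large):
    """Return (param ratio, latency ratio, implied exponent) for two models."""
    p_small, t_small = PER_STEP[small]
    p_large, t_large = PER_STEP[large]
    param_ratio = p_large / p_small
    latency_ratio = t_large / t_small
    # Exponent k such that latency_ratio == param_ratio ** k.
    exponent = math.log(latency_ratio) / math.log(param_ratio)
    return param_ratio, latency_ratio, exponent


params_x, latency_x, k = scaling("NitroGen", "Fast-WAM")
print(f"{params_x:.1f}x params -> {latency_x:.1f}x latency (exponent {k:.2f})")
```

An exponent above 1 means per-step latency grows faster than parameter count between these two models, which is consistent with the cross-attention tax column.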

VLA training reshapes attention

Metric   VLM             After VLA fine-tune
Gini     >0.91           0.07
Sink     Pos 2 (12-28x)  Pos 64
Entropy  V-shape         flat

VLM pruning does not transfer to VLA.
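For readers unfamiliar with the metrics in the table, here is a minimal sketch of a Gini coefficient and Shannon entropy over one row of attention weights. It uses the standard textbook definitions; the slides do not say exactly how Gini or entropy were computed, so treat this as an assumption about the metric rather than the experiment's code.

```python
import math


def gini(weights):
    """Gini coefficient of non-negative weights: 0 = uniform, near 1 = concentrated."""
    xs = sorted(weights)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive weight")
    # Standard rank-weighted form over values sorted in ascending order.
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


def entropy(weights):
    """Shannon entropy (in nats) of the normalized weights."""
    total = sum(weights)
    return -sum((w / total) * math.log(w / total) for w in weights if w > 0)


print(gini([0.25, 0.25, 0.25, 0.25]))  # uniform attention
print(gini([0.0, 0.0, 0.0, 1.0]))      # all attention on one token (a sink)
```

Under this definition, a Gini above 0.91 means a handful of tokens hold most of the attention mass, which is what pruning heuristics rely on, while a value near 0.07 means attention is spread almost evenly.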

Two paths

Path A: compress the Action DiT
FastVideo STA / distillation / caching
→ Pi-Zero, Fast-WAM, Cosmos
Path A': cut the action head + compress the backbone
OFT + flash-attn / quantization
→ OpenVLA-OFT, StarVLA-OFT

Single-request latency differs by 2-10x. Speed up first, then serving.

Questions for Hao

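1. VLA vs VA: which path are you more optimistic about?
2. OFT turns latency into a backbone problem. How does that relate to vLLM / FastVideo?
3. Can FastVideo STA / distillation transfer to the VLA DiT?
4. Path A or A' first?
5. Who in the group is working on related things?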

Backup

Reproducibility

Model     Random      Real        Δ
NitroGen  7.2ms/step  7.1ms/step  <2%
Pi-Zero   200ms       225ms       +12%
Fast-WAM  LIBERO 94.5% (paper 93.7%)  match
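The Δ column reads as a relative difference between real-weight and random-weight latency. A minimal sketch of that calculation is below; the choice of the random-weight number as the baseline is an assumption, since the slide does not state it.

```python
def rel_delta(random_ms, real_ms):
    """Relative change of real-weight latency vs. the random-weight baseline."""
    return (real_ms - random_ms) / random_ms


# Latencies from the table above.
print(f"NitroGen: {rel_delta(7.2, 7.1):+.1%}")
print(f"Pi-Zero:  {rel_delta(200, 225):+.1%}")
```

With these inputs Pi-Zero comes out at about +12.5%, which the table reports as +12%.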

EPDA interference

inflation(X|Y) = 1 + v·a, R²=0.94

{E,A} can safely share a GPU. {P,D} must be kept apart.
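As an illustration of how a model of the form inflation(X|Y) = 1 + v·a could be applied, the sketch below scales a stage's solo latency by the inflation factor. The slide does not define v and a; reading v as the victim stage's sensitivity and a as the co-located stage's aggressiveness is an assumption, and every number in the example is hypothetical, not a measurement.

```python
def inflation(v, a):
    """Slowdown factor of stage X when co-located with stage Y: 1 + v * a."""
    return 1.0 + v * a


def colocated_latency_ms(solo_ms, v, a):
    """Predicted latency of X when it shares a GPU with Y."""
    return solo_ms * inflation(v, a)


# Hypothetical values for illustration only.
print(colocated_latency_ms(solo_ms=100.0, v=0.5, a=0.4))  # mild interference
print(colocated_latency_ms(solo_ms=100.0, v=0.0, a=0.9))  # insensitive victim
```

Under this reading, a pair is safe to co-locate when the product v·a stays close to zero for both directions of the pairing.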

Takeaways

  • 9 models × 11 experiments
  • Action DiT = 80-94% of latency
  • OFT flips the bottleneck → backbone
  • VLA ~5Hz vs WAM ~0.4Hz
  • VLM pruning ≠ VLA pruning
  • Fast VLA first

References

π0: Black et al., 2024
π0.7: PI, 2026-04
GEN-1: Generalist AI, 2026-04
GENE-26.5: Genesis AI, 2026-05
Helix: Figure, 2025
OpenVLA-OFT: arXiv:2502.19645
StarVLA: arXiv:2604.05014
ACT: Zhao et al., RSS 2023
Cosmos Policy: arXiv:2601.16163
DreamZero: NVIDIA GEAR, 2026
1XWM: 1X, 2026-01
Rhoda: Rhoda AI, 2026-03
LingBot: open source
Fast-WAM: arXiv:2603.16666
NitroGen: NVIDIA, 2025
vLLM: SOSP 2023
FastVideo: 2025
DistServe: OSDI 2024