Understanding and Improving Vision Foundation Models
I am an incoming PhD student at UC San Diego, working with Prof. Hao Zhang. Previously, I was a research intern at the University of Michigan with Prof. Jun Gao. I received my Master's degree from Zhejiang University, where I worked with Prof. Yiyi Liao. I am also lucky to have collaborated with Prof. Andreas Geiger. I obtained my B.Eng. degree from Hangzhou Dianzi University in 2022.
My research focuses on understanding and improving vision foundation models. Previously, I worked on 3D generative models. Feel free to reach out!
SIGGRAPH 2026
ReRoPE repurposes Rotary Position Embedding for shift-invariant relative camera control in video generation.
CVPR 2026
Gen3R bridges foundational reconstruction models and video diffusion models for scene-level 3D generation.
CVPR 2025
Prometheus introduces feed-forward scene-level 3D generation in seconds, harnessing pre-trained 2D priors for generalizable and efficient 3D synthesis.
T-PAMI 2025
UrbanGen generates 3D urban radiance fields with photorealistic rendering, accurate geometry, high controllability, and diverse city styles.
CVPR 2025
ChronoDepth addresses temporally consistent video depth estimation using video diffusion model priors.
NeurIPS 2025
Orientation Matters addresses orientation alignment in 3D generative models.
arXiv 2025
HeFT is a zero-shot point tracking framework leveraging visual priors of pretrained video diffusion models.
arXiv 2025
The Constant Eye benchmarks appearance robustness under out-of-distribution (OOD) conditions in autonomous driving.
ECCV 2024 (Oral)
TeFF learns 3D-aware GANs from unposed images via on-the-fly pose estimation with a learned template feature field.
SIGGRAPH 2024
MaPa creates segment-wise procedural material graphs for high-quality rendering with significant editing flexibility.
ICCV 2023
UrbanGIRAFFE leverages coarse 3D panoptic priors to guide a 3D-aware generative model for photorealistic urban scene synthesis with diverse controllability.