Yunze Man 「满运泽」

Research Scientist at NVIDIA GEAR. Recent PhD from UIUC. Solid experience in AI agents, embodied AI, and VLM post-training. PhD research supported by the NVIDIA Graduate Fellowship.

My research interests lie at the intersection of vision, robotics, and machine learning. I develop vision-centric large multimodal models and embodied agentic AI. I am interested in AI agents that interact with both digital and physical worlds.

Email
[Google Scholar] [Github] [X]

News

  • [06/2026]    Defended my PhD thesis “Visually Grounded Multimodal Models for Spatial Intelligence”!
  • [05/2026]    Joined NVIDIA GEAR as a Research Scientist.
  • [03/2026]    Gave a Lightning Talk on LocateAnything3D at NVIDIA GTC 2026.
  • [02/2026]    LocateAnything3D and Fast-ThinkAct accepted to CVPR 2026; Capturing accepted to ICLR 2026.
  • [08/2025]    Started internship at NVIDIA GEAR. Excited to work on Generalist Robotics!
  • [12/2024]    Received the NVIDIA Graduate Fellowship 2025.

Selected Publications

Please refer to my Google Scholar profile for the full list of publications.

(* indicates equal contribution)
Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning
LocateAnything3D: Vision-Language 3D Detection with Chain-of-Sight
Capturing Visual Environment Structure Correlates with Control Performance
GR00T N1.6: An Improved Open Foundation Model for Generalist Humanoid Robots
GR00T Team
Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought
RandAR: Decoder-only Autoregressive Visual Generation in Random Orders
Oral presentation
Floating No More: Object-Ground Reconstruction from a Single Image
Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Reasoning
SceneCraft: Layout-Guided 3D Scene Generation
LLM4Vision: Frozen Transformers from Language Models are Effective Visual Encoder Layers
Spotlight presentation
SituationVLM: Situational Awareness Matters in 3D Vision Language Reasoning

Industry Experience

  • [05/2026 ~ Present], NVIDIA Generalist Embodied Agent Research (GEAR), Research Scientist
  • [12/2024 ~ 03/2026], OpenAGI Foundation, Founding Researcher
  • [08/2025 ~ 03/2026], NVIDIA Generalist Embodied Agent Research (GEAR), Research Scientist Intern
  • [05/2024 ~ 12/2024], NVIDIA Learning and Perception Research, Research Scientist Intern
  • [05/2022 ~ 08/2022], Adobe Research, Research Scientist Intern

Professional Service

  • Reviewer for CVPR, ECCV, ICCV, ICLR, NeurIPS, ICML, AAAI, IROS, ICRA, TMLR, WACV
    2021 - 2026
  • Teaching Assistant
    • Learning to Learn (CS598), UIUC
      Fall 2022

    • Efficient & Predictive Vision (CS598), UIUC
      Spring 2022

    • Machine Learning (CS446), UIUC
      Fall 2021

    • Computer Vision Capstone (16-621), CMU
      Spring 2020, 2021


© Yunze Man. Last Updated: 06/2026