
Yunze Man

I am a Ph.D. student in Computer Science at the University of Illinois Urbana-Champaign, advised by Yuxiong Wang and Liangyan Gui. My research is generously supported by the NVIDIA PhD Fellowship. I received my M.S. in Robotics from Carnegie Mellon University, advised by Kris Kitani, and my B.S. in Computer Science from Zhejiang University.

My research interests lie at the intersection of vision, machine learning, and robotics. I work on vision-centric reasoning models for multimodal and embodied AI agents, focusing on three areas: object-centric perception systems in dynamic scenes, vision foundation models for open-world scene understanding and generation, and large multimodal models for embodied reasoning and robotic planning.

Email
[Google Scholar] [GitHub] [Twitter]

News

  • [12/2024]    Received the NVIDIA PhD Fellowship 2025.
  • [11/2024]    Selected as one of the Top Reviewers in NeurIPS 2024.
  • [09/2024]    Lexicon3D accepted to NeurIPS 2024!
  • [09/2024]    SceneCraft accepted to NeurIPS 2024!
  • [05/2024]    Selected as one of the Outstanding Reviewers in CVPR 2024.
  • [05/2024]    Started my internship at NVIDIA Research. Looking forward to seeing you in the Bay Area!
  • [01/2024]    LLM4Vision accepted to ICLR 2024 (Spotlight)!
  • [02/2023]    I passed the qualifying exam and officially became a Ph.D. candidate!

Selected Publications

Please refer to my Google Scholar profile for the full list of publications.

(* indicates equal contribution)
RandAR: Decoder-only Autoregressive Visual Generation in Random Orders
Preprint, 2024 / Paper / Project
Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Reasoning
SceneCraft: Layout-Guided 3D Scene Generation
LLM4Vision: Frozen Transformers from Language Models are Effective Visual Encoder Layers
Spotlight presentation
Floating No More: Object-Ground Reconstruction from a Single Image
Preprint, 2024 / Paper / Project
SituationVLM: Situational Awareness Matters in 3D Vision Language Reasoning
DualCross: Cross-Modality Cross-Domain Adaptation for Monocular BEV Perception
BEV-Guided Multi-Modality Fusion for Driving Perception

Internship Experience

Professional Service

  • Reviewer for CVPR, ECCV, ICCV, ICLR, NeurIPS, ICML, AAAI, IROS, ICRA
    2021 - 2024
  • Teaching Assistant
    • Learning to Learn (CS598), UIUC
      Fall 2022

    • Efficient & Predictive Vision (CS598), UIUC
      Spring 2022

    • Machine Learning (CS446), UIUC
      Fall 2021

    • Computer Vision Capstone (16-621), CMU
      Spring 2020, 2021

Contact

Department of Computer Science, University of Illinois Urbana-Champaign
201 N Goodwin Ave
Urbana, IL 61801

© University of Illinois Urbana-Champaign. Last Updated: 2024