My research interests lie at the intersection of computer vision, machine learning, and robotics. I develop vision-centric reasoning models for multimodal and embodied AI agents, with a focus on object-centric perception systems in dynamic scenes, vision foundation models for open-world scene understanding, and large multimodal models for embodied reasoning and robot planning.