I obtained my B.S. summa cum laude from Rensselaer Polytechnic Institute in 2020, with a double major in Computer Science and Mathematics. During my undergraduate years, I worked with Prof. Bülent Yener on discriminative and generative models for microstructure images and with Prof. Lirong Xia on preference learning from natural language.
I’ve had a great time at Google Research, Reality Labs, Research Asia, Frontier AI & Robotics (FAR), and AWS CV Science, and have collaborated with many exceptional researchers.
While self-supervised pretraining has reduced vision systems’ reliance on synthetic data, simulation remains an indispensable tool for closed-loop optimization and rigorous out-of-distribution (OOD) evaluation. However, modern simulation platforms often present steep technical barriers, requiring extensive expertise in computer graphics and game development. To bridge this gap, we present LychSim, a highly controllable and interactive simulation framework built on Unreal Engine 5. LychSim is built around three key designs: (1) a streamlined Python API that abstracts away the underlying engine complexity; (2) a procedural data pipeline that generates diverse, high-fidelity environments with varying OOD visual challenges, paired with rich 2D and 3D ground truth; and (3) native integration of the Model Context Protocol (MCP), which turns the simulator into a dynamic, closed-loop playground for agentic reasoning LLMs.
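For readers unfamiliar with MCP: its messages are JSON-RPC 2.0, and a client such as an LLM agent invokes a server-exposed tool by sending a `tools/call` request. The sketch below only builds such a request. The tool name `set_camera_pose` and its arguments are hypothetical placeholders for illustration, not LychSim's actual tool schema.

```python
import json


def make_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool and arguments; the simulator's real tool names may differ.
request = make_tool_call(
    1,
    "set_camera_pose",
    {"location": [0.0, -300.0, 150.0], "rotation": [0.0, 0.0, 90.0]},
)
print(request)
```

In a closed loop, the agent would read the tool's result (for example, a rendered frame or ground-truth annotations), decide on its next action, and issue another request of the same form.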
SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning
We introduce SpatialReasoner, a novel large vision-language model (LVLM) that addresses 3D spatial reasoning with explicit 3D representations shared across three stages: 3D perception, computation, and reasoning. Explicit 3D representations provide a coherent interface that supports advanced 3D spatial reasoning and enables a detailed study of the factual errors made by LVLMs.
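To illustrate why an explicit 3D representation helps, the sketch below assumes the perception stage has already produced 3D object centers in a camera frame (x right, y down, z forward, in meters; this convention is an assumption). The computation stage then answers a spatial question with plain geometry, and the reasoning stage only verbalizes the result. The `Object3D` class, helper functions, object names, and coordinates are all hypothetical and are not SpatialReasoner's actual interface.

```python
import math
from dataclasses import dataclass


@dataclass
class Object3D:
    name: str
    center: tuple[float, float, float]  # (x, y, z) in meters, camera frame (assumed)


def distance(a: Object3D, b: Object3D) -> float:
    # Euclidean distance between object centers.
    return math.dist(a.center, b.center)


def is_left_of(a: Object3D, b: Object3D) -> bool:
    # With x pointing right, a smaller x means further left as seen from the camera.
    return a.center[0] < b.center[0]


def is_closer_to_camera(a: Object3D, b: Object3D) -> bool:
    # With z pointing forward, a smaller z means nearer to the camera.
    return a.center[2] < b.center[2]


# Stage 1 (perception): hypothetical output of a 3D detector.
chair = Object3D("chair", (-0.8, 0.3, 2.5))
table = Object3D("table", (0.6, 0.4, 3.0))

# Stage 2 (computation): explicit geometry on the shared 3D representation.
left = is_left_of(chair, table)
closer = is_closer_to_camera(chair, table)
gap = distance(chair, table)

# Stage 3 (reasoning): turn the computed facts into an answer.
answer = (
    f"The {chair.name} is to the {'left' if left else 'right'} of the {table.name}, "
    f"{'closer to' if closer else 'farther from'} the camera, about {gap:.2f} m away."
)
print(answer)
```

Because every intermediate value is explicit, an incorrect final answer can be traced to the stage that produced it, for example a mislocalized object in perception versus a wrong comparison in computation.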
3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark
We systematically study the impact of 3D-informed data, architecture, and training setups, and present SpatialLLM, a large multimodal model (LMM) with advanced 3D spatial reasoning abilities.
ImageNet3D: Towards General-Purpose Object-Level 3D Understanding
We propose 3D-DST, which generates synthetic data with 3D ground truth by incorporating 3D geometry control into diffusion models. Combined with our diverse prompt generation, 3D-DST effectively improves both in-distribution (ID) and out-of-distribution (OOD) performance on various 2D and 3D vision tasks.