I am a PhD student at Johns Hopkins University, advised by Bloomberg Distinguished Professor Dr. Alan Yuille.
I obtained my B.S. summa cum laude from Rensselaer Polytechnic Institute in 2020, with a double major in Computer Science and Mathematics. During my undergraduate years, I worked with Prof. Bülent Yener on discriminative and generative models for microstructure images, and with Prof. Lirong Xia on preference learning from natural language.
I've spent time at Meta Reality Labs, Microsoft Research Asia, AWS CV Science, and Megvii Research as a research intern.
Email / CV / Instagram
News
Sep 2023 Codebase for neural mesh models released here.
Jul 2023 One paper accepted to ICCV 2023.
Jun 2023 ICCV 2023 OOD-CV challenge released.
Apr 2023 Served on the program committee of the CVPR 2023 Workshop on Generative Models for Computer Vision.
Mar 2023 I will co-organize the OOD Generalization in Computer Vision Workshop at ICCV 2023.
Mar 2023 One paper accepted to CVPR 2023 and selected as highlight.
Publications
Animal3D: A Comprehensive Dataset of 3D Animal Pose and Shape
Jiacong Xu, Yi Zhang, Jiawei Peng, Wufei Ma, ..., Alan Yuille, Adam Kortylewski
ICCV, 2023
Project Page / arXiv / Code
Animal3D consists of 3,379 images of 40 mammal species, with high-quality annotations of 26 keypoints and, importantly, the pose and shape parameters of the SMAL model. We demonstrate that synthetic pre-training is a viable strategy for boosting model performance.
Adding 3D Geometry Control to Diffusion Models
Wufei Ma*, Qihao Liu*, Jiahao Wang*, Angtian Wang, Yaoyao Liu, Adam Kortylewski, Alan Yuille
(* denotes equal contribution)
arXiv, 2023
arXiv
Diffusion models have emerged as a powerful method of generative modeling across a range of fields, capable of producing stunning photo-realistic images from natural language descriptions. However, these models lack explicit control over the 3D structure of the objects in the generated images. In this paper, we propose a novel method that incorporates 3D geometry control into diffusion models, enabling them to generate even more realistic and diverse images. To achieve this, our method exploits ControlNet, which extends diffusion models by using visual prompts in addition to text prompts. We generate images of 3D objects taken from a 3D shape repository (e.g., ShapeNet and Objaverse), render them from a variety of poses and viewing directions, compute the edge maps of the rendered images, and use these edge maps as visual prompts to generate realistic images. With explicit 3D geometry control, we can easily change the 3D structures of the objects in the generated images and obtain ground-truth 3D annotations automatically. This allows us to use the generated images to improve a variety of vision tasks, e.g., classification and 3D pose estimation, in both in-distribution (ID) and out-of-distribution (OOD) settings.
Neural Textured Deformable Meshes for Robust Analysis-by-Synthesis
Angtian Wang*, Wufei Ma*, Alan Yuille, Adam Kortylewski
(* denotes equal contribution)
arXiv, 2023
arXiv
Robust Category-Level 3D Pose Estimation from Synthetic Data
Jiahao Yang, Wufei Ma, Angtian Wang, Xiaoding Yuan, Adam Kortylewski, Alan Yuille
arXiv, 2023
arXiv / Summary
Obtaining accurate 3D object poses is vital for numerous computer vision applications, such as 3D reconstruction and scene understanding. However, annotating real-world objects is time-consuming and challenging. While synthetically generated training data is a viable alternative, the domain shift between real and synthetic data remains a significant challenge. In this work, we aim to narrow the performance gap between models trained on synthetic data plus a few real images and fully supervised models trained on large-scale data. We approach the problem from two perspectives: 1) we introduce SyntheticP3D, a new synthetic dataset for object pose estimation generated from CAD models and enhanced with a novel algorithm; 2) we propose a novel approach (CC3D) for training neural mesh models that perform pose estimation via inverse rendering. In particular, we exploit the spatial relationships between features on the mesh surface and a contrastive learning scheme to guide the domain adaptation process. Combined, these two approaches enable our models to perform competitively with state-of-the-art models using only 10% of the respective real training images, while outperforming the SOTA model by 10.4% at a threshold of π/18 using only 50% of the real training data. Our trained model further demonstrates robust generalization to out-of-distribution scenarios despite being trained with minimal real data.
OOD-CV-v2: An extended Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images
Bingchen Zhao, Jiahao Wang, Wufei Ma, Artur Jesslen, Siwei Yang, Shaozuo Yu, Oliver Zendel, Christian Theobalt, Alan Yuille, Adam Kortylewski
arXiv, 2023
arXiv
Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning
Zhuowan Li, Xingrui Wang, Elias Stengel-Eskin, Adam Kortylewski, Wufei Ma, Benjamin Van Durme, Alan Yuille
CVPR, 2023 (Highlight, 10% of accepted papers)
arXiv / Code / Summary
Visual Question Answering (VQA) models often perform poorly on out-of-distribution data and struggle with domain generalization. Due to the multi-modal nature of this task, multiple factors of variation are intertwined, making generalization difficult to analyze. This motivates us to introduce a virtual benchmark, Super-CLEVR, where different factors in VQA domain shifts can be isolated so that their effects can be studied independently. Four factors are considered: visual complexity, question redundancy, concept distribution, and concept compositionality. With controllably generated data, Super-CLEVR enables us to test VQA methods in situations where the test data differs from the training data along each of these axes. We study four existing methods, including two neural symbolic methods, NSCL and NSVQA, and two non-symbolic methods, FiLM and mDETR, as well as our proposed method, probabilistic NSVQA (P-NSVQA), which extends NSVQA with uncertainty reasoning. P-NSVQA outperforms the other methods on three of the four domain shift factors. Our results suggest that disentangling reasoning and perception, combined with probabilistic uncertainty, forms a strong VQA model that is more robust to domain shifts.
Robust Category-Level 6D Pose Estimation with Coarse-to-Fine Rendering of Neural Features
Wufei Ma, Angtian Wang, Alan Yuille, Adam Kortylewski
ECCV, 2022
arXiv / Code / Summary
We consider the problem of category-level 6D pose estimation from a single RGB image. Our approach represents an object category as a cuboid mesh and learns a generative model of the neural feature activations at each mesh vertex to perform pose estimation through differentiable rendering. A common problem of rendering-based approaches is that they rely on bounding box proposals, which do not convey information about the 3D rotation of the object and are not reliable when objects are partially occluded. Instead, we introduce a coarse-to-fine optimization strategy that utilizes the rendering process to estimate a sparse set of 6D object proposals, which are subsequently refined with gradient-based optimization.
ROBIN: A Benchmark for Robustness to Individual Nuisances
Bingchen Zhao, Shaozuo Yu, Wufei Ma, Mingxin Yu, Shenxiao Mei, Angtian Wang, Ju He, Alan Yuille, Adam Kortylewski
ECCV, 2022 (Oral Presentation)
arXiv / Summary
In this work, we introduce ROBIN, a benchmark dataset for diagnosing the robustness of vision algorithms to individual nuisances in real-world images. We provide results for a number of popular baselines and make several interesting observations. We believe our dataset provides a rich testbed to study the OOD robustness of vision algorithms and will help to significantly push forward research in this area.
Guided Pluralistic Building Contour Completion
Xiaowei Zhang, Wufei Ma, Gunder Varinlioglu, Nick Rauh, Liu He, Daniel Aliaga
The Visual Computer, 2022
Springer / Summary
Image/sketch completion addresses the problem of filling in the missing regions of an image or sketch with realistic and semantically consistent content. We address one type of completion: producing a tentative completion of an aerial view of the remnants of a building structure. The inference process may start with as little as 10% of the structure and is thus fundamentally pluralistic (i.e., multiple completions are possible). We present a novel pluralistic building contour completion framework. A feature suggestion component uses an entropy-based model to request information from the user at the next most informative location in the image. An image completion component, trained using self-supervision and procedurally generated content, then produces a partial or full completion. In our synthetic and real-world experiments on archaeological sites in Turkey, with up to only 4 iterations, we complete building footprints having only 10–15% of the ancient structure initially visible. We also compare against various state-of-the-art methods and show superior quantitative and qualitative performance. While we show results for archaeology, we anticipate our method can also be used for restoring highly incomplete historical sketches and for modern-day urban reconstruction despite occlusions.
Deep Learning-Based Video Compression
Under Review
Research project at Microsoft Research Asia supervised by Dr. Bin Li and Dr. Jiahao Li.
Making Group Decisions from Natural Language-Based Preferences
Farhad Mohsin, Lei Luo, Wufei Ma, Inwon Kang, Zhibing Zhao, Ao Liu, Rohit Vaish, Lirong Xia
COMSOC, 2021
PDF / Summary
We propose a framework for making group decisions from natural language-based preferences. Experiments on real-world data confirm the efficacy of our method.
Image-Driven Discriminative and Generative Machine Learning Algorithms for Establishing Microstructure-Processing Relationships
Wufei Ma, Elizabeth Kautz, Arun Baskaran, Aritra Chowdhury, Vineet Joshi, Bülent Yener, Daniel Lewis
Journal of Applied Physics, 2020
Project Page / PDF / arXiv / AIP / Summary
We characterize 10 different microstructure representations using image texture features and quantitative metrics derived from image segmentation. For the microstructure generation task, we consider two schemes: 1) generating high-resolution (1024x1024) microstructure images from random noise; and 2) training a style-transfer GAN for image generation conditioned on the segmentation label.
An Image-Driven Machine Learning Approach to Kinetic Modeling of a Discontinuous Precipitation Reaction
Elizabeth Kautz*, Wufei Ma*, Saumyadeep Jana, Arun Devaraj, Vineet Joshi, Bülent Yener, Daniel Lewis
(* denotes equal contribution)
Materials Characterization, 2020
PDF / Code / arXiv / ScienceDirect / Summary
Kinetic modeling of a discontinuous precipitation reaction (5 phases) via 1) deep learning with CNNs, and 2) image segmentation of various microstructures and quantification of the area fractions.
The Adoption of Image-Driven Machine Learning for Microstructure Characterization and Materials Design: A Perspective
Arun Baskaran, Elizabeth Kautz, Aritra Chowdhury, Wufei Ma, Bülent Yener, Daniel Lewis
Preprint, 2021
PDF / arXiv / Summary
We first review the application of image-driven machine learning approaches to materials characterization, then analyze and discuss the impact of these approaches at each step of the experimental workflow.
Academic Service
Reviewer
AROW @ ECCV 2022, Pre-training Workshop @ ICML 2022, NeurIPS 2022, CVPR 2022, ICLR 2022, CVPR 2023, ICML 2023, NeurIPS 2023, WACV 2024, ICLR 2024
Teaching
CS661 - Computer Vision, Johns Hopkins University, Fall 2023
Graduate Course Assistant
CS471/671 - NLP: Self-supervised Models, Johns Hopkins University, Spring 2023
Graduate Course Assistant
CS182 - Foundations of Computer Science, Purdue University, Fall 2021
Graduate Teaching Assistant
Affiliations (current and previous)
Copyright © 2017-21 Wufei Ma. Theme modified from Jon Barron's webpage.