Wufei Ma

I am a PhD student at Johns Hopkins University, advised by Bloomberg Distinguished Professor Dr. Alan Yuille.

I obtained my B.S., summa cum laude, from Rensselaer Polytechnic Institute in 2020 with a double major in Computer Science and Mathematics. During my undergraduate years, I worked with Prof. Bülent Yener on discriminative and generative models for microstructure images and with Prof. Lirong Xia on preference learning from natural language.

I've spent time at Meta Reality Labs, Microsoft Research Asia, AWS CV Science, and Megvii Research as a research intern.

Hiring: We are recruiting research interns at CCVL. If you are interested, please email me for details.

Email  /  CV  /  Google Scholar  /  Instagram

  • Feb 2024   One paper accepted to TMM.
  • Jan 2024   One paper accepted to ICLR 2024 as a spotlight (5%).
  • Sep 2023   One paper accepted to NeurIPS 2023.
  • Sep 2023   Codebase for neural mesh models released here.
  • Jul 2023   One paper accepted to ICCV 2023.
  • Jun 2023   ICCV 2023 OOD-CV challenge released.
  • Apr 2023   Program committee member of the CVPR 2023 Workshop on Generative Models for Computer Vision.
  • Mar 2023   I will co-organize the OOD Generalization in Computer Vision Workshop at ICCV 2023.
  • Mar 2023   One paper accepted to CVPR 2023 as a highlight.

    Publications

    ImageNet3D: Towards General-Purpose Object-Level 3D Understanding
    Wufei Ma, Guanning Zeng, Qihao Liu, Letian Zhang, Adam Kortylewski, Yaoyao Liu, Alan Yuille
    Preprint, 2024
    Project Page / Data / Code

    We present ImageNet3D, a large dataset for general-purpose object-level 3D understanding.

    Uncertainty-Aware Deep Video Compression with Ensembles
    Wufei Ma, Jiahao Li, Bin Li, Yan Lu
    IEEE Transactions on Multimedia, 2024
    arXiv / IEEE Xplore

    We propose an uncertainty-aware video compression model that captures predictive uncertainty with deep ensembles and saves more than 20% of the bits compared to DVC Pro.

    Generating Images with 3D Annotations Using Diffusion Models
    Wufei Ma*, Qihao Liu*, Jiahao Wang*, ..., Adam Kortylewski, Yaoyao Liu, Alan Yuille
    (* denotes equal contribution)
    ICLR, 2024 (Spotlight, 5%)
    Project Page / arXiv / Data (3D-DST images) / Data (aligned 3D models) / Data (LLM-generated captions) / Code

    We propose 3D-DST, which generates synthetic data with 3D ground truth by incorporating 3D geometry control into diffusion models. With our diverse prompt generation, we improve both in-distribution (ID) and out-of-distribution (OOD) performance on various 2D and 3D vision tasks.

    3D-Aware Visual Question Answering about Parts, Poses and Occlusions
    Xingrui Wang, Wufei Ma, Zhuowan Li, Adam Kortylewski, Alan Yuille
    NeurIPS, 2023
    arXiv / Code / Dataset

    We introduce Super-CLEVR-3D, a compositional reasoning dataset that contains questions about object parts, their 3D poses, and occlusions. We propose PO3D-VQA, a 3D-aware VQA model that combines probabilistic neural symbolic program execution with 3D generative representations of objects.

    Neural Textured Deformable Meshes for Robust Analysis-by-Synthesis
    Angtian Wang*, Wufei Ma*, Alan Yuille, Adam Kortylewski
    (* denotes equal contribution)
    WACV, 2023

    We introduce Neural Textured Deformable Meshes (NTDM), which learn a neural mesh model with deformable geometry and enable optimization of both camera parameters and object geometries.

    Robust Category-Level 3D Pose Estimation from Synthetic Data
    Jiahao Yang, Wufei Ma, Angtian Wang, Xiaoding Yuan, Adam Kortylewski, Alan Yuille
    WACV, 2023

    We introduce SyntheticP3D, a synthetic dataset for object pose estimation, and CC3D, which adapts neural mesh models from synthetic to real data.

    Animal3D: A Comprehensive Dataset of 3D Animal Pose and Shape
    Jiacong Xu, Yi Zhang, Jiawei Peng, Wufei Ma, ..., Alan Yuille, Adam Kortylewski
    ICCV, 2023
    Project Page / arXiv / Dataset

    Animal3D consists of 3379 images collected from 40 mammal species, high-quality annotations of 26 keypoints, and importantly the pose and shape parameters of the SMAL model.

    OOD-CV-v2: An extended Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images
    Bingchen Zhao, Jiahao Wang, Wufei Ma, Artur Jesslen, Siwei Yang, Shaozuo Yu, Oliver Zendel, Christian Theobalt, Alan Yuille, Adam Kortylewski
    arXiv, 2023

    Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning
    Zhuowan Li, Xingrui Wang, Elias Stengel-Eskin, Adam Kortylewski, Wufei Ma, Benjamin Van Durme, Alan Yuille
    CVPR, 2023 (Highlight, 10% of accepted papers)
    arXiv / Code

    We introduce Super-CLEVR, where different factors in VQA domain shifts can be isolated so that their effects can be studied independently. We propose probabilistic NSVQA (P-NSVQA), which extends NSVQA with uncertainty reasoning.

    Robust Category-Level 6D Pose Estimation with Coarse-to-Fine Rendering of Neural Features
    Wufei Ma, Angtian Wang, Alan Yuille, Adam Kortylewski
    ECCV, 2022
    arXiv / Code

    We introduce a coarse-to-fine optimization strategy that utilizes the rendering process to estimate a sparse set of 6D object proposals, which are subsequently refined with gradient-based optimization.

    ROBIN: A Benchmark for Robustness to Individual Nuisances
    Bingchen Zhao, Shaozuo Yu, Wufei Ma, Mingxin Yu, Shenxiao Mei, Angtian Wang, Ju He, Alan Yuille, Adam Kortylewski
    ECCV, 2022 (Oral Presentation)

    We introduce ROBIN, a benchmark dataset for diagnosing the robustness of 2D and 3D vision algorithms to individual nuisances in real-world images.

    Guided Pluralistic Building Contour Completion
    Xiaowei Zhang, Wufei Ma, Gunder Varinlioglu, Nick Rauh, Liu He, Daniel Aliaga
    The Visual Computer, 2022

    We present a novel pluralistic building contour completion framework, which uses an entropy-based model to request information from the user for the next most informative location in the image.

    Making Group Decisions from Natural Language-Based Preferences
    Farhad Mohsin, Lei Luo, Wufei Ma, Inwon Kang, Zhibing Zhao, Ao Liu, Rohit Vaish, Lirong Xia
    COMSOC, 2021
    PDF / COMSOC, 2021

    We propose a framework for making group decisions from natural language-based preferences. Experiments on real-world data confirm the efficacy of our method.

    Image-Driven Discriminative and Generative Machine Learning Algorithms for Establishing Microstructure-Processing Relationships
    Wufei Ma, Elizabeth Kautz, Arun Baskaran, Aritra Chowdhury, Vineet Joshi, Bülent Yener, Daniel Lewis
    Journal of Applied Physics, 2020
    Project page / PDF / arXiv / AIP / Summary

    An Image-Driven Machine Learning Approach to Kinetic Modeling of a Discontinuous Precipitation Reaction
    Elizabeth Kautz*, Wufei Ma*, Saumyadeep Jana, Arun Devaraj, Vineet Joshi, Bülent Yener, Daniel Lewis
    (* denotes equal contribution)
    Materials Characterization, 2020
    PDF / Code / arXiv / ScienceDirect / Summary

    The Adoption of Image-Driven Machine Learning for Microstructure Characterization and Materials Design: A Perspective
    Arun Baskaran, Elizabeth Kautz, Aritra Chowdhury, Wufei Ma, Bülent Yener, Daniel Lewis
    Preprint, 2021
    PDF / arXiv / Summary

    Academic Service

    AROW @ ECCV 2022, Pre-training Workshop @ ICML 2022, NeurIPS 2022, CVPR 2022, ICLR 2022, CVPR 2023, ICML 2023, NeurIPS 2023, WACV 2024, ICLR 2024


    CS661 - Computer Vision, Johns Hopkins University, Spring 2024
    Graduate Course Assistant

    CS661 - Computer Vision, Johns Hopkins University, Fall 2023
    Graduate Course Assistant

    CS471/671 - NLP: Self-supervised Models, Johns Hopkins University, Spring 2023
    Graduate Course Assistant

    CS182 - Foundations of Computer Science, Purdue University, Fall 2021
    Graduate Teaching Assistant

    Affiliations (current and previous)

    Copyright © 2017-21 Wufei Ma. Theme modified from Jon Barron's webpage.