I am a PhD student at Johns Hopkins University, advised by Bloomberg Distinguished Professor Dr. Alan Yuille.
I obtained my B.S. summa cum laude from Rensselaer Polytechnic Institute in 2020, with a double major in Computer Science and Mathematics. During my undergraduate years, I worked with Prof. Bülent Yener on discriminative and generative models for microstructure images, and with Prof. Lirong Xia on preference learning from natural language.
I've spent time at Meta Reality Labs, Microsoft Research Asia, AWS CV Science, and Megvii Research as a research intern.
Email / CV / Instagram
News
Sep 2023 Codebase for neural mesh models released here.
Jul 2023 One paper accepted to ICCV 2023.
Jun 2023 ICCV 2023 OOD-CV challenge released.
Apr 2023 Served on the program committee of the CVPR 2023 Workshop on Generative Models for Computer Vision.
Mar 2023 I will co-organize the OOD Generalization in Computer Vision Workshop at ICCV 2023.
Mar 2023 One paper accepted to CVPR 2023 and selected as highlight.
Publications
Animal3D: A Comprehensive Dataset of 3D Animal Pose and Shape
Jiacong Xu, Yi Zhang, Jiawei Peng, Wufei Ma, ..., Alan Yuille, Adam Kortylewski
ICCV, 2023
Project Page / arXiv / Code
Animal3D consists of 3,379 images of 40 mammal species, with high-quality annotations of 26 keypoints and, importantly, the pose and shape parameters of the SMAL model. We demonstrate that synthetic pre-training is a viable strategy for boosting model performance.
Adding 3D Geometry Control to Diffusion Models
Wufei Ma*, Qihao Liu*, Jiahao Wang*, Angtian Wang, Yaoyao Liu, Adam Kortylewski, Alan Yuille
(* denotes equal contribution)
arXiv, 2023
arXiv
Diffusion models have emerged as a powerful method of generative modeling across a range of fields, capable of producing stunning photo-realistic images from natural language descriptions. However, these models lack explicit control over the 3D structure of the objects in the generated images. In this paper, we propose a novel method that incorporates 3D geometry control into diffusion models, enabling them to generate even more realistic and diverse images. To achieve this, our method exploits ControlNet, which extends diffusion models by using visual prompts in addition to text prompts. We generate images of 3D objects taken from a 3D shape repository (e.g., ShapeNet and Objaverse), render them from a variety of poses and viewing directions, compute the edge maps of the rendered images, and use these edge maps as visual prompts to generate realistic images. With explicit 3D geometry control, we can easily change the 3D structures of the objects in the generated images and obtain ground-truth 3D annotations automatically. This allows us to use the generated images to improve a variety of vision tasks, e.g., classification and 3D pose estimation, in both in-distribution (ID) and out-of-distribution (OOD) settings.
Neural Textured Deformable Meshes for Robust Analysis-by-Synthesis
Angtian Wang*, Wufei Ma*, Alan Yuille, Adam Kortylewski
(* denotes equal contribution)
arXiv, 2023
arXiv
Robust Category-Level 3D Pose Estimation from Synthetic Data
Jiahao Yang, Wufei Ma, Angtian Wang, Xiaoding Yuan, Adam Kortylewski, Alan Yuille
arXiv, 2023
arXiv / Summary
Obtaining accurate 3D object poses is vital for numerous computer vision applications, such as 3D reconstruction and scene understanding. However, annotating real-world objects is time-consuming and challenging. While synthetically generated training data is a viable alternative, the domain shift between real and synthetic data remains a significant challenge. In this work, we aim to narrow the performance gap between models trained on synthetic data plus a few real images and fully supervised models trained on large-scale data. We approach the problem from two perspectives: 1) we introduce SyntheticP3D, a new synthetic dataset for object pose estimation generated from CAD models and enhanced with a novel algorithm; 2) we propose a novel approach (CC3D) for training neural mesh models that perform pose estimation via inverse rendering. In particular, we exploit the spatial relationships between features on the mesh surface and a contrastive learning scheme to guide the domain adaptation process. Combined, these two approaches enable our models to perform competitively with state-of-the-art models using only 10% of the respective real training images, while outperforming the SOTA model by 10.4% at a threshold of π/18 using only 50% of the real training data. Our trained model further demonstrates robust generalization to out-of-distribution scenarios despite being trained with minimal real data.
OOD-CV-v2: An extended Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images
Bingchen Zhao, Jiahao Wang, Wufei Ma, Artur Jesslen, Siwei Yang, Shaozuo Yu, Oliver Zendel, Christian Theobalt, Alan Yuille, Adam Kortylewski
arXiv, 2023
arXiv
Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning
Zhuowan Li, Xingrui Wang, Elias Stengel-Eskin, Adam Kortylewski, Wufei Ma, Benjamin Van Durme, Alan Yuille
CVPR, 2023 (Highlight, 10% of accepted papers)
arXiv / Code / Summary
Visual Question Answering (VQA) models often perform poorly on out-of-distribution data and struggle with domain generalization. Due to the multi-modal nature of this task, multiple factors of variation are intertwined, making generalization difficult to analyze. This motivates us to introduce a virtual benchmark, Super-CLEVR, where different factors in VQA domain shifts can be isolated so that their effects can be studied independently. Four factors are considered: visual complexity, question redundancy, concept distribution, and concept compositionality. With controllably generated data, Super-CLEVR enables us to test VQA methods in situations where the test data differs from the training data along each of these axes. We study four existing methods, including two neural symbolic methods, NSCL and NSVQA, and two non-symbolic methods, FiLM and mDETR, as well as our proposed method, probabilistic NSVQA (P-NSVQA), which extends NSVQA with uncertainty reasoning. P-NSVQA outperforms the other methods on three of the four domain shift factors. Our results suggest that disentangling reasoning and perception, combined with probabilistic uncertainty, forms a strong VQA model that is more robust to domain shifts.
Robust Category-Level 6D Pose Estimation with Coarse-to-Fine Rendering of Neural Features
Wufei Ma, Angtian Wang, Alan Yuille, Adam Kortylewski
ECCV, 2022
arXiv / Code / Summary
We consider the problem of category-level 6D pose estimation from a single RGB image. Our approach represents an object category as a cuboid mesh and learns a generative model of the neural feature activations at each mesh vertex to perform pose estimation through differentiable rendering. A common problem of rendering-based approaches is that they rely on bounding box proposals, which do not convey information about the 3D rotation of the object and are not reliable when objects are partially occluded. Instead, we introduce a coarse-to-fine optimization strategy that utilizes the rendering process to estimate a sparse set of 6D object proposals, which are subsequently refined with gradient-based optimization.
ROBIN: A Benchmark for Robustness to Individual Nuisances
Bingchen Zhao, Shaozuo Yu, Wufei Ma, Mingxin Yu, Shenxiao Mei, Angtian Wang, Ju He, Alan Yuille, Adam Kortylewski
ECCV, 2022 (Oral Presentation)
arXiv / Summary
In this work, we introduce ROBIN, a benchmark dataset for diagnosing the robustness of vision algorithms to individual nuisances in real-world images. We provide results for a number of popular baselines and make several interesting observations. We believe our dataset provides a rich testbed to study the OOD robustness of vision algorithms and will help to significantly push forward research in this area.
Guided Pluralistic Building Contour Completion
Xiaowei Zhang, Wufei Ma, Gunder Varinlioglu, Nick Rauh, Liu He, Daniel Aliaga
The Visual Computer, 2022
Springer / Summary
Image/sketch completion addresses the problem of filling in the missing regions of an image or sketch with realistic and semantically consistent content. We address one type of completion: producing a tentative completion of an aerial view of the remnants of a building structure. The inference process may start with as little as 10% of the structure and is thus fundamentally pluralistic (i.e., multiple completions are possible). We present a novel pluralistic building contour completion framework. A feature suggestion component uses an entropy-based model to request information from the user at the next most informative location in the image. An image completion component, trained using self-supervision and procedurally generated content, then produces a partial or full completion. In our synthetic and real-world experiments on archaeological sites in Turkey, with up to only 4 iterations, we complete building footprints having only 10–15% of the ancient structure initially visible. We also compare against various state-of-the-art methods and show superior quantitative and qualitative performance. While we show results for archaeology, we anticipate our method can also be used for restoring highly incomplete historical sketches and for modern-day urban reconstruction despite occlusions.
Deep Learning-Based Video Compression
Under Review
Research project at Microsoft Research Asia supervised by Dr. Bin Li and Dr. Jiahao Li.
Making Group Decisions from Natural Language-Based Preferences
Farhad Mohsin, Lei Luo, Wufei Ma, Inwon Kang, Zhibing Zhao, Ao Liu, Rohit Vaish, Lirong Xia
COMSOC, 2021
PDF / Summary
We propose a framework for making group decisions from natural language-based preferences. Experiments on real-world data confirm the efficacy of our method.
Image-Driven Discriminative and Generative Machine Learning Algorithms for Establishing Microstructure-Processing Relationships
Wufei Ma, Elizabeth Kautz, Arun Baskaran, Aritra Chowdhury, Vineet Joshi, Bülent Yener, Daniel Lewis
Journal of Applied Physics, 2020
Project Page / PDF / arXiv / AIP / Summary
We characterize 10 different microstructure representations using image texture features and quantitative metrics derived from image segmentation. For the microstructure generation task, we consider two schemes: 1) generating high-resolution (1024x1024) microstructure images from random noise; and 2) training a style-transfer GAN for image generation conditioned on the segmentation label.
An Image-Driven Machine Learning Approach to Kinetic Modeling of a Discontinuous Precipitation Reaction
Elizabeth Kautz*, Wufei Ma*, Saumyadeep Jana, Arun Devaraj, Vineet Joshi, Bülent Yener, Daniel Lewis
(* denotes equal contribution)
Materials Characterization, 2020
PDF / Code / arXiv / ScienceDirect / Summary
Kinetic modeling of a discontinuous precipitation reaction (5 phases) via 1) deep learning with CNNs, and 2) image segmentation of various microstructures and quantification of the area fractions.
The Adoption of Image-Driven Machine Learning for Microstructure Characterization and Materials Design: A Perspective
Arun Baskaran, Elizabeth Kautz, Aritra Chowdhury, Wufei Ma, Bülent Yener, Daniel Lewis
Preprint, 2021
PDF / arXiv / Summary
We first review the application of image-driven machine learning approaches to materials characterization, then analyze and discuss the impact of these approaches at each step of the experimental workflow.
Academic Service
Reviewer
AROW @ ECCV 2022, Pre-training Workshop @ ICML 2022, NeurIPS 2022, CVPR 2022, ICLR 2022, CVPR 2023, ICML 2023, NeurIPS 2023, WACV 2024, ICLR 2024
Teaching
CS661 - Computer Vision, Johns Hopkins University, Fall 2023
Graduate Course Assistant
CS471/671 - NLP: Self-supervised Models, Johns Hopkins University, Spring 2023
Graduate Course Assistant
CS182 - Foundations of Computer Science, Purdue University, Fall 2021
Graduate Teaching Assistant
Affiliations (current and previous)
Copyright © 2017-21 Wufei Ma. Theme modified from Jon Barron's webpage.