KINS: Amodal Instance Dataset
The authors annotated a total of 14,991 images from KITTI to form a large-scale amodal instance dataset, split into 7,474 images for training and 7,517 for testing. The annotations include amodal instance masks, semantic labels, and the relative occlusion order.
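The annotations are distributed in a COCO-style JSON file, so they can be loaded with standard tooling. The sketch below is a minimal, illustrative loader; the file name and the per-instance keys (`segmentation`, `inmodal_seg`, `ico_id`) are assumptions and should be checked against the released annotation files.

```python
import json

# Hypothetical file name and field names; verify against the released
# KINS annotation files, which may use different keys.
ANNOTATION_FILE = "instances_train.json"

with open(ANNOTATION_FILE) as f:
    dataset = json.load(f)

# COCO-style layout: "images", "annotations", "categories".
images = {img["id"]: img for img in dataset["images"]}
categories = {cat["id"]: cat["name"] for cat in dataset["categories"]}

for ann in dataset["annotations"][:5]:
    amodal_poly = ann["segmentation"]        # assumed: full (amodal) polygon
    inmodal_poly = ann.get("inmodal_seg")    # assumed: visible (inmodal) polygon
    occ_order = ann.get("ico_id")            # assumed: relative occlusion order
    print(categories[ann["category_id"]], "occlusion order:", occ_order)
```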
Dataset statistics. On average, each image contains 12.53 labeled instances, and each object polygon consists of 33.70 points. Of all annotated instances, 53.6% are partially occluded, and the average occlusion ratio is 31.7%.
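Statistics of this kind can be recomputed directly from the polygon annotations. The sketch below assumes each instance stores one amodal and one inmodal polygon in COCO's flat [x1, y1, x2, y2, ...] format; the key names and the shoelace-area approximation are illustrative, so the numbers will only match the reported values under the same conventions.

```python
from collections import defaultdict

def polygon_area(flat_xy):
    """Shoelace area of a flat polygon [x1, y1, x2, y2, ...]."""
    xs, ys = flat_xy[0::2], flat_xy[1::2]
    n = len(xs)
    return 0.5 * abs(sum(xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i]
                         for i in range(n)))

def dataset_stats(annotations):
    """Instances per image, polygon length, and occlusion statistics."""
    per_image = defaultdict(int)
    points, ratios = [], []
    occluded = 0
    for ann in annotations:
        per_image[ann["image_id"]] += 1
        amodal = ann["segmentation"][0]   # assumed: one amodal polygon per instance
        inmodal = ann["inmodal_seg"][0]   # assumed: one visible polygon per instance
        points.append(len(amodal) // 2)
        a_area, i_area = polygon_area(amodal), polygon_area(inmodal)
        ratio = 1.0 - i_area / a_area if a_area > 0 else 0.0
        ratios.append(ratio)
        occluded += ratio > 0
    n = len(annotations)
    return {
        "instances_per_image": n / len(per_image),
        "points_per_polygon": sum(points) / n,
        "occluded_fraction": occluded / n,
        # Averaged over occluded instances only; this is one plausible
        # reading of "average occlusion ratio".
        "mean_occlusion_ratio": sum(r for r in ratios if r > 0) / max(occluded, 1),
    }
```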
Semantic labels. The general categories in KINS are 'people' and 'vehicle', which are further divided into the following subclasses (a small lookup sketch follows the list):
- people (14.43%): pedestrian (10.56%), cyclist (2.69%), person-sitting (1.18%)
- vehicle (85.57%): car (67.76%), tram (1.09%), truck (0.92%), van (5.93%), misc (9.87%)
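When grouping results by general category, a small mapping from fine-grained class to general category is convenient, as sketched below. The class-name strings are assumptions and should be checked against the `categories` entry of the annotation file.

```python
# Assumed class-name strings; check them against the "categories" list
# in the released annotation file before relying on this mapping.
GENERAL_CATEGORY = {
    "pedestrian": "people",
    "cyclist": "people",
    "person-sitting": "people",
    "car": "vehicle",
    "tram": "vehicle",
    "truck": "vehicle",
    "van": "vehicle",
    "misc": "vehicle",
}

def to_general(fine_label: str) -> str:
    """Map a fine-grained KINS class to its general category."""
    return GENERAL_CATEGORY[fine_label]
```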
Occlusion level. Heavy occlusion is more common in KINS than in the COCO Amodal Dataset.
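A common way to make such comparisons concrete is to bin each instance by the fraction of its amodal mask that is hidden. The thresholds in the sketch below are illustrative choices, not the ones used by KINS or the COCO Amodal dataset.

```python
def occlusion_level(amodal_area: float, inmodal_area: float) -> str:
    """Bin an instance by how much of its amodal mask is hidden.

    The 1% / 35% thresholds are illustrative, not taken from either dataset.
    """
    if amodal_area <= 0:
        return "invalid"
    hidden = 1.0 - inmodal_area / amodal_area
    if hidden < 0.01:
        return "no occlusion"
    if hidden < 0.35:
        return "partial occlusion"
    return "heavy occlusion"
```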
Figure: examples of the amodal/inmodal masks; the digits indicate the relative occlusion order.