Scholar - SciOpen

This paper presents a multi-task gradual inference model, MTGINet, for automatic portrait matting. It handles both subtasks of automatic portrait matting, namely portrait–transition–background trimap segmentation and transition-region matting, within a single encoder–decoder structure. First, a shape context aggregation (SCA) module enriches the highest-level encoder features with portrait shape context for trimap segmentation. Then, the SCA-enhanced features are fused with detail cues from the encoder for transition-region-aware alpha matting. This gradual inference naturally lets the subtasks interact through forward computation and backpropagation during training, so the model achieves high accuracy while keeping complexity low. Moreover, because the subtasks require different features, a feature rectification module adapts the encoder features before they are reused. In addition to MTGINet, we have constructed HPM-17K, a new large-scale dataset for half-body portrait matting that consists of 16,967 images with diverse backgrounds. Comparative experiments against existing deep models on the public P3M-10K dataset and on HPM-17K demonstrate that the proposed model achieves state-of-the-art performance.
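The abstract does not say how the trimap and transition-region outputs are combined or how the two subtasks are supervised jointly. The sketch below illustrates one common scheme under stated assumptions, not MTGINet's actual method: pixels classified as portrait get alpha 1, background gets alpha 0, and transition pixels take the predicted alpha; training sums a trimap cross-entropy term and an L1 alpha term restricted to ground-truth transition pixels. All names (`fuse_alpha`, `trimap_from_alpha`, `multitask_loss`, the class indices, and the loss weight) are hypothetical. Images are flattened to per-pixel lists so the example needs only the standard library.

```python
import math
from typing import List, Sequence

# Hypothetical class indices for the three-way trimap (an assumption).
BACKGROUND, TRANSITION, PORTRAIT = 0, 1, 2


def fuse_alpha(trimap_probs: Sequence[Sequence[float]],
               transition_alpha: Sequence[float]) -> List[float]:
    """Assumed fusion: use the trimap label where it is definite,
    and the predicted alpha (clamped to [0, 1]) in the transition region."""
    fused = []
    for probs, a in zip(trimap_probs, transition_alpha):
        label = max(range(3), key=lambda k: probs[k])  # per-pixel argmax
        if label == PORTRAIT:
            fused.append(1.0)
        elif label == BACKGROUND:
            fused.append(0.0)
        else:
            fused.append(min(max(a, 0.0), 1.0))
    return fused


def trimap_from_alpha(gt_alpha: Sequence[float], eps: float = 1e-6) -> List[int]:
    """Derive ground-truth trimap labels from an alpha matte (assumed rule:
    alpha ~ 0 is background, alpha ~ 1 is portrait, anything else is transition)."""
    labels = []
    for a in gt_alpha:
        if a <= eps:
            labels.append(BACKGROUND)
        elif a >= 1.0 - eps:
            labels.append(PORTRAIT)
        else:
            labels.append(TRANSITION)
    return labels


def multitask_loss(trimap_probs: Sequence[Sequence[float]],
                   transition_alpha: Sequence[float],
                   gt_alpha: Sequence[float],
                   alpha_weight: float = 1.0) -> float:
    """Hypothetical joint loss: mean trimap cross-entropy plus a weighted
    mean L1 alpha error over ground-truth transition pixels only."""
    labels = trimap_from_alpha(gt_alpha)
    eps = 1e-12  # avoid log(0)
    ce = sum(-math.log(max(p[y], eps)) for p, y in zip(trimap_probs, labels)) / len(labels)

    # L1 error only where the ground-truth trimap marks a transition pixel.
    errors = [abs(a - g) for a, g, y in zip(transition_alpha, gt_alpha, labels)
              if y == TRANSITION]
    l1 = sum(errors) / len(errors) if errors else 0.0
    return ce + alpha_weight * l1
```

In a real network, minimizing such a summed loss sends gradients from both terms back into the shared encoder, which is one way the subtasks can inform each other during training; this pure-Python version only evaluates the loss and computes no gradients.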