Real-world blind image super-resolution is a challenging problem due to the absence of paired high-resolution target images for training. Inspired by the recent success of the single-image generation method SinGAN, we tackle this problem with a refined model, SR-SinGAN, which learns to perform super-resolution on a single real image. First, we empirically find that a downsampled LR input of appropriate size improves the robustness of the generation model. Second, we introduce a global contextual prior that provides semantic information; it helps to remove distorted pixels and improves output fidelity. Finally, we design an image-gradient-based local contextual prior to guide detail generation, which alleviates artifacts in smooth areas while preserving rich details in densely textured regions (e.g., hair, grass). To evaluate the effectiveness of these contextual priors, we conducted extensive experiments on both artificial and real images. The results show that these priors stabilize training and preserve output fidelity, improving the quality of generated images. We further find that single-image generation methods work better for images with repeated textures than for general images.
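The abstract's gradient-based local prior can be illustrated with a minimal sketch: use image gradient magnitude to separate densely textured regions (where detail generation is encouraged) from smooth areas (where artifacts should be suppressed). The function name and threshold below are hypothetical illustrations, not details from the paper.

```python
import numpy as np

def gradient_texture_mask(img, threshold=0.2):
    """Toy local contextual prior: mark pixels whose gradient magnitude
    exceeds a threshold as "textured". The threshold value is an
    assumption for illustration, not a value from SR-SinGAN."""
    gy, gx = np.gradient(img.astype(np.float64))
    mag = np.sqrt(gx ** 2 + gy ** 2)
    return (mag > threshold).astype(np.float64)

# A smooth ramp stays below the threshold; noisy texture exceeds it.
rng = np.random.default_rng(0)
smooth = np.tile(np.linspace(0.0, 1.0, 8), (8, 1))  # gradient ~0.14 everywhere
texture = rng.random((8, 8))                        # large local gradients
mask_smooth = gradient_texture_mask(smooth)
mask_texture = gradient_texture_mask(texture)
```

Such a mask could weight a generation loss so that smooth regions are penalized for hallucinated detail while textured regions are not; how SR-SinGAN actually injects this prior is not specified in the abstract.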
Open Access Research Article
Video colorization is a challenging and highly ill-posed problem. Although recent years have witnessed remarkable progress in single-image colorization, video colorization has received relatively little research effort, and existing methods often suffer from severe flickering artifacts (temporal inconsistency) or unsatisfactory colorization. We address this problem from a new perspective, jointly considering colorization and temporal consistency in a unified framework. Specifically, we propose a novel temporally consistent video colorization (TCVC) framework. TCVC effectively propagates frame-level deep features in a bidirectional way to enhance the temporal consistency of colorization. Furthermore, TCVC introduces a self-regularization learning (SRL) scheme to minimize the differences between predictions obtained using different time steps. SRL does not require any ground-truth color video for training and further improves temporal consistency. Experiments demonstrate that our method not only produces visually pleasing colorized videos, but also achieves clearly better temporal consistency than state-of-the-art methods. A video demo is provided at https://www.youtube.com/watch?v=c7dczMs-olE, while code is available at https://github.com/lyh-18/TCVC-Temporally-Consistent-Video-Colorization.
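The SRL idea described above can be sketched in a few lines: compare colorizations of the same frames produced under two different time-step configurations and penalize their disagreement. The function name, the L1 form of the penalty, and the tensor shapes below are assumptions for illustration; the key property from the abstract is that no ground-truth color video appears in the objective.

```python
import numpy as np

def srl_loss(preds_a, preds_b):
    """Sketch of a self-regularization objective: mean absolute
    disagreement between two prediction stacks for the same frames.
    No ground-truth color video is involved."""
    return float(np.mean(np.abs(preds_a - preds_b)))

# Two hypothetical prediction stacks (frames x H x W x ab-channels).
rng = np.random.default_rng(0)
preds_short_step = rng.random((4, 8, 8, 2))
preds_long_step = preds_short_step + 0.05 * rng.standard_normal((4, 8, 8, 2))

loss = srl_loss(preds_short_step, preds_long_step)   # positive: predictions disagree
zero = srl_loss(preds_short_step, preds_short_step)  # zero: perfect agreement
```

Minimizing such a term pushes the network toward predictions that do not depend on the temporal stride, which is one plausible reading of how SRL improves temporal consistency.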
Open Access Research Article
Robustness and generalization are two challenging problems for learning point cloud representations. To tackle these problems, we first design a novel geometry coding model, which can effectively use an invariant eigengraph to group points with similar geometric information, even when such points are far from each other. We also introduce a large-scale point cloud dataset, PCNet184. It consists of 184 categories and 51,915 synthetic objects, bringing new challenges for point cloud classification and providing a new benchmark for assessing cross-domain generalization of point clouds. Finally, we perform extensive experiments on point cloud classification, using ModelNet40, ScanObjectNN, and our PCNet184, and on segmentation, using ShapeNetPart and S3DIS. Our method achieves performance comparable to state-of-the-art methods on these datasets, for both supervised and unsupervised learning. Code and our dataset are available at https://github.com/MingyeXu/PCNet184.
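One standard way to obtain the kind of rotation-invariant geometric descriptor the abstract alludes to is the sorted eigenvalues of each point's local neighborhood covariance: points with similar local shape (planar, linear, scattered) get similar eigenvalue profiles even when far apart. This is a generic stand-in for the paper's invariant eigengraph; the grouping strategy, the choice of k, and the function name are assumptions.

```python
import numpy as np

def eigen_features(points, k=8):
    """Rotation-invariant per-point descriptor: ascending eigenvalues of
    the covariance of each point's k nearest neighbors (self included).
    A generic sketch, not the paper's exact construction."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    feats = np.empty((len(points), 3))
    for i in range(len(points)):
        nbrs = points[np.argsort(d2[i])[:k]]
        feats[i] = np.sort(np.linalg.eigvalsh(np.cov(nbrs.T)))
    return feats

# Points scattered on the z = 0 plane: every point's smallest eigenvalue is
# ~0 (locally planar), and the features are unchanged by a rigid rotation.
rng = np.random.default_rng(0)
plane = np.column_stack([5 * rng.random(25), 5 * rng.random(25), np.zeros(25)])
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta),  np.cos(theta), 0.0],
                [0.0,            0.0,           1.0]])
f_plane = eigen_features(plane)
f_rotated = eigen_features(plane @ rot.T)
```

Grouping points whose eigenvalue profiles are close (e.g., by clustering in this feature space) would link geometrically similar but spatially distant points, matching the behavior the abstract describes.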
Open Access Research Article
Recent years have witnessed significant progress in image-based 3D face reconstruction using deep convolutional neural networks. However, current reconstruction methods often perform poorly in self-occluded regions and can produce inaccurate correspondences between the 2D input image and the 3D face template, hindering use in real applications. To address these problems, we propose a deep shape reconstruction and texture completion network, SRTC-Net, which jointly reconstructs 3D facial geometry and completes the texture, with correspondences, from a single input face image. In SRTC-Net, we leverage geometric cues from the completed 3D texture to reconstruct detailed structures of 3D shapes. The SRTC-Net pipeline has three stages. The first introduces a correspondence network to identify pixel-wise correspondence between the input 2D image and a 3D template model, and transfers the input 2D image to a