Recent advances in Super-Resolution (SR) image reconstruction using Convolutional Neural Networks (CNNs) have encountered significant challenges in effectively modeling the complex mapping between Low-Resolution (LR) and High-Resolution (HR) images. While Generative Adversarial Networks (GANs) have been explored as a potential solution to enhance SR performance, these models often suffer from prolonged training and inference times, and may fail to preserve intricate texture details in the reconstructed images. In response to these limitations, we propose a novel fusion network architecture, termed CLustering and Generative Adversarial Network (CL-GAN), designed to concurrently learn and integrate the features of clustered image segments and low-resolution inputs, thereby enhancing the SR reconstruction process. The CL-GAN framework comprises two primary components: a local network that emphasizes feature extraction from clustered image regions, and a global network built upon a GAN framework to model global image characteristics. To further improve texture recovery, we incorporate dense connection mechanisms within both the local and global networks, facilitating the preservation of fine-grained details in the generated SR images. Extensive experiments conducted on publicly available datasets demonstrate that the proposed CL-GAN framework outperforms existing state-of-the-art methods, delivering superior SR images with enhanced detail fidelity and visual quality.
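The dense connection mechanism the abstract refers to can be sketched in a few lines. The snippet below is a minimal NumPy illustration of the feature-reuse pattern (each layer receives the concatenation of all earlier feature maps), not the actual CL-GAN implementation; the `conv_like` 1×1 channel mixing is a hypothetical stand-in for a learned convolution.

```python
import numpy as np

def conv_like(x, out_ch, rng):
    # Stand-in for a learned convolution: random 1x1 channel mixing + ReLU.
    # x has shape (channels, H, W).
    w = rng.standard_normal((out_ch, x.shape[0]))
    y = w @ x.reshape(x.shape[0], -1)
    return np.maximum(y, 0.0).reshape(out_ch, *x.shape[1:])

def dense_block(x, growth, num_layers, rng):
    # Dense connectivity: every layer sees the concatenation of the block
    # input and all preceding layer outputs, so fine-grained features are
    # reused rather than overwritten.
    feats = [x]
    for _ in range(num_layers):
        inp = np.concatenate(feats, axis=0)   # all earlier feature maps
        feats.append(conv_like(inp, growth, rng))
    return np.concatenate(feats, axis=0)      # in_ch + growth * num_layers channels
```

The output channel count grows linearly (in_ch + growth × num_layers), which is the property that lets later layers access early, detail-rich features.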
A moisture advection scheme is an essential module of a numerical weather/climate model, representing the horizontal transport of water vapor. The Piecewise Rational Method (PRM) scalar advection scheme in the Global/Regional Assimilation and Prediction System (GRAPES) solves the moisture flux advection equation based on PRM. Computing the scalar advection involves boundary exchange, and its high memory-bandwidth requirements make it complicated and time-consuming in GRAPES. Recently, Graphics Processing Units (GPUs) have been widely used to solve scientific and engineering computing problems, owing to advancements in GPU hardware and related programming models such as CUDA/OpenCL and Open Accelerator (OpenACC). Herein, we present a PRM scalar advection scheme accelerated with the Message Passing Interface (MPI) and OpenACC to fully exploit the power of GPUs over a cluster with multiple Central Processing Units (CPUs) and GPUs, together with optimizations such as minimizing data transfer, memory coalescing, exposing more parallelism, and overlapping computation with data transfers. Results show that about a 3.5× speedup is obtained for the entire model running at medium resolution with double precision, comparing the scheme's elapsed time on a node with two GPUs (NVIDIA P100) and two 16-core CPUs (Intel Gold 6142). Further, experiments with a higher-resolution model on multiple GPUs show excellent scalability.
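As a minimal illustration of the flux-form advection equation the scheme solves (not the actual piecewise-rational reconstruction used by PRM), a first-order upwind update for a 1D periodic moisture field can be sketched as:

```python
import numpy as np

def advect_upwind(q, u, dx, dt):
    """One flux-form upwind step of dq/dt + d(u*q)/dx = 0 on a periodic grid.

    Assumes a constant wind u > 0 and a Courant number u*dt/dx <= 1.
    PRM replaces this first-order face flux with a piecewise-rational
    reconstruction; the conservative update structure is the same.
    """
    flux = u * q                                    # upwind face flux F_{i+1/2} = u * q_i
    return q - (dt / dx) * (flux - np.roll(flux, 1))
```

The boundary exchange mentioned in the abstract corresponds to the `np.roll` neighbor access: on a distributed grid, each MPI rank must receive halo cells from its neighbor before the flux difference can be evaluated.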
The Weighted Essentially Non-Oscillatory (WENO) scheme is a numerical method for hyperbolic conservation laws, well suited to solving high-density fluid interface instability with strong intermittency; such problems exhibit large and complex flow structures. To fully utilize the computing power of High Performance Computing (HPC) systems, it is necessary to develop methodologies that optimize application performance for the particular system's architecture. The Sunway TaihuLight supercomputer is currently ranked as the fastest supercomputer in the world. This article presents a heterogeneous parallel algorithm design and performance optimization of a high-order WENO scheme on Sunway TaihuLight. We analyzed the characteristics of the kernel functions and proposed an appropriate heterogeneous parallel model. We also determined the best division strategy for computing tasks and implemented the parallel algorithm on Sunway TaihuLight. Through memory-access optimization, data-dependency elimination, and vectorization, our parallel algorithm achieves up to a 172× speedup on a single node, and a further 58× speedup on 64 nodes, with nearly linear scalability.
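For reference, the classical fifth-order WENO reconstruction at a cell interface (Jiang–Shu smoothness indicators with ideal weights 1/10, 6/10, 3/10) can be sketched as follows. This is a generic scalar illustration of the kernel being optimized, not the vectorized Sunway implementation.

```python
import numpy as np

def weno5_left(v):
    """WENO5 reconstruction of the interface value at x_{i+1/2} from the five
    cell averages v = (v[i-2], v[i-1], v[i], v[i+1], v[i+2])."""
    v0, v1, v2, v3, v4 = v
    eps = 1e-6
    # Smoothness indicators of the three candidate stencils.
    b0 = 13/12*(v0 - 2*v1 + v2)**2 + 1/4*(v0 - 4*v1 + 3*v2)**2
    b1 = 13/12*(v1 - 2*v2 + v3)**2 + 1/4*(v1 - v3)**2
    b2 = 13/12*(v2 - 2*v3 + v4)**2 + 1/4*(3*v2 - 4*v3 + v4)**2
    # Third-order candidate reconstructions on each stencil.
    p0 = (2*v0 - 7*v1 + 11*v2) / 6
    p1 = (-v1 + 5*v2 + 2*v3) / 6
    p2 = (2*v2 + 5*v3 - v4) / 6
    # Nonlinear weights: near a discontinuity, a large b_k suppresses
    # the oscillatory stencil; in smooth regions they tend to the ideal weights.
    a0, a1, a2 = 0.1/(eps + b0)**2, 0.6/(eps + b1)**2, 0.3/(eps + b2)**2
    return (a0*p0 + a1*p1 + a2*p2) / (a0 + a1 + a2)
```

The data dependence on a five-cell stencil is what makes vectorization and the task-division strategy across Sunway's compute elements nontrivial: neighboring reconstructions share four of their five inputs.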