Open Access Research Article
Multi-granularity sequence generation for hierarchical image classification
Computational Visual Media 2024, 10 (2): 243-260
Published: 03 January 2024
Downloads: 13

Hierarchical multi-granularity image classification is a challenging task that aims to tag each given image with multiple granularity labels simultaneously. Existing methods tend to overlook that different image regions contribute differently to label prediction at different granularities, and also insufficiently consider relationships between the hierarchical multi-granularity labels. We introduce a sequence-to-sequence mechanism to overcome these two problems and propose a multi-granularity sequence generation (MGSG) approach for the hierarchical multi-granularity image classification task. Specifically, we introduce a transformer architecture to encode the image into visual representation sequences. Next, we traverse the taxonomic tree to organize the multi-granularity labels into sequences, vectorize them, and add positional information. The proposed method builds a decoder that takes the visual representation sequences and semantic label embeddings as inputs and outputs the predicted multi-granularity label sequence. The decoder models dependencies and correlations between multi-granularity labels through a masked multi-head self-attention mechanism, and relates visual information to semantic label information through a cross-modality attention mechanism. In this way, the proposed method preserves the relationships between labels at different granularity levels and accounts for the influence of different image regions on labels of different granularities. Evaluations on six public benchmarks qualitatively and quantitatively demonstrate the advantages of the proposed method. Our project is available at https://github.com/liuxindazz/mgsg.
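To make the decoding step concrete, the following is a minimal PyTorch sketch of a label-sequence decoder of the kind the abstract describes: label embeddings plus positional information are decoded with masked self-attention and cross-attention over visual tokens. All names, dimensions, and the vocabulary layout are illustrative assumptions, not the authors' released code (see the linked repository for that).

```python
# A hedged sketch, not the authors' implementation.
import torch
import torch.nn as nn

class LabelSequenceDecoder(nn.Module):
    def __init__(self, num_labels, d_model=512, nhead=8, num_layers=3, max_depth=4):
        super().__init__()
        self.label_embed = nn.Embedding(num_labels, d_model)     # vectorized labels
        self.pos_embed = nn.Embedding(max_depth, d_model)        # positional information
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)  # masked self-attn + cross-attn
        self.classifier = nn.Linear(d_model, num_labels)

    def forward(self, label_seq, visual_tokens):
        # label_seq: (B, T) coarse-to-fine label ids from a taxonomy traversal
        # visual_tokens: (B, N, d_model) transformer encoder output for the image
        T = label_seq.size(1)
        pos = torch.arange(T, device=label_seq.device)
        tgt = self.label_embed(label_seq) + self.pos_embed(pos)
        # Causal mask: each granularity level attends only to coarser ones.
        causal = nn.Transformer.generate_square_subsequent_mask(T).to(label_seq.device)
        # Cross-attention inside the decoder relates visual tokens to label tokens.
        h = self.decoder(tgt, visual_tokens, tgt_mask=causal)
        return self.classifier(h)  # (B, T, num_labels) per-level label logits
```

At inference, such a decoder would be run autoregressively, feeding each predicted coarse label back in before predicting the next, finer level.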

Open Access Review Article
Foveated rendering: A state-of-the-art survey
Computational Visual Media 2023, 9 (2): 195-228
Published: 03 January 2023
Downloads: 314

Recently, virtual reality (VR) technology has been widely used in medical, military, manufacturing, entertainment, and other fields. These applications must simulate different complex material surfaces, various dynamic objects, and complex physical phenomena, increasing the complexity of VR scenes. Current computing devices cannot efficiently render these complex scenes in real time, and delayed rendering makes the content observed by the user inconsistent with the user's interaction, causing discomfort. Foveated rendering is a promising technique that can accelerate rendering. It takes advantage of human eyes' inherent features and renders different regions with different qualities without sacrificing perceived visual quality. Foveated rendering research has a history of 31 years and is mainly focused on solving the following three problems. The first is to apply perceptual models of the human visual system into foveated rendering. The second is to render the image with different qualities according to foveation principles. The third is to integrate foveated rendering into existing rendering paradigms to improve rendering performance. In this survey, we review foveated rendering research from 1990 to 2021. We first revisit the visual perceptual models related to foveated rendering. Subsequently, we propose a new foveated rendering taxonomy and then classify and review the research on this basis. Finally, we discuss potential opportunities and open questions in the foveated rendering field. We anticipate that this survey will provide new researchers with a high-level overview of the state-of-the-art in this field, furnish experts with up-to-date information, and offer ideas alongside a framework to VR display software and hardware designers and engineers.
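As a minimal illustration of the foveation principle the survey covers, the sketch below maps a pixel's angular distance from the gaze point to a shading rate, so peripheral regions are shaded more coarsely. The step thresholds, the pixels-per-degree figure, and the function names are illustrative assumptions, not a perceptual model from any specific paper in the survey.

```python
# A hedged sketch of eccentricity-based shading rates, with assumed thresholds.
import math

def eccentricity_deg(px, py, gaze_x, gaze_y, px_per_degree=40.0):
    """Angular distance (degrees) of a pixel from the gaze point."""
    return math.hypot(px - gaze_x, py - gaze_y) / px_per_degree

def shading_rate(ecc_deg, foveal_radius=5.0, mid_radius=15.0):
    """Map eccentricity to a coarseness level: 1 = full rate,
    2 = half rate, 4 = quarter rate (one shade per 4x4 pixels)."""
    if ecc_deg <= foveal_radius:
        return 1
    if ecc_deg <= mid_radius:
        return 2
    return 4

# Example: a pixel 800 px from the gaze point at 40 px/degree sits at
# 20 degrees eccentricity and would be shaded at quarter rate.
print(shading_rate(eccentricity_deg(800, 0, 0, 0)))  # -> 4
```

Real systems replace the hard steps with a smooth acuity falloff and must also handle gaze-tracking latency, but the region-dependent quality idea is the same.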

Open Access Research Article
AR assistance for efficient dynamic target search
Computational Visual Media 2023, 9 (1): 177-194
Published: 18 October 2022
Downloads: 34

When searching for a dynamic target in an unknown real-world scene, search efficiency is greatly reduced if users lack information about the spatial structure of the scene. Most target search studies, especially in robotics, focus on determining either the shortest path when the target's position is known, or a strategy to find the target as quickly as possible when the target's position is unknown. However, the target's position is often known only intermittently in the real world, e.g., when using surveillance cameras. Our goal is to help users find a dynamic target efficiently in the real world when the target's position is intermittently known. To achieve this, we designed an AR guidance assistance system that provides optimal current directional guidance to users, based on searching a prediction graph. We assume that a certain number of depth cameras are fixed in the real scene to obtain the dynamic target's position. The system automatically analyzes all possible meetings between the user and the target, and generates optimal directional guidance to help the user catch up with the target. We evaluated our method with a user study; the results showed that, compared to free search and a top-view method, our method significantly improves target search efficiency.
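The following is a hedged sketch of the kind of prediction-graph search such a system could use: given a scene graph and a predicted target trajectory (extrapolated from intermittent camera observations), it finds the first user move that leads to the earliest predicted meeting. The graph layout, the precomputed trajectory, and all names are illustrative assumptions, not the paper's algorithm.

```python
# A hedged sketch: BFS over (node, time) states to pick the next move
# that intercepts a predicted target trajectory soonest.
from collections import deque

def first_move_to_intercept(adj, user_start, target_path):
    """adj: node -> list of neighboring nodes (user moves one edge per step).
    target_path: predicted target node at each future time step.
    Returns (first_move, meeting_time), or None if no meeting is predicted."""
    if target_path and target_path[0] == user_start:
        return (user_start, 0)  # already co-located
    horizon = len(target_path)
    moves = adj[user_start] + [user_start]  # moving or waiting in place
    queue = deque((nbr, 1, nbr) for nbr in moves)
    seen = {(user_start, 0)} | {(nbr, 1) for nbr in moves}
    while queue:
        node, t, first = queue.popleft()
        if t < horizon and target_path[t] == node:
            return (first, t)  # earliest predicted meeting; guide toward `first`
        if t + 1 >= horizon:
            continue
        for nbr in adj[node] + [node]:
            if (nbr, t + 1) not in seen:
                seen.add((nbr, t + 1))
                queue.append((nbr, t + 1, first))
    return None

# Toy example: corridor 0-1-2-3; the target moves 3 -> 2 -> 1 while the
# user starts at node 0. Guidance: step toward node 1, meeting at t = 2.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(first_move_to_intercept(adj, 0, [3, 2, 1]))  # -> (1, 2)
```

In an AR setting, the returned first move would be re-planned whenever a camera provides a fresh target observation, and rendered as a directional cue in the user's view.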
