Open Access | Just Accepted

MU-Net-optLSTM: Two-Stream Spatial–Temporal Feature Extraction and Classification Architecture for Automatic Monitoring of Crowded Art Museums

Mukun Wang1,2, Rongju Yao3, Khosro Rezaee4

1 Department of Design, Graduate School, Dongseo University, Busan 47011, Republic of Korea

2 Tianshui Normal University, Tianshui 741000, China

3 Shandong Provincial University Laboratory for Protected Horticulture, Weifang University of Science and Technology, Weifang 262700, China

4 Department of Biomedical Engineering, Meybod University, Yazd 8961699557, Iran



Networked cameras that continuously capture video have created strong demand for hybrid edge-to-cloud servers that can process live video in real time. Art museums are rarely studied as a surveillance environment, yet visual analysis is important for categorizing and distinguishing individuals and crowds in smart surveillance systems. This paper demonstrates how video surveillance data from art museums can be analyzed to identify abnormal behavior using an innovative deep learning framework. Spatial features are extracted with a method based on the U-Net architecture, with MobileNetV2 as the encoder component of the proposed approach. Additionally, we propose an improved Long Short-Term Memory (LSTM) algorithm for extracting temporal features. Optical flow enhances surveillance in art museums by tracking individuals and crowds. Our approach yields an average accuracy of 97.67±1.23% when applied to a collection of video datasets. Using U-Net, MobileNetV2, and optimized LSTM algorithms, the model recognizes patterns in video data, such as crowd motion in museums. Consequently, the methodology produces reliable results while remaining computationally efficient. Compared to the state of the art, the proposed method is more comprehensive and generalizable for analyzing atypical museum visitor behavior.
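
To make the temporal stream more concrete, the following is a minimal sketch of how a standard LSTM cell could summarize a sequence of per-frame feature vectors into one clip-level vector. It is not the improved LSTM proposed in the paper, whose details are not given in the abstract. The function names (`lstm_step`, `init_params`, `encode_clip`), the feature dimensions, and the random initialization are assumptions made purely for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def lstm_step(x, h_prev, c_prev, params):
    """One step of a standard LSTM cell (illustrative, not the paper's variant)."""
    z = x + h_prev  # concatenate the frame features with the previous hidden state
    gates = {}
    for name in ("i", "f", "o", "g"):
        pre = [a + b for a, b in zip(matvec(params["W_" + name], z),
                                     params["b_" + name])]
        act = math.tanh if name == "g" else sigmoid
        gates[name] = [act(p) for p in pre]
    # New cell state: forget part of the old state, add the gated candidate.
    c = [f * cp + i * g for f, cp, i, g in
         zip(gates["f"], c_prev, gates["i"], gates["g"])]
    # New hidden state: output gate applied to the squashed cell state.
    h = [o * math.tanh(ck) for o, ck in zip(gates["o"], c)]
    return h, c


def init_params(input_size, hidden_size, seed=0):
    # Hypothetical small random weights; a real model would learn these.
    rng = random.Random(seed)
    params = {}
    for name in ("i", "f", "o", "g"):
        params["W_" + name] = [
            [rng.uniform(-0.1, 0.1) for _ in range(input_size + hidden_size)]
            for _ in range(hidden_size)
        ]
        params["b_" + name] = [0.0] * hidden_size
    return params


def encode_clip(frame_features, params, hidden_size):
    # Run the cell over all frames and return the final hidden state.
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in frame_features:
        h, c = lstm_step(x, h, c, params)
    return h


if __name__ == "__main__":
    # Five hypothetical frames, each described by a 4-dimensional feature vector.
    clip = [[0.1 * t, 0.2, -0.1, 0.0] for t in range(5)]
    params = init_params(input_size=4, hidden_size=3)
    print(encode_clip(clip, params, hidden_size=3))
```

In a pipeline like the one the abstract describes, the per-frame vectors would presumably come from the spatial stream, and the final hidden state would feed a classifier. That wiring is an assumption here, not something the abstract specifies.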

Tsinghua Science and Technology
Cite this article:
Wang M, Yao R, Rezaee K. MU-Net-optLSTM: Two-Stream Spatial–Temporal Feature Extraction and Classification Architecture for Automatic Monitoring of Crowded Art Museums. Tsinghua Science and Technology, 2024,














Available online: 12 June 2024

© The author(s) 2025.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (