Open Access Issue
SmartEagleEye: A Cloud-Oriented Webshell Detection System Based on Dynamic Gray-Box and Deep Learning
Tsinghua Science and Technology 2024, 29 (3): 766-783
Published: 04 December 2023

Compared with traditional environments, the cloud environment exposes online services to additional vulnerabilities and cyber attack threats, making the security of cloud platforms an increasingly prominent concern. Attackers usually upload a piece of code known as a Webshell to target servers to carry out various attacks, so preventing Webshell attacks has become a focus of current research. Moreover, traditional Webshell detectors are not designed for the cloud and therefore struggle to provide effective defense in cloud environments. This paper proposes SmartEagleEye, a deep learning-based Webshell detection system that has been successfully applied in various scenarios. The system contains two main components: a gray-box analyzer and a neural network analyzer. The gray-box analyzer defines a series of rules and algorithms that extract static and dynamic behaviors from the code and combine them to reach a joint decision. The neural network analyzer transforms suspicious code into Operation Code (OPCODE) sequences, turning detection into a classification problem. Comprehensive experiments show that SmartEagleEye achieves an encouragingly high detection rate with an acceptable false-positive rate, indicating that it can provide good protection for the cloud environment.
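The OPCODE step can be illustrated with a minimal Python sketch. It assumes, as is common for sequence classifiers but not stated in the abstract, that each opcode sequence is mapped to integer ids and padded to a fixed length before it reaches the neural network. The opcode names are examples of PHP Zend engine opcodes chosen for illustration (assuming PHP Webshells), and `build_vocab`, `encode`, `PAD`, and `UNK` are hypothetical names, not part of SmartEagleEye.

```python
from typing import Dict, List, Sequence

PAD, UNK = 0, 1  # hypothetical reserved ids: padding and opcodes never seen in training


def build_vocab(sequences: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign a stable integer id (starting at 2) to every training opcode."""
    vocab: Dict[str, int] = {}
    for seq in sequences:
        for op in seq:
            if op not in vocab:
                vocab[op] = len(vocab) + 2  # ids 0 and 1 are reserved
    return vocab


def encode(seq: Sequence[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Map opcodes to ids, truncate to max_len, then right-pad with PAD."""
    ids = [vocab.get(op, UNK) for op in seq[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


# Illustrative opcode sequences (assumed, not taken from the paper's data).
train = [
    ["ZEND_INIT_FCALL", "ZEND_SEND_VAL", "ZEND_DO_ICALL", "ZEND_RETURN"],
    ["ZEND_INCLUDE_OR_EVAL", "ZEND_RETURN"],
]
vocab = build_vocab(train)
print(encode(["ZEND_INCLUDE_OR_EVAL", "ZEND_ECHO", "ZEND_RETURN"], vocab, max_len=5))
# [6, 1, 5, 0, 0]
```

Reserving id 1 for unseen opcodes lets the encoder handle scripts containing opcodes that never appeared during training, such as `ZEND_ECHO` here; the fixed-length output is what a typical neural classifier expects as input.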

Open Access Issue
A Tibetan Sentence Boundary Disambiguation Model Considering the Components on Information on Both Sides of Shad
Tsinghua Science and Technology 2023, 28 (6): 1085-1100
Published: 28 July 2023

Sentence Boundary Disambiguation (SBD) is a preprocessing step in natural language processing. Segmenting text into sentences is essential for Deep Learning (DL) and for pretraining language models. Tibetan punctuation marks can be ambiguous as to where sentences begin and end, so these ambiguous marks must be distinguished and sentence structure must be correctly encoded in language models. This study proposes a component-level Tibetan SBD approach based on DL models. These component-level models can reduce the error amplification caused by word segmentation and part-of-speech tagging. Whereas most SBD methods consider only the text to the left of a punctuation mark, this study considers the text on both sides. Using 465 669 Tibetan sentences, a Bidirectional Long Short-Term Memory (Bi-LSTM) model is applied to SBD. Experimental results show that the Bi-LSTM model reaches an F1-score of 96%, the best among the six models compared. Further experiments on low-resource languages such as Turkish and Romanian and on high-resource languages such as English and German verify the models' generalization.
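Considering text on both sides of a shad can be pictured as building a left and a right context window around every shad. The Python sketch below assumes tsheg-delimited syllables as the units (the paper's components may be finer-grained), uses a hypothetical window size `k`, and only produces the two windows that a Bi-LSTM would consume; it does not implement the model. The names `tokenize` and `shad_windows` are illustrative.

```python
import re
from typing import List, Tuple

TSHEG = "\u0f0b"  # intersyllabic tsheg
SHAD = "\u0f0d"   # shad: the ambiguous sentence-boundary candidate
PAD = "<pad>"     # hypothetical filler for windows that run off the text


def tokenize(text: str) -> List[str]:
    """Split on tsheg or whitespace and keep every shad as its own token."""
    pieces = re.split(r"(\u0f0d)|[\u0f0b\s]+", text)
    return [p for p in pieces if p]  # re.split yields None/"" fillers; drop them


def shad_windows(tokens: List[str], k: int = 2) -> List[Tuple[List[str], List[str]]]:
    """Return (left k units, right k units) for every shad in the token list."""
    padded = [PAD] * k + tokens + [PAD] * k
    windows = []
    for i, tok in enumerate(tokens):
        if tok == SHAD:
            j = i + k  # index of this shad inside `padded`
            windows.append((padded[j - k:j], padded[j + 1:j + 1 + k]))
    return windows


# Two short phrases, each closed by a shad (built from the constants above).
sample = (TSHEG.join(["བཀྲ", "ཤིས", "བདེ", "ལེགས"]) + SHAD + " "
          + TSHEG.join(["ཁྱེད", "རང"]) + TSHEG + SHAD)
tokens = tokenize(sample)
for left, right in shad_windows(tokens, k=2):
    print(left, SHAD, right)
```

A left-only method would see just the first list of each pair; the right-hand window is what lets a classifier use the text that follows the shad.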

Open Access Issue
LETRNG — A Lightweight and Efficient True Random Number Generator for GNU/Linux Systems
Tsinghua Science and Technology 2023, 28 (2): 370-385
Published: 29 September 2022

Secure communication systems require unpredictable and irreproducible digital keys to protect security-related information. For the highest level of security, true random number generators (TRNGs), rather than pseudorandom number generators (PRNGs), are required. TRNGs are a key component of digital security for producing unpredictable binary bitstreams. Most current TRNGs extract high-quality "noise" from unpredictable physical phenomena, so they must be equipped with external hardware to collect entropy and convert it into a random digital sequence. This study introduces a lightweight and efficient true random number generator (LETRNG) that uses the inherent randomness of the central processing unit (CPU) and the operating system (OS) as its entropy source. A lightweight post-processing method based on XOR and fair-coin operations then produces an unbiased random binary sequence. Evaluations with two well-known test suites (NIST and ENT) show that LETRNG generates high-quality random numbers suitable for various GNU/Linux systems.
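A fair-coin operation is commonly understood as the von Neumann extractor. The Python sketch below pairs it with a bitwise XOR of two raw streams, as a minimal illustration under that assumption rather than LETRNG's actual post-processing; the function names are hypothetical, and the biased input is simulated with Python's `random` module instead of CPU/OS entropy. The extractor removes bias only when the input bits are independent.

```python
import random
from typing import List, Sequence


def xor_streams(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Combine two raw bit streams bit by bit with XOR."""
    return [x ^ y for x, y in zip(a, b)]


def von_neumann(bits: Sequence[int]) -> List[int]:
    """Fair-coin (von Neumann) extractor over non-overlapping pairs:
    01 -> 0, 10 -> 1, while 00 and 11 are discarded."""
    out: List[int] = []
    for i in range(0, len(bits) - 1, 2):
        if bits[i] != bits[i + 1]:
            out.append(bits[i])  # the first bit of an unequal pair
    return out


# Small worked example with hand-picked raw bits.
a = [1, 0, 1, 1, 0, 0, 1, 0]
b = [0, 0, 1, 0, 1, 1, 1, 1]
mixed = xor_streams(a, b)   # [1, 0, 0, 1, 1, 1, 0, 1]
print(von_neumann(mixed))   # [1, 0, 0]

# Debiasing demo: simulated independent bits that are 1 with probability 0.8.
rng = random.Random(42)
biased = [1 if rng.random() < 0.8 else 0 for _ in range(100_000)]
fair = von_neumann(biased)
print(sum(biased) / len(biased), sum(fair) / len(fair))  # about 0.8, then about 0.5
```

The price of debiasing is throughput: with a bias of 0.8, only about 32% of the input pairs produce an output bit.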

Open Access Issue
PointGAT: Graph attention networks for 3D object detection
Intelligent and Converged Networks 2022, 3 (2): 204-216
Published: 06 September 2022

3D object detection is a critical technology in many applications, and among the various detection methods, point cloud-based methods have been the most actively researched in recent years. Since Graph Neural Networks (GNNs) are considered effective for processing point clouds, in this work we combine a GNN with the attention mechanism and propose a 3D object detection method named PointGAT. PointGAT outperforms previous approaches on the KITTI test dataset, and experiments in real campus scenarios also demonstrate its potential for further applications.
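Combining a graph with attention on a point cloud can be sketched as a single-head, GAT-style layer over a k-nearest-neighbor graph. This Python sketch is not PointGAT's architecture: the neighbor count `k`, projection `W`, attention vector `a`, the use of raw coordinates as features, and the omission of self-loops are all assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def knn(points: Sequence[Point], k: int) -> List[List[int]]:
    """Brute-force k nearest neighbors of each point (excluding itself)."""
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def matvec(W: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def gat_layer(feats, neighbors, W, a) -> Tuple[List[List[float]], List[List[float]]]:
    """Score e_ij = LeakyReLU(a . [W h_i || W h_j]), softmax over the neighbors
    of i, then aggregate h_i' = sum_j alpha_ij * W h_j."""
    z = [matvec(W, h) for h in feats]  # shared linear projection
    outputs, weights = [], []
    for i, js in enumerate(neighbors):
        scores = [leaky_relu(sum(ak * v for ak, v in zip(a, z[i] + z[j]))) for j in js]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alpha = [e / total for e in exps]
        weights.append(alpha)
        outputs.append([sum(al * z[j][d] for al, j in zip(alpha, js))
                        for d in range(len(z[i]))])
    return outputs, weights


points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
          (5.0, 5.0, 5.0), (5.0, 6.0, 5.0)]
nbrs = knn(points, k=2)
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]  # hypothetical 3 -> 2 projection
a = [0.5, -0.5, 0.3, 0.1]               # hypothetical attention vector
new_feats, attn = gat_layer(points, nbrs, W, a)
print(nbrs)
print(attn)
```

The attention weights let each point weigh its neighbors unequally instead of averaging them, which is the property that motivates pairing GNNs with attention.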

Open Access Issue
Analysis on the development status of intelligent and connected vehicle test site
Intelligent and Converged Networks 2021, 2 (4): 320-333
Published: 30 December 2021

With the development of automobile intelligence and connectivity, Intelligent and Connected Vehicles (ICVs) are an inevitable trend in the transformation and upgrading of the automotive industry. No advanced technology can mature without extensive testing and verification, and the research and application of automotive technology in particular require large numbers of reliable tests for evaluation and confirmation. Therefore, the ICV Test Site (ICVTS) will become a key deployment area. In this paper, we analyze the development status of ICVTS in China and abroad, summarize the shortcomings of existing test sites, and put forward targeted suggestions to help guide the development and construction of ICVTS along the most promising path.

Open Access Issue
Machine Knowledge and Human Cognition
Big Data Mining and Analytics 2020, 3 (4): 292-299
Published: 16 November 2020

Intelligent machines are knowledge systems with a unique knowledge structure and function. In this paper, we discuss the characteristics and forms of machine knowledge, the relationship between knowledge and human cognition, and approaches to acquiring machine knowledge. These issues are of great significance to the development of artificial intelligence.

Open Access Issue
A Deep Learning Method for Chinese Singer Identification
Tsinghua Science and Technology 2019, 24 (4): 371-378
Published: 07 March 2019

As a subfield of Multimedia Information Retrieval (MIR), Singer IDentification (SID) is still in the research phase. On the one hand, SID cannot easily achieve high accuracy because the singing voice is difficult to model and is usually disturbed by background instrumental music. On the other hand, the performance of conventional machine learning methods is limited by the scale of the training dataset. This study proposes a new deep learning approach based on Long Short-Term Memory (LSTM) and Mel-Frequency Cepstral Coefficient (MFCC) features to identify the singer of a song in large datasets. The results indicate that an LSTM can build a representation of the relationships between different MFCC frames, and experiments show that the proposed method achieves higher accuracy for Chinese SID on the MIR-1K dataset than traditional approaches.
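How an LSTM relates successive MFCC frames can be sketched with a plain LSTM cell that folds a sequence of frames into one hidden vector. In this Python sketch the weights are random placeholders rather than trained values, the three-coefficient frames are made-up numbers standing in for real MFCC output (which typically has more coefficients per frame), and the classifier that would map the final hidden state to a singer is omitted. `LSTMCell` and its methods are hypothetical names, not the paper's code.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Plain LSTM cell; weights are random placeholders, not trained values."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # One weight matrix per gate (input i, forget f, output o, candidate g),
        # each acting on the concatenated vector [x_t, h_{t-1}].
        self.W: Dict[str, List[List[float]]] = {
            gate: [[rng.uniform(-0.5, 0.5) for _ in range(input_dim + hidden_dim)]
                   for _ in range(hidden_dim)]
            for gate in "ifog"
        }
        self.b: Dict[str, List[float]] = {gate: [0.0] * hidden_dim for gate in "ifog"}

    def _gate(self, name: str, xh: List[float], act) -> List[float]:
        return [act(sum(w * v for w, v in zip(row, xh)) + bias)
                for row, bias in zip(self.W[name], self.b[name])]

    def step(self, x: Sequence[float], h: List[float],
             c: List[float]) -> Tuple[List[float], List[float]]:
        xh = list(x) + list(h)
        i = self._gate("i", xh, sigmoid)
        f = self._gate("f", xh, sigmoid)
        o = self._gate("o", xh, sigmoid)
        g = self._gate("g", xh, math.tanh)
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]  # cell state
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]          # hidden state
        return h_new, c_new

    def encode(self, frames: Sequence[Sequence[float]]) -> List[float]:
        """Run the cell over all frames and return the final hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in frames:
            h, c = self.step(x, h, c)
        return h


# Placeholder "MFCC" frames: 4 frames x 3 coefficients (made-up values).
mfcc_frames = [[1.2, -0.3, 0.5], [0.8, 0.1, -0.7], [-0.4, 0.9, 0.2], [0.3, -1.1, 0.6]]
cell = LSTMCell(input_dim=3, hidden_dim=4)
summary = cell.encode(mfcc_frames)
print(summary)
```

Because the cell and hidden states carry information from earlier frames into later ones, feeding the same frames in a different order yields a different summary vector; that order sensitivity is what lets an LSTM capture relationships between frames.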
