Scholar - SciOpen

During software development, developers tend to tangle multiple concerns into a single commit, resulting in many composite commits. This paper studies the problem of detecting and untangling composite commits, so as to improve the maintainability and understandability of software. Our approach is built upon the observation that both the textual content of code statements and the dependencies between code statements are helpful in comprehending the code commit. Based on this observation, we first construct an attributed graph for each commit, where code statements and various code dependencies are modeled as nodes and edges, respectively, and the textual bodies of code statements are maintained as node attributes. Based on the attributed graph, we propose graph-based learning algorithms that first detect whether the given commit is a composite commit, and then untangle the composite commit into atomic ones. We evaluate our approach on nine C# projects, and the results demonstrate the effectiveness and efficiency of our approach.

Regular Paper Issue

Bug Triaging Based on Tossing Sequence Modeling

Sheng-Qu Xi, Yuan Yao, Xu-Sheng Xiao, Feng Xu, Jian Lv

Journal of Computer Science and Technology 2019, 34(5): 942-956

Published: 06 September 2019

Abstract

Bug triaging, which routes bug reports to potential fixers, is an integral step in software development and maintenance. To make bug triaging more efficient, many researchers have proposed adopting machine learning and information retrieval techniques to identify suitable fixers for a given bug report. However, none of the existing proposals simultaneously takes into account the following three aspects that matter for the efficiency of bug triaging: 1) the textual content in the bug reports, 2) the metadata in the bug reports, and 3) the tossing sequence of the bug reports. To make use of all three aspects together, we propose ITRIAGE, which first adopts a sequence-to-sequence model to jointly learn the features of textual content and tossing sequence, and then uses a classification model to integrate the features from textual content, metadata, and tossing sequence. Evaluation results on three different open-source projects show that the proposed approach significantly improves the accuracy of bug triaging compared with state-of-the-art approaches.

Open Access Issue

A Brief Review of Network Embedding

Yaojing Wang, Yuan Yao, Hanghang Tong, Feng Xu, Jian Lu

Big Data Mining and Analytics 2019, 2(1): 35-47

Published: 15 October 2018

Abstract

Learning the representations of nodes in a network can benefit various analysis tasks such as node classification, link prediction, clustering, and anomaly detection. Such a representation learning problem is referred to as network embedding, and it has attracted significant attention in recent years. In this article, we briefly review the existing network embedding methods by two taxonomies. The technical taxonomy focuses on the specific techniques used and divides the existing network embedding methods into two stages, i.e., context construction and objective design. The non-technical taxonomy focuses on the problem setting aspect and categorizes existing work based on whether to preserve special network properties, to consider special network types, or to incorporate additional inputs. Finally, we summarize the main findings based on the two taxonomies, analyze their usefulness, and discuss future directions in this area.