Open Access Issue
Classification on Grade, Price, and Region with Multi-Label and Multi-Target Methods in Wineinformatics
Big Data Mining and Analytics 2020, 3 (1): 1-12
Published: 19 December 2019

Classifying wines according to their grade, price, and region of origin is a multi-label and multi-target problem in wineinformatics. Using wine reviews as the attributes, we compare several multi-label/multi-target methods with the single-label method, in which each label is treated independently. We explore both single-label and multi-label approaches for a two-class problem on each of the labels, and we explore both single-label and multi-target approaches for a four-class problem on two of the three labels, with the third label remaining a two-class problem. In terms of per-label accuracy, the single-label method has the best performance, although some multi-label methods approach the performance of single-label. However, when evaluated with multi-label/multi-target metrics, the multi-label/multi-target approaches do exceed the performance of the single-label method.
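
The distinction between per-label accuracy and joint multi-label metrics can be illustrated with a minimal sketch. The data below are invented toy values, not results from the paper, and the two joint metrics shown (exact-match ratio and Hamming loss) are common multi-label metrics chosen here as an assumption; the abstract does not state which metrics the authors used.

```python
# Toy example: each row is (grade, price, region) encoded as 0/1.
# All values are made up for illustration only.
y_true = [(1, 0, 1), (0, 0, 1), (1, 1, 0), (0, 1, 0)]
y_pred = [(1, 0, 0), (0, 0, 1), (1, 0, 0), (0, 1, 1)]


def per_label_accuracy(y_true, y_pred):
    """Accuracy computed separately for each label column."""
    n_labels = len(y_true[0])
    return [
        sum(t[j] == p[j] for t, p in zip(y_true, y_pred)) / len(y_true)
        for j in range(n_labels)
    ]


def exact_match_ratio(y_true, y_pred):
    """Fraction of samples whose full label vector is predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def hamming_loss(y_true, y_pred):
    """Fraction of individual label assignments that are wrong."""
    n_labels = len(y_true[0])
    wrong = sum(
        t[j] != p[j] for t, p in zip(y_true, y_pred) for j in range(n_labels)
    )
    return wrong / (len(y_true) * n_labels)


print(per_label_accuracy(y_true, y_pred))  # one accuracy per label
print(exact_match_ratio(y_true, y_pred))   # joint correctness
print(hamming_loss(y_true, y_pred))        # label-wise error rate
```

A model can score well on each label in isolation and still rarely get all three labels right for the same wine, which is why the two kinds of metric can rank methods differently.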

Open Access Issue
Hierarchically Clustered HMM for Protein Sequence Motif Extraction with Variable Length
Tsinghua Science and Technology 2014, 19 (6): 635-647
Published: 20 November 2014

Protein sequence motif extraction is an important field of bioinformatics because of its relevance to structural analysis. Two major problems are related to this field: (1) searching for motifs within the same protein family; and (2) assuming a window size for the motif search. This work proposes the Hierarchically Clustered Hidden Markov Model (HC-HMM) approach, which represents the behavior and structure of proteins in terms of a Hidden Markov Model chain and hierarchically clusters the chains by minimizing the distance between the structure and behavior of two given chains. It is well known that HMMs can be utilized for clustering; however, methods for clustering the Hidden Markov Models themselves are rarely studied. In this paper, we develop a hierarchical clustering-based algorithm for HMMs to discover protein sequence motifs that transcend family boundaries, with no assumption on the length of the motif. This paper carefully examines the effectiveness of this approach for motif extraction on 2593 proteins that share no more than 25% sequence identity. Many interesting motifs are generated. Three example motifs generated by the HC-HMM approach are analyzed and visualized with their tertiary structure. We believe the proposed method provides a unique protein sequence motif extraction strategy. Related data mining fields that use Hidden Markov Models may also benefit from this approach of clustering the HMMs themselves.
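
The general idea of hierarchically clustering models rather than raw sequences can be sketched as follows. This is not the HC-HMM algorithm from the paper: the abstract does not define its distance over structure and behavior, so the sketch substitutes a plain Frobenius distance between transition matrices with average linkage. The function names and the toy matrices are hypothetical.

```python
from itertools import combinations


def matrix_distance(a, b):
    """Frobenius distance between two equally sized transition matrices.

    Stand-in for the paper's structure/behavior distance (an assumption).
    """
    return sum(
        (x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)
    ) ** 0.5


def cluster_distance(c1, c2, models):
    """Average-linkage distance between two clusters of model indices."""
    total = sum(matrix_distance(models[i], models[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(models, n_clusters):
    """Merge the closest pair of clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(models))]
    while len(clusters) > n_clusters:
        # combinations yields index pairs with a < b, so deleting b is safe
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: cluster_distance(
                clusters[ab[0]], clusters[ab[1]], models
            ),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Four toy 2-state transition matrices (invented values).
hmms = [
    [[0.90, 0.10], [0.20, 0.80]],
    [[0.85, 0.15], [0.25, 0.75]],
    [[0.10, 0.90], [0.70, 0.30]],
    [[0.15, 0.85], [0.65, 0.35]],
]
groups = agglomerate(hmms, 2)
print(groups)
```

Because the clustering operates on model parameters, it imposes no fixed window size on the underlying sequences, which mirrors the motivation stated in the abstract.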
