A Feature Selection Method for Prediction Essential Protein

Jiancheng Zhong; Jianxin Wang; Wei Peng; Zhen Zhang; Min Li

doi:10.1109/TST.2015.7297748

Open Access

School of Information Science and Engineering, Central South University, Changsha 410083, China.

College of Polytechnic, Hunan Normal University, Changsha 410083, China.

Computer Center, Kunming University of Science and Technology, Kunming 650093, China.

Abstract

Essential proteins are vital to the survival of a cell. Various features, both biological and topological, are related to protein essentiality, and many computational methods have been developed to identify essential proteins from them. However, designing a method that selects suitable features and integrates them to predict essential proteins remains a major challenge. In this work, we first collect 26 features and use SVM-RFE (Support Vector Machine Recursive Feature Elimination) to select a subset of them as a feature space for predicting essential proteins. We then remove features that share biological meaning with other features in that space, as judged by their Pearson Correlation Coefficients (PCC). The experiments are carried out on S. cerevisiae data, and six features are determined to be the best subset. To assess the prediction performance of our method, we compare several machine learning methods (SVM, Naive Bayes, Bayes Network, and NBTree) when given different numbers of input features. The results show that the methods using the six selected features outperform the same methods using other feature sets, which confirms the effectiveness of our feature selection method for essential protein prediction.
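
The redundancy check described in the abstract relies on the Pearson Correlation Coefficient between pairs of features. The following is a minimal sketch of that coefficient in plain Python, not the authors' implementation. The function name `pearson` and the toy feature vectors are hypothetical and chosen only for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores of five proteins under three made-up features.
feature_a = [1.0, 2.0, 3.0, 4.0, 5.0]
feature_b = [2.0, 4.0, 6.0, 8.0, 10.0]   # perfectly correlated with feature_a
feature_c = [5.0, 4.0, 3.0, 2.0, 1.0]    # perfectly anti-correlated with feature_a

r_ab = pearson(feature_a, feature_b)
r_ac = pearson(feature_a, feature_c)
```

A pair of features whose coefficient is close to 1 in absolute value carries largely the same information, which is the motivation for dropping one of them.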

Keywords

essential protein; feature selection; Protein-Protein Interaction (PPI); machine learning; centrality algorithm

Rank No.	Feature name	Rank No.	Feature name
1	ION	14	Endoplasmic reticulum
2	WDC	15	BC
3	Nucleus	16	Mitochondrion
4	PeC	17	Membrane
5	DC	18	Transmembrane
6	NC	19	Secretory pathway
7	Cytoplasm	20	Cell wall
8	IC	21	Cytoskeleton
9	Vacuole	22	CC
10	EC	23	Vesicles
11	Endosome	24	Golgi
12	SC	25	Extracellular
13	Peroxisome	26	Lysosome

Number of features	Feature names	AUC of ROC
4	ION, WDC, Nucleus, PeC	0.608
8	ION, WDC, Nucleus, PeC, DC, NC, Cytoplasm, IC	0.609
16	ION, WDC, Nucleus, PeC, DC, NC, Cytoplasm, IC, Vacuole, EC, Endosome, SC, Peroxisome, Endoplasmic reticulum, BC, Mitochondrion	0.607
26	ALL features	0.577
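
The AUC values in the table above summarize how well a classifier's scores separate essential from non-essential proteins. As a rough sketch of what the number means, the AUC of ROC can be computed as the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counted as one half. The function name `roc_auc`, the labels, and the scores below are hypothetical and are not taken from the paper's data.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    labels: iterable of 1 (positive) or 0 (negative); scores: classifier outputs.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = essential, 0 = non-essential.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)  # 8 of the 9 positive/negative pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ranking, so the differences between the rows of the table should be read against that baseline.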

	ION	WDC	Nucleus	PeC	DC	NC	Cytoplasm	IC
ION	1
WDC	0.425232	1
Nucleus	0.240546	0.195593	1
PeC	0.34971	0.801061	0.15685	1
DC	0.388783	0.57909	0.119282	0.348574	1
NC	0.440695	0.810345	0.209794	0.555108	0.724148	1
Cytoplasm	0.160436	0.052004	0.202167	0.034506	0.069622	0.063784	1
IC	0.581611	0.46407	0.229772	0.293526	0.616673	0.544119	0.092517	1
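
One way to turn the correlation matrix above into a redundancy filter is a greedy pass in rank order: a feature is kept only if its PCC with every feature already kept stays at or below a cutoff. The sketch below uses the values from the table, but the cutoff of 0.8 and the greedy rule itself are assumptions made for illustration. The source excerpt does not state the authors' threshold or which six features they retained, so the result should not be read as their published subset.

```python
# Feature names in SVM-RFE rank order, as listed in the table above.
FEATURES = ["ION", "WDC", "Nucleus", "PeC", "DC", "NC", "Cytoplasm", "IC"]

# Lower-triangular PCC values transcribed from the table above.
LOWER = [
    [1.0],
    [0.425232, 1.0],
    [0.240546, 0.195593, 1.0],
    [0.34971, 0.801061, 0.15685, 1.0],
    [0.388783, 0.57909, 0.119282, 0.348574, 1.0],
    [0.440695, 0.810345, 0.209794, 0.555108, 0.724148, 1.0],
    [0.160436, 0.052004, 0.202167, 0.034506, 0.069622, 0.063784, 1.0],
    [0.581611, 0.46407, 0.229772, 0.293526, 0.616673, 0.544119, 0.092517, 1.0],
]


def pcc(i, j):
    """Look up the symmetric PCC between features i and j."""
    return LOWER[i][j] if i >= j else LOWER[j][i]


def filter_redundant(threshold):
    """Greedy filter: walk features in rank order and keep a feature only if
    its |PCC| with every previously kept feature is <= threshold."""
    kept = []
    for i in range(len(FEATURES)):
        if all(abs(pcc(i, k)) <= threshold for k in kept):
            kept.append(i)
    return [FEATURES[i] for i in kept]


# 0.8 is an assumed cutoff, not a value given in the source.
selected = filter_redundant(0.8)
```

With this assumed cutoff the sketch drops PeC and NC, both of which correlate above 0.8 with WDC, and keeps the remaining six features. A different cutoff or tie-breaking rule would give a different subset.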

Method	Number of features	TP rate	FP rate	Precision	Recall	F-Measure	MCC	ROC area	PRC area
SVM	6	0.805	0.586	0.791	0.805	0.767	0.341	0.61	0.706
	8	0.805	0.587	0.791	0.805	0.766	0.34	0.609	0.706
	ALL	0.801	0.646	0.808	0.801	0.745	0.313	0.577	0.691
Naive Bayes	6	0.79	0.477	0.773	0.79	0.778	0.352	0.748	0.795
	8	0.79	0.52	0.768	0.79	0.771	0.328	0.745	0.796
	ALL	0.782	0.526	0.758	0.782	0.764	0.304	0.744	0.792
Bayes Network	6	0.76	0.405	0.769	0.76	0.764	0.344	0.75	0.812
	8	0.755	0.397	0.768	0.755	0.76	0.341	0.747	0.809
	ALL	0.71	0.394	0.75	0.71	0.725	0.285	0.73	0.797
NBTree	6	0.811	0.509	0.793	0.811	0.789	0.387	0.755	0.81
	8	0.806	0.533	0.787	0.806	0.78	0.364	0.755	0.806
	ALL	0.806	0.534	0.787	0.806	0.78	0.363	0.755	0.809
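
For readers unfamiliar with the column headings above, the sketch below computes TP rate, FP rate, Precision, Recall, F-Measure, and MCC from a single binary confusion matrix using their standard definitions. The function name `binary_metrics` and the counts are hypothetical. The excerpt does not say whether the table reports per-class values or class-weighted averages, so this sketch is not expected to reproduce the table's numbers.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (both classes present and at
    least one positive prediction); no guarding is done in this sketch.
    """
    tp_rate = tp / (tp + fn)              # also called recall or sensitivity
    fp_rate = fp / (fp + tn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tp_rate / (precision + tp_rate)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den   # Matthews correlation coefficient
    return {
        "tp_rate": tp_rate,
        "fp_rate": fp_rate,
        "precision": precision,
        "recall": tp_rate,
        "f_measure": f_measure,
        "mcc": mcc,
    }


# Hypothetical counts, not taken from the paper's experiments.
m = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

Note that TP rate and Recall are the same quantity, which is consistent with those two columns being identical in every row of the table.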