
Feature Selection with Graph Mining Technology

Thosini Bamunu Mudiyanselage, Yanqing Zhang
Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA.

Abstract

Many real-world applications involve high-dimensional data that existing algorithms cannot handle. Feature selection is a critical data-preprocessing step, and its poor scalability negatively affects both the efficiency and the performance of big-data applications. In this research, we developed a new algorithm that reduces the dimensionality of a problem using graph-based analysis while retaining the physical meaning of the original high-dimensional feature space. Most existing feature-selection methods rest on the strong assumption that features are independent of one another; however, when a feature-selection algorithm ignores the interdependencies within the feature space, the selected features fail to correctly represent the original data. We developed a new feature-selection method to address this challenge. Our aim was to examine the dependencies between features and to select the feature set that best preserves the original data structure. Another important property of the proposed method is that it works even in the absence of class labels, a harder setting that many feature-selection algorithms fail to address and in which most approaches resort to wrapper techniques that require a learning algorithm to select features. Our experimental results indicate that the proposed simple ranking method outperforms other methods, independent of the particular learning algorithm used.
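The abstract does not spell out the algorithm, but a common way to realize graph-based, label-free feature ranking of the kind described is to build a feature-similarity graph from pairwise correlations and rank features by their weighted degree in that graph. The sketch below is an illustrative instantiation under that assumption, not the authors' exact method; the function names and the threshold value are hypothetical.

```python
# Illustrative sketch (NOT the paper's exact algorithm): rank features by
# how strongly they are connected in a correlation graph, with no class labels.
import math

def pearson(x, y):
    """Absolute Pearson correlation between two equal-length value lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return abs(cov / (sx * sy))

def rank_features(columns, threshold=0.5):
    """columns: list of feature columns (each a list of sample values).
    Builds an edge between two features when their absolute correlation
    meets the (hypothetical) threshold, then sorts feature indices by
    weighted degree, most strongly interconnected first."""
    d = len(columns)
    degree = [0.0] * d
    for i in range(d):
        for j in range(i + 1, d):
            w = pearson(columns[i], columns[j])
            if w >= threshold:      # edge in the feature graph
                degree[i] += w
                degree[j] += w
    return sorted(range(d), key=lambda i: degree[i], reverse=True)

# Toy data: features 0 and 1 are strongly dependent; feature 2 is noise-like.
f0 = [1.0, 2.0, 3.0, 4.0, 5.0]
f1 = [2.1, 3.9, 6.2, 8.0, 9.9]   # roughly 2 * f0
f2 = [5.0, 1.0, 4.0, 2.0, 3.0]
print(rank_features([f0, f1, f2]))   # → [0, 1, 2]
```

Because the ranking uses only feature-to-feature dependencies, it needs no class labels and no learning algorithm, matching the filter-style, unsupervised setting the abstract emphasizes.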

Keywords:

graph mining, network embedding, big data analysis, feature selection, high-dimensional data
Received: 20 June 2018 Accepted: 02 August 2018 Published: 14 May 2019 Issue date: June 2019

Copyright

© The author(s) 2019

Rights and permissions

Reprints and permission requests may be directed to the editorial office.
