
Feature Selection with Graph Mining Technology

Thosini Bamunu Mudiyanselage and Yanqing Zhang
Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA.

Abstract

Many real-world applications involve high-dimensional data that existing algorithms cannot handle efficiently. Feature selection is a critical data-preprocessing step, and its poor scalability degrades both the efficiency and the performance of big-data applications. In this research, we developed a new algorithm that reduces the dimensionality of a problem using graph-based analysis while retaining the physical meaning of the original high-dimensional feature space. Most existing feature-selection methods rest on the strong assumption that features are independent of one another; however, if a feature-selection algorithm ignores the interdependencies of the feature space, the selected data fail to correctly represent the original data. Our method addresses this challenge by examining the dependencies between features and selecting the optimal feature set with respect to the original data structure. Another important property of the proposed method is that it works even in the absence of class labels, a more difficult problem that many feature-selection algorithms fail to address; in that setting, they resort to wrapper techniques that require a learning algorithm to select features. Our experimental results indicate that the proposed simple ranking method performs better than other methods, independent of the particular learning algorithm used.

Keywords: feature selection, graph mining, network embedding, big data analysis, high-dimensional data
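
The paper's algorithmic details are not reproduced on this page, but the abstract's core idea, unsupervised and dependency-aware feature ranking over a graph, can be sketched in Python. The sketch below is an illustrative assumption, not the authors' method: the function name graph_feature_ranking, the correlation threshold, and the use of PageRank centrality are hypothetical stand-ins for whatever graph construction and ranking the paper actually uses.

# Minimal sketch (assumed, not the paper's algorithm) of unsupervised,
# graph-based feature ranking: build a feature-dependency graph from
# pairwise correlations, then rank features by PageRank centrality.
import numpy as np
import networkx as nx

def graph_feature_ranking(X, corr_threshold=0.3, top_k=10):
    """Rank the features of an unlabeled data matrix X (samples x features).

    Edges connect feature pairs whose absolute Pearson correlation exceeds
    corr_threshold; PageRank then scores each feature's importance within
    that dependency structure. Both the threshold and the centrality
    measure are illustrative choices.
    """
    n_features = X.shape[1]
    # Feature-feature absolute correlation matrix.
    corr = np.abs(np.corrcoef(X, rowvar=False))

    G = nx.Graph()
    G.add_nodes_from(range(n_features))
    for i in range(n_features):
        for j in range(i + 1, n_features):
            if corr[i, j] > corr_threshold:
                G.add_edge(i, j, weight=corr[i, j])

    # Centrality in the dependency graph is the unsupervised score;
    # no class labels are used at any point.
    scores = nx.pagerank(G, weight="weight")
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k]

# Example: select the 5 most central features of a random 200 x 50 matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
print(graph_feature_ranking(X, corr_threshold=0.15, top_k=5))

Because the ranking is computed purely from the feature-feature graph, no downstream classifier is involved in scoring, which mirrors the filter-style, learning-algorithm-independent behavior the abstract claims.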


Publication history

Received: 20 June 2018
Accepted: 02 August 2018
Published: 14 May 2019
Issue date: June 2019

Copyright

© The author(s) 2019
