
Novel and Efficient Randomized Algorithms for Feature Selection

Zigeng Wang, Xia Xiao, and Sanguthevar Rajasekaran
Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA.

Abstract

Feature selection is a crucial problem in efficient machine learning, and it also contributes greatly to the explainability of machine-driven decisions. Methods such as decision trees and the Least Absolute Shrinkage and Selection Operator (LASSO) can select features during training. However, these embedded approaches apply only to a small subset of machine learning models. Wrapper-based methods can select features independently of the underlying machine learning model, but they often suffer from a high computational cost. To enhance their efficiency, many randomized algorithms have been designed. In this paper, we propose automatic breadth searching and attention searching adjustment approaches to further speed up randomized wrapper-based feature selection. We provide a theoretical analysis of the computational complexity of our algorithms and explain their generic parallelizability. We conduct experiments on both synthetic and real datasets with different machine learning base models. The results show that, compared with existing approaches, our proposed techniques can locate a more meaningful set of features with high efficiency.

Keywords: feature selection, randomized algorithms, efficient selection
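
To make the distinction the abstract draws concrete, the following minimal Python sketch contrasts embedded selection (LASSO zeroing out coefficients during training) with a generic randomized wrapper search that treats the base model as a black box. It assumes scikit-learn and NumPy are available; the randomized_wrapper helper and all parameter values are hypothetical illustrations of the general wrapper idea, not the breadth searching or attention searching adjustment algorithms proposed in this paper.

```python
# Hypothetical sketch: embedded (LASSO) vs. randomized wrapper selection.
# This illustrates the general idea only, not the paper's algorithms.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=30,
                           n_informative=5, random_state=0)

# Embedded selection: LASSO drives uninformative coefficients to zero
# while training, so the surviving features are the selected ones.
lasso = Lasso(alpha=0.05).fit(X, y)
embedded = np.flatnonzero(lasso.coef_)

# Wrapper selection: score random feature subsets with the base model
# and keep the best. The model is a black box, which makes wrappers
# model-agnostic but costly (each trial is a full cross-validated fit).
def randomized_wrapper(X, y, model, n_trials=50, subset_size=8):
    best_subset, best_score = None, -np.inf
    for _ in range(n_trials):
        subset = rng.choice(X.shape[1], size=subset_size, replace=False)
        score = cross_val_score(model, X[:, subset], y, cv=3).mean()
        if score > best_score:
            best_subset, best_score = np.sort(subset), score
    return best_subset, best_score

wrapped, score = randomized_wrapper(X, y, LogisticRegression(max_iter=500))
print("embedded (LASSO):", embedded)
print("randomized wrapper:", wrapped, f"(CV accuracy {score:.3f})")
```

Each wrapper trial retrains and re-evaluates the base model, which is exactly the cost that randomized search strategies aim to reduce; the trials above are also mutually independent, hinting at the generic parallelizability the abstract mentions.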


Publication history

Received: 15 April 2020
Revised: 06 May 2020
Accepted: 07 May 2020
Published: 16 July 2020
Issue date: September 2020

Copyright

© The author(s) 2020

Acknowledgements

This work was supported in part by the National Science Foundation (NSF) under Grant Nos. 1447711, 1743418, and 1843025.

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
