GAAF: Searching Activation Functions for Binary Neural Networks Through Genetic Algorithm

Yanfei Li1, Tong Geng2, Samuel Stein2, Ang Li2, and Huimin Yu1 (corresponding author)
1 Department of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China
2 Pacific Northwest National Laboratory, Richland, WA 99354, USA

Abstract

Binary neural networks (BNNs) show promising utility in cost- and power-restricted domains such as edge devices and mobile systems, owing to their significantly lower computation and storage demands, albeit at the cost of degraded accuracy. To close the accuracy gap, in this paper we propose to add a complementary activation function (AF) ahead of the sign-based binarization, and rely on the genetic algorithm (GA) to automatically search for ideal AFs. These AFs help extract extra information from the input data in the forward pass, while allowing improved gradient approximation in the backward pass. Fifteen novel AFs are identified through our GA-based search, and most of them show improved performance (up to 2.54% on ImageNet) when tested on different datasets and network models. Interestingly, periodic functions are identified as a key component of most of the discovered AFs, and such functions rarely appear in human-designed AFs. Our method offers a novel approach for designing general and application-specific BNN architectures. GAAF will be released on GitHub.
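To make the core idea concrete, below is a minimal PyTorch-style sketch (an illustration under stated assumptions, not the paper's released code) of a binarization step with a complementary AF placed ahead of the sign function. The candidate g(x) = x + sin(x) is hypothetical, chosen only to echo the periodic components the GA search tends to discover, and the backward pass uses a common straight-through-style estimator routed through g'(x), standing in for whatever gradient rule a given AF induces.

```python
import torch

class ComplementarySign(torch.autograd.Function):
    """Sign binarization preceded by a complementary activation function g.

    g(x) = x + sin(x) is a hypothetical candidate with a periodic term;
    in GAAF, the GA would evolve this expression automatically.
    """

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        g = x + torch.sin(x)      # complementary AF ahead of binarization
        return torch.sign(g)      # binarize (torch.sign maps 0 to 0)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Straight-through-style estimator routed through g: scale the
        # incoming gradient by g'(x) and pass no gradient outside |x| <= 1.
        g_prime = 1.0 + torch.cos(x)
        mask = (x.abs() <= 1.0).to(grad_output.dtype)
        return grad_output * g_prime * mask


# Usage: binarize the pre-activations of a BNN layer.
x = torch.randn(4, 8, requires_grad=True)
b = ComplementarySign.apply(x)
b.sum().backward()                # gradient flows through g'(x), not sign
```

In the paper's search, each candidate expression for g would be scored by training a BNN with it and measuring accuracy, with the GA mutating and recombining the fittest candidates across generations; the sketch above simply fixes one such candidate.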

Keywords: genetic algorithm, binary neural networks (BNNs), activation function


Publication history

Received: 15 October 2021
Accepted: 01 November 2021
Published: 21 July 2022
Issue date: February 2023

Copyright

© The author(s) 2023.

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
