References
[1] M. K. Chowdary, T. N. Nguyen, and D. J. Hemanth, Deep learning-based facial emotion recognition for human-computer interaction applications, Neural Comput. Appl., pp. 1–18, 2021.
[3] W. Hua, F. Dai, L. Huang, J. Xiong, and G. Gui, HERO: Human emotions recognition for realizing intelligent Internet of Things, IEEE Access, vol. 7, pp. 24321–24332, 2019.
[4] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, Attention is all you need, in Proc. 31st Int. Conf. Neural Information Processing Systems, Long Beach, CA, USA, 2017, pp. 6000–6010.
[5] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, Swin transformer: Hierarchical vision transformer using shifted windows, in Proc. 2021 IEEE/CVF Int. Conf. Computer Vision (ICCV), Montreal, Canada, 2021, pp. 9992–10002.
[6] K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, in Proc. 2016 IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016, pp. 770–778.
[7] S. Schneider, A. Baevski, R. Collobert, and M. Auli, wav2vec: Unsupervised pre-training for speech recognition, arXiv preprint arXiv:1904.05862, 2019.
[8] Z. Wang, C. Li, and X. Wang, Convolutional neural network pruning with structural redundancy reduction, in Proc. 2021 IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 2021, pp. 14908–14917.
[9] Y. LeCun, J. Denker, and S. Solla, Optimal brain damage, in Proc. 2nd Int. Conf. Neural Information Processing Systems, Denver, CO, USA, 1989, pp. 598–605.
[10] C. Louizos, M. Welling, and D. P. Kingma, Learning sparse neural networks through L0 regularization, arXiv preprint arXiv:1712.01312, 2017.
[11] Z. Liu, H. Mu, X. Zhang, Z. Guo, X. Yang, K.-T. Cheng, and J. Sun, MetaPruning: Meta learning for automatic neural network channel pruning, in Proc. 2019 IEEE/CVF Int. Conf. Computer Vision (ICCV), Seoul, Republic of Korea, 2019, pp. 3295–3304.
[12] J. S. McCarley, R. Chakravarti, and A. Sil, Structured pruning of a BERT-based question answering model, arXiv preprint arXiv:1910.06360, 2019.
[13] F. Lagunas, E. Charlaix, V. Sanh, and A. M. Rush, Block pruning for faster transformers, arXiv preprint arXiv:2109.04838, 2021.
[14] M. Xia, Z. Zhong, and D. Chen, Structured pruning learns compact and accurate models, in Proc. 60th Annu. Meeting of the Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 1513–1528.
[15] S. Narang, E. Elsen, G. Diamos, and S. Sengupta, Exploring sparsity in recurrent neural networks, arXiv preprint arXiv:1704.05119, 2017.
[16] Y. Wang, L. Wang, V. Li, and Z. Tu, On the sparsity of neural machine translation models, arXiv preprint arXiv:2010.02646, 2020.
[17] V. Sanh, T. Wolf, and A. M. Rush, Movement pruning: Adaptive sparsity by fine-tuning, arXiv preprint arXiv:2005.07683, 2020.
[18] D. Guo, A. Rush, and Y. Kim, Parameter-efficient transfer learning with diff pruning, in Proc. 59th Annu. Meeting of the Association for Computational Linguistics and the 11th Int. Joint Conf. Natural Language Processing, virtual, 2021, pp. 4884–4896.
[19] I. Hubara, Y. Nahshan, Y. Hanani, R. Banner, and D. Soudry, Accurate post training quantization with small calibration sets, in Proc. 38th Int. Conf. Machine Learning, virtual, 2021, pp. 4466–4475.
[20] J. Yang, X. Shen, J. Xing, X. Tian, H. Li, B. Deng, J. Huang, and X.-S. Hua, Quantization networks, in Proc. 2019 IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019, pp. 7300–7308.
[21] B. Jacob, S. Kligys, B. Chen, M. Zhu, M. Tang, A. Howard, H. Adam, and D. Kalenichenko, Quantization and training of neural networks for efficient integer-arithmetic-only inference, arXiv preprint arXiv:1712.05877, 2017.
[22] C. Sakr, S. Dai, R. Venkatesan, B. Zimmer, W. Dally, and B. Khailany, Optimal clipping and magnitude-aware differentiation for improved quantization-aware training, in Proc. 39th Int. Conf. Machine Learning, Baltimore, MD, USA, 2022, pp. 19123–19138.
[23] M. Nagel, M. Fournarakis, Y. Bondarenko, and T. Blankevoort, Overcoming oscillations in quantization-aware training, in Proc. 39th Int. Conf. Machine Learning, Baltimore, MD, USA, 2022, pp. 16318–16330.
[24] Y. Wang, Y. Lu, and T. Blankevoort, Differentiable joint pruning and quantization for hardware efficiency, in Proc. 16th European Conf. Computer Vision (ECCV), Glasgow, UK, 2020, pp. 259–277.
[25] F. Tung and G. Mori, CLIP-Q: Deep network compression learning by in-parallel pruning-quantization, in Proc. 2018 IEEE/CVF Conf. Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 2018, pp. 7873–7882.
[26] P. Hu, X. Peng, H. Zhu, M. M. S. Aly, and J. Lin, OPQ: Compressing deep neural networks with one-shot pruning-quantization, arXiv preprint arXiv:2205.11141, 2022.
[27] S. Ye, T. Zhang, K. Zhang, J. Li, J. Xie, Y. Liang, S. Liu, X. Lin, and Y. Wang, A unified framework of DNN weight pruning and weight clustering/quantization using ADMM, arXiv preprint arXiv:1811.01907, 2018.
[28] J. Kim, K. Yoo, and N. Kwak, Position-based scaled gradient for model quantization and pruning, arXiv preprint arXiv:2005.11035, 2020.
[29] L. Guerra, B. Zhuang, I. Reid, and T. Drummond, Automatic pruning for quantized neural networks, arXiv preprint arXiv:2002.00523, 2020.
[30] H. Li, A. Kadav, I. Durdanovic, H. Samet, and H. P. Graf, Pruning filters for efficient ConvNets, arXiv preprint arXiv:1608.08710, 2016.
[31] X. Liu, J. Pool, S. Han, and W. J. Dally, Efficient sparse-Winograd convolutional neural networks, arXiv preprint arXiv:1802.06367, 2018.
[32] M. Nagel, M. van Baalen, T. Blankevoort, and M. Welling, Data-free quantization through weight equalization and bias correction, arXiv preprint arXiv:1906.04721, 2019.
[33] Y. Choukroun, E. Kravchik, F. Yang, and P. Kisilev, Low-bit quantization of neural networks for efficient inference, arXiv preprint arXiv:1902.06822, 2019.
[34] S. Zhou, Y. Wu, Z. Ni, X. Zhou, H. Wen, and Y. Zou, DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients, arXiv preprint arXiv:1606.06160, 2016.
[35] Q. Jin, J. Ren, R. Zhuang, S. Hanumante, Z. Li, Z. Chen, Y. Wang, K. Yang, and S. Tulyakov, F8Net: Fixed-point 8-bit only multiplication for network quantization, arXiv preprint arXiv:2202.05239, 2022.
[36] J. Choi, Z. Wang, S. Venkataramani, P. I. J. Chuang, V. Srinivasan, and K. Gopalakrishnan, PACT: Parameterized clipping activation for quantized neural networks, arXiv preprint arXiv:1805.06085, 2018.
[37] Q. Zhang, Z. Han, F. Yang, Y. Zhang, Z. Liu, M. Yang, and L. Zhou, Retiarii: A deep learning exploratory-training framework, in Proc. 14th USENIX Symp. Operating Systems Design and Implementation (OSDI’20), virtual, 2020, pp. 919–936.