Volume 4, Issue 3

AIPerf: Automated Machine Learning as an AI-HPC Benchmark

Zhixiang Ren1, Yongheng Liu1, Tianhui Shi2, Lei Xie2, Yue Zhou1, Jidong Zhai2, Youhui Zhang2, Yunquan Zhang3, Wenguang Chen2
1 Peng Cheng National Laboratory, Shenzhen 518000, China
2 Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
3 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100086, China

Abstract

The plethora of complex Artificial Intelligence (AI) algorithms and available High-Performance Computing (HPC) power stimulates the rapid development of AI components with heterogeneous designs. Consequently, the need for cross-stack performance benchmarking of AI-HPC systems has emerged rapidly. In particular, the de facto HPC benchmark, LINPACK, lacks a representative AI workload and therefore cannot reflect AI computing power or input/output performance. Current popular AI benchmarks, such as MLPerf, have a fixed problem size and therefore limited scalability. To address these issues, we propose AIPerf, an end-to-end benchmark suite based on automated machine learning, which not only represents real AI scenarios but also scales adaptively to machines of various sizes. We implement the algorithms in a highly parallel and flexible way to ensure efficiency and optimization potential on diverse systems with customizable configurations. We use Operations Per Second (OPS), measured with an analytical and systematic approach, as the major metric to quantify AI performance. We evaluate the benchmark on various systems to verify its stability and scalability, from 4 nodes with 32 NVIDIA Tesla T4 (56.1 Tera-OPS measured) up to 512 nodes with 4096 Huawei Ascend 910 (194.53 Peta-OPS measured), and the results show near-linear weak scalability. With a flexible workload and a single metric, AIPerf can easily scale on and rank AI-HPC systems, providing a powerful benchmark suite for the coming supercomputing era.
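
To make the reported figures easier to read, the following minimal Python sketch divides the two aggregate scores quoted in the abstract by their accelerator counts, and shows one common way to express weak-scaling efficiency. The function names and the efficiency definition (per-accelerator throughput at scale divided by per-accelerator throughput at the baseline) are assumptions made for illustration, not the paper's own measurement code.

```python
# Aggregate scores quoted in the abstract: (accelerator count, measured OPS).
REPORTED = {
    "NVIDIA Tesla T4": (32, 56.1e12),        # 4 nodes, 56.1 Tera-OPS
    "Huawei Ascend 910": (4096, 194.53e15),  # 512 nodes, 194.53 Peta-OPS
}


def ops_per_accelerator(total_ops, accelerators):
    """Average throughput contributed by one accelerator."""
    return total_ops / accelerators


def weak_scaling_efficiency(base, scaled):
    """Hypothetical helper: ratio of per-accelerator throughput at scale
    to per-accelerator throughput at the baseline, on the SAME hardware.

    Both arguments are (accelerator count, measured OPS) pairs.
    A value close to 1.0 corresponds to near-linear weak scaling.
    """
    base_n, base_ops = base
    scaled_n, scaled_ops = scaled
    return (scaled_ops / scaled_n) / (base_ops / base_n)


if __name__ == "__main__":
    for name, (count, ops) in REPORTED.items():
        per_device = ops_per_accelerator(ops, count) / 1e12
        print(f"{name}: {per_device:.2f} Tera-OPS per accelerator")
```

The two systems use different accelerators, so their per-accelerator averages are not a scaling measurement against each other; `weak_scaling_efficiency` is only meaningful for runs of different sizes on the same hardware.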

Keywords:

High-Performance Computing (HPC), Artificial Intelligence (AI), automated machine learning
Received: 01 March 2021 Accepted: 12 March 2021 Published: 12 May 2021 Issue date: September 2021
Copyright

© The author(s) 2021

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

Requests for reprints and permissions may be sent directly to the editorial office.