Open Access

AIPerf: Automated Machine Learning as an AI-HPC Benchmark

Peng Cheng National Laboratory, Shenzhen 518000, China
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100086, China


The plethora of complex Artificial Intelligence (AI) algorithms and the abundance of available High-Performance Computing (HPC) power stimulate the rapid development of AI components with heterogeneous designs. Consequently, the need for cross-stack performance benchmarking of AI-HPC systems has rapidly emerged. In particular, the de facto HPC benchmark, LINPACK, cannot reflect AI computing power and input/output performance because it lacks a representative workload. Current popular AI benchmarks, such as MLPerf, have a fixed problem size and therefore limited scalability. To address these issues, we propose an end-to-end benchmark suite built on automated machine learning, which not only represents real AI scenarios but is also auto-adaptively scalable to machines of various scales. We implement the algorithms in a highly parallel and flexible way to ensure efficiency and optimization potential on diverse systems with customizable configurations. We adopt Operations Per Second (OPS), measured in an analytical and systematic approach, as the major metric to quantify AI performance. We evaluate the benchmark's stability and scalability on various systems, from 4 nodes with 32 NVIDIA Tesla T4 GPUs (56.1 Tera-OPS measured) up to 512 nodes with 4096 Huawei Ascend 910 accelerators (194.53 Peta-OPS measured), and the results show near-linear weak scalability. With a flexible workload and a single metric, AIPerf can easily scale on and rank AI-HPC systems, providing a powerful benchmark suite for the coming supercomputing era.
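The two measurements quoted above can be turned into a rough per-accelerator throughput figure. The sketch below is only a back-of-the-envelope check using the numbers reported in the abstract; the helper function `per_accelerator_ops` is illustrative and not part of the AIPerf suite, and the two runs use different accelerators, so the figures are not directly comparable across systems.

```python
# Back-of-the-envelope per-accelerator throughput from the two AIPerf
# measurements reported in the abstract. Illustrative only: the runs
# use different accelerators (NVIDIA Tesla T4 vs. Huawei Ascend 910).

TERA = 1e12
PETA = 1e15

def per_accelerator_ops(total_ops: float, num_accelerators: int) -> float:
    """Average OPS contributed by each accelerator in a run."""
    return total_ops / num_accelerators

# 4 nodes, 32 Tesla T4: 56.1 Tera-OPS total
t4_run = per_accelerator_ops(56.1 * TERA, 32)

# 512 nodes, 4096 Ascend 910: 194.53 Peta-OPS total
ascend_run = per_accelerator_ops(194.53 * PETA, 4096)

print(f"Tesla T4:   {t4_run / TERA:.2f} Tera-OPS per accelerator")
print(f"Ascend 910: {ascend_run / TERA:.2f} Tera-OPS per accelerator")
```

Near-linear weak scalability, as claimed in the abstract, means this per-accelerator figure stays roughly constant as the node count grows within one system.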


Big Data Mining and Analytics
Pages 208-220
Cite this article:
Ren Z, Liu Y, Shi T, et al. AIPerf: Automated Machine Learning as an AI-HPC Benchmark. Big Data Mining and Analytics, 2021, 4(3): 208-220.
Received: 01 March 2021
Accepted: 12 March 2021
Published: 12 May 2021
© The author(s) 2021

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0).