AIPerf: Automated Machine Learning as an AI-HPC Benchmark

Zhixiang Ren; Yongheng Liu; Tianhui Shi; Lei Xie; Yue Zhou; Jidong Zhai; Youhui Zhang; Yunquan Zhang; Wenguang Chen

doi:10.26599/BDMA.2021.9020004


Open Access


Zhixiang Ren¹, Yongheng Liu¹, Tianhui Shi², Lei Xie², Yue Zhou¹, Jidong Zhai², Youhui Zhang², Yunquan Zhang³, Wenguang Chen²

¹ Peng Cheng National Laboratory, Shenzhen 518000, China

² Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

³ Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100086, China

Abstract

The plethora of complex Artificial Intelligence (AI) algorithms and available High-Performance Computing (HPC) power stimulates the rapid development of AI components with heterogeneous designs. Consequently, the need for cross-stack performance benchmarking of AI-HPC systems has rapidly emerged. In particular, the de facto HPC benchmark, LINPACK, cannot reflect AI computing power and input/output performance without a representative workload. Current popular AI benchmarks, such as MLPerf, have a fixed problem size and therefore limited scalability. To address these issues, we propose an end-to-end benchmark suite utilizing automated machine learning, which not only represents real AI scenarios but is also auto-adaptively scalable to machines of various scales. We implement the algorithms in a highly parallel and flexible way to ensure efficiency and optimization potential on diverse systems with customizable configurations. We use Operations Per Second (OPS), measured with an analytical and systematic approach, as the major metric to quantify AI performance. We perform evaluations on various systems to verify the benchmark's stability and scalability, from 4 nodes with 32 NVIDIA Tesla T4 (56.1 Tera-OPS measured) up to 512 nodes with 4096 Huawei Ascend 910 (194.53 Peta-OPS measured), and the results show near-linear weak scalability. With a flexible workload and a single metric, AIPerf can easily scale on and rank AI-HPC systems, providing a powerful benchmark suite for the coming supercomputing era.
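To make the reported figures easier to compare, the following minimal Python sketch converts the two measured totals quoted in the abstract into an average per-accelerator throughput. It also defines one common way to express weak-scaling efficiency (per-device throughput at the larger scale divided by per-device throughput at the baseline). The function names `per_device_ops` and `weak_scaling_efficiency` are illustrative assumptions and are not taken from the AIPerf code base; the paper's own analytical OPS measurement is not reproduced here.

```python
TERA = 1e12
PETA = 1e15


def per_device_ops(total_ops: float, devices: int) -> float:
    """Average measured OPS contributed by each accelerator."""
    if devices <= 0:
        raise ValueError("devices must be positive")
    return total_ops / devices


def weak_scaling_efficiency(base_ops: float, base_devices: int,
                            scaled_ops: float, scaled_devices: int) -> float:
    """Ratio of per-device throughput at scale to per-device throughput
    at the baseline; 1.0 corresponds to perfectly linear weak scaling.

    Only meaningful when both runs use the same kind of accelerator.
    """
    base = per_device_ops(base_ops, base_devices)
    scaled = per_device_ops(scaled_ops, scaled_devices)
    return scaled / base


if __name__ == "__main__":
    # Totals quoted in the abstract (different hardware, so the two
    # systems are not compared against each other here).
    t4 = per_device_ops(56.1 * TERA, 32)
    ascend = per_device_ops(194.53 * PETA, 4096)
    print(f"Tesla T4:   {t4 / TERA:.2f} Tera-OPS per accelerator")
    print(f"Ascend 910: {ascend / TERA:.2f} Tera-OPS per accelerator")
```

Run as a script, this reports roughly 1.75 Tera-OPS per Tesla T4 and 47.49 Tera-OPS per Ascend 910. Because the two systems use different accelerators, `weak_scaling_efficiency` is intended for pairs of runs on the same hardware at different node counts, which the abstract does not list individually.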

Keywords

Artificial Intelligence (AI); High-Performance Computing (HPC); automated machine learning


Big Data Mining and Analytics

Volume 4, Issue 3, September 2021

Pages 208-220

DOI: 10.26599/BDMA.2021.9020004

Cite this article:

Ren Z, Liu Y, Shi T, et al. AIPerf: Automated Machine Learning as an AI-HPC Benchmark. Big Data Mining and Analytics, 2021, 4(3): 208-220. https://doi.org/10.26599/BDMA.2021.9020004


Received: 01 March 2021

Accepted: 12 March 2021

Published: 12 May 2021

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).