
SIGNGD with Error Feedback Meets Lazily Aggregated Technique: Communication-Efficient Algorithms for Distributed Learning

Xiaoge Deng, Tao Sun, Feng Liu, and Dongsheng Li
National Laboratory for Parallel and Distributed Processing (PDL), College of Computer, National University of Defense Technology, Changsha 410073, China

Abstract

The proliferation of massive datasets has generated significant interest in distributed algorithms for solving large-scale machine learning problems. However, communication overhead is a major bottleneck that hampers the scalability of distributed machine learning systems. In this paper, we design two communication-efficient algorithms for distributed learning tasks. The first, named EF-SIGNGD, uses 1-bit (sign-based) gradient quantization to reduce the number of communicated bits; in addition, the error feedback technique, i.e., incorporating the error made by the compression operator into the next step, is employed to guarantee convergence. The second, called LE-SIGNGD, augments EF-SIGNGD with a well-designed lazy gradient aggregation rule that detects gradients with small changes and reuses outdated information, thereby saving communication costs in both transmitted bits and communication rounds. Furthermore, we show that LE-SIGNGD converges under mild assumptions. The effectiveness of both algorithms is demonstrated through experiments on real and synthetic data.
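
To make the two ideas in the abstract concrete, the following is a minimal single-worker Python/NumPy sketch of (i) sign-based 1-bit compression with error feedback and (ii) a lazy aggregation rule that skips a communication round when the gradient has changed little. The function names, the rescaling by the mean magnitude, and the squared-norm threshold test are illustrative assumptions, not the paper's exact EF-SIGNGD or LE-SIGNGD procedures.

import numpy as np


def ef_sign_compress(grad, residual):
    # Error feedback: add the error carried over from the previous compression.
    corrected = grad + residual
    # 1-bit (sign) quantization, rescaled by the mean magnitude, so only one
    # float plus one bit per coordinate needs to be transmitted.
    scale = np.mean(np.abs(corrected))
    compressed = scale * np.sign(corrected)
    # Keep the new compression error locally for the next round.
    new_residual = corrected - compressed
    return compressed, new_residual


def should_upload(new_grad, last_uploaded_grad, threshold):
    # Lazy aggregation rule (hypothetical form): communicate only when the
    # gradient has changed enough since the last uploaded one.
    return np.linalg.norm(new_grad - last_uploaded_grad) ** 2 > threshold


# Toy usage for a single worker on a least-squares loss 0.5 * ||Ax - b||^2.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
x, residual, last_uploaded = np.zeros(5), np.zeros(5), np.zeros(5)
lr, threshold = 0.01, 1e-3

for step in range(200):
    grad = A.T @ (A @ x - b)
    if step == 0 or should_upload(grad, last_uploaded, threshold):
        # Communication round: send the 1-bit compressed message.
        last_msg, residual = ef_sign_compress(grad, residual)
        last_uploaded = grad
    # Otherwise the round is skipped and the server reuses last_msg.
    x -= lr * last_msg

print("final loss:", 0.5 * np.linalg.norm(A @ x - b) ** 2)

In the distributed setting described in the abstract, each worker would run the compression and skipping logic locally while a server aggregates the received (or reused) messages; the single-worker loop above only illustrates the mechanics.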

Keywords: distributed learning, communication-efficient algorithm, convergence analysis


Publication history

Received: 23 August 2020
Accepted: 24 September 2020
Published: 17 August 2021
Issue date: February 2022

Copyright

© The author(s) 2022

Acknowledgements

This work was supported in part by the Core Electronic Devices, High-End Generic Chips, and Basic Software Major Special Projects (No. 2018ZX01028101), and the National Natural Science Foundation of China (Nos. 61907034, 61932001, and 61906200).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
