
SIGNGD with Error Feedback Meets Lazily Aggregated Technique: Communication-Efficient Algorithms for Distributed Learning

Xiaoge Deng, Tao Sun, Feng Liu, and Dongsheng Li
National Laboratory for Parallel and Distributed Processing (PDL), College of Computer, National University of Defense Technology, Changsha 410073, China

Abstract

The proliferation of massive datasets has generated significant interest in distributed algorithms for solving large-scale machine learning problems. However, communication overhead is a major bottleneck that hampers the scalability of distributed machine learning systems. In this paper, we design two communication-efficient algorithms for distributed learning tasks. The first, named EF-SIGNGD, uses 1-bit (sign-based) gradient quantization to reduce the number of communicated bits; in addition, the error feedback technique, i.e., incorporating the error made by the compression operator into the next step, is employed to guarantee convergence. The second, called LE-SIGNGD, augments EF-SIGNGD with a well-designed lazy gradient aggregation rule that detects gradients with small changes and reuses outdated information, thereby saving communication costs in both transmitted bits and communication rounds. Furthermore, we show that LE-SIGNGD converges under mild assumptions. The effectiveness of both algorithms is demonstrated through experiments on real and synthetic data.
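
To make the two ideas in the abstract concrete, the following is a minimal single-worker Python/NumPy sketch of (i) sign-based 1-bit compression with error feedback and (ii) a lazy aggregation rule that skips a communication round when the gradient has changed little. The function names, the rescaling by the mean magnitude, and the squared-norm threshold test are illustrative assumptions, not the paper's exact EF-SIGNGD or LE-SIGNGD procedures.

import numpy as np


def ef_sign_compress(grad, residual):
    # Error feedback: add the error carried over from the previous compression.
    corrected = grad + residual
    # 1-bit (sign) quantization, rescaled by the mean magnitude, so only one
    # float plus one bit per coordinate needs to be transmitted.
    scale = np.mean(np.abs(corrected))
    compressed = scale * np.sign(corrected)
    # Keep the new compression error locally for the next round.
    new_residual = corrected - compressed
    return compressed, new_residual


def should_upload(new_grad, last_uploaded_grad, threshold):
    # Lazy aggregation rule (hypothetical form): communicate only when the
    # gradient has changed enough since the last uploaded one.
    return np.linalg.norm(new_grad - last_uploaded_grad) ** 2 > threshold


# Toy usage for a single worker on a least-squares loss 0.5 * ||Ax - b||^2.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
x, residual, last_uploaded = np.zeros(5), np.zeros(5), np.zeros(5)
lr, threshold = 0.01, 1e-3

for step in range(200):
    grad = A.T @ (A @ x - b)
    if step == 0 or should_upload(grad, last_uploaded, threshold):
        # Communication round: send the 1-bit compressed message.
        last_msg, residual = ef_sign_compress(grad, residual)
        last_uploaded = grad
    # Otherwise the round is skipped and the server reuses last_msg.
    x -= lr * last_msg

print("final loss:", 0.5 * np.linalg.norm(A @ x - b) ** 2)

In the distributed setting described in the abstract, each worker would run the compression and skipping logic locally while a server aggregates the received (or reused) messages; the single-worker loop above only illustrates the mechanics.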

Keywords: distributed learning, communication-efficient algorithm, convergence analysis


Publication history

Received: 23 August 2020
Accepted: 24 September 2020
Published: 17 August 2021
Issue date: February 2022

Copyright

© The author(s) 2022

Acknowledgements

This work was supported in part by the Core Electronic Devices, High-End Generic Chips, and Basic Software Major Special Projects (No. 2018ZX01028101), and the National Natural Science Foundation of China (Nos. 61907034, 61932001, and 61906200).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
