Open Access

A Variance Reducing Stochastic Proximal Method with Acceleration Techniques

School of Mathematical Science, Zhejiang Normal University, Jinhua 321004, China

Abstract

We consider a fundamental problem in machine learning, structural risk minimization, which can be formulated as the average of a large number of smooth component functions plus a simple, convex (but possibly non-smooth) function. In this paper, we propose a novel proximal variance-reducing stochastic method built on Point-SAGA. Our method performs two proximal operator evaluations per iteration by incorporating the fast Douglas–Rachford splitting, and it follows the FISTA scheme in the choice of momentum factors. We show that the objective function value at the iterates converges to the optimal value at a rate of 𝒪(1/k) when each loss function is convex and smooth. In addition, we prove that our method achieves a linear convergence rate when the loss functions are strongly convex and smooth. Experiments demonstrate the effectiveness of the proposed algorithm, which provides good acceleration especially when the loss function is ill-conditioned.
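
For context, the structural risk minimization problem described in the abstract, and the proximal operator on which proximal stochastic methods rely, take the following standard form (the notation here is generic and not taken from the paper; γ denotes a step-size parameter):

\[
\min_{x \in \mathbb{R}^{d}} \; F(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x) + g(x),
\qquad
\operatorname{prox}_{\gamma g}(v) = \arg\min_{x} \Big\{ g(x) + \frac{1}{2\gamma}\, \| x - v \|^{2} \Big\},
\]

where each f_i is a smooth loss (convex, or strongly convex in the linear-rate setting) and g is a simple, convex, possibly non-smooth regularizer. The FISTA momentum scheme referred to above is, in its standard form, the factor update \( t_{k+1} = \tfrac{1}{2}\big(1 + \sqrt{1 + 4 t_k^{2}}\big) \) with extrapolation weight \( (t_k - 1)/t_{k+1} \); the exact choice of momentum factors in the paper may differ.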

Tsinghua Science and Technology
Pages 999–1008
Cite this article:
Lei J, Zhang Y, Zhang Z. A Variance Reducing Stochastic Proximal Method with Acceleration Techniques. Tsinghua Science and Technology, 2023, 28(6): 999-1008. https://doi.org/10.26599/TST.2022.9010051


Received: 01 July 2022
Revised: 13 October 2022
Accepted: 30 October 2022
Published: 28 July 2023
© The author(s) 2023.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
