School of Mathematical Science, Zhejiang Normal University, Jinhua 321004, China
Abstract
We consider a fundamental problem in machine learning, structural risk minimization, in which the objective is the average of a large number of smooth component functions plus a simple convex (but possibly non-smooth) function. In this paper, we propose a novel proximal variance-reducing stochastic method building on Point-SAGA. Our method performs two proximal operator calculations per iteration by incorporating fast Douglas–Rachford splitting, and follows the scheme of the FISTA algorithm in the choice of momentum factors. We show that the objective function value converges sublinearly when each loss function is convex and smooth. In addition, we prove that our method achieves a linear convergence rate for strongly convex and smooth loss functions. Experiments demonstrate the effectiveness of the proposed algorithm, with particularly good acceleration when the loss function is ill-conditioned.
References
[1] Z. Yuan, Y. Lu, and Y. Xue, DroidDetector: Android malware characterization and detection using deep learning, Tsinghua Science and Technology, vol. 21, no. 1, pp. 114–123, 2016.
[2] Y. Sun, Z. Dou, Y. Li, and S. Wang, Improving semantic part features for person re-identification with supervised non-local similarity, Tsinghua Science and Technology, vol. 25, no. 5, pp. 636–646, 2020.
[3] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. New York, NY, USA: Springer, 2009.
[4] R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, in Proc. 26th Int. Conf. Neural Information Processing Systems, Lake Tahoe, NV, USA, 2013, pp. 315–323.
[5] A. Defazio, F. Bach, and S. Lacoste-Julien, SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives, in Proc. 27th Int. Conf. Neural Information Processing Systems, Montreal, Canada, 2014, pp. 1646–1654.
[6] O. Shamir and T. Zhang, Stochastic gradient descent for non-smooth optimization: Convergence results and optimal averaging schemes, in Proc. 30th Int. Conf. Machine Learning, Atlanta, GA, USA, 2013, pp. 71–79.
[7] S. Shalev-Shwartz and T. Zhang, Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization, in Proc. 31st Int. Conf. Machine Learning, Beijing, China, 2014, pp. I-64–I-72.
[8] A. Defazio, A simple practical accelerated method for finite sums, in Proc. 30th Int. Conf. Neural Information Processing Systems, Barcelona, Spain, 2016, pp. 676–684.
[9] H. Lin, J. Mairal, and Z. Harchaoui, Catalyst acceleration for first-order convex optimization: From theory to practice, J. Mach. Learn. Res., vol. 18, no. 1, pp. 7854–7907, 2017.
[10] K. Zhou, F. Shang, and J. Cheng, A simple stochastic variance reduced algorithm with fast convergence rates, in Proc. 35th Int. Conf. Machine Learning, Stockholm, Sweden, 2018, pp. 5980–5989.
[11] P. Patrinos, L. Stella, and A. Bemporad, Douglas–Rachford splitting: Complexity estimates and accelerated variants, in Proc. 53rd IEEE Conf. Decision and Control, Los Angeles, CA, USA, 2014, pp. 4234–4239.
[12] C. Lemaréchal and C. Sagastizábal, Practical aspects of the Moreau–Yosida regularization: Theoretical preliminaries, SIAM J. Optim., vol. 7, no. 2, pp. 367–385, 1997.
[13] T. Hofmann, A. Lucchi, S. Lacoste-Julien, and B. McWilliams, Variance reduced stochastic gradient descent with neighbors, in Proc. 28th Int. Conf. Neural Information Processing Systems, Montreal, Canada, 2015, pp. 2305–2313.
[14] H. Luo, X. Bai, G. Lim, and J. Peng, New global algorithms for quadratic programming with a few negative eigenvalues based on alternative direction method and convex relaxation, Math. Prog. Comp., vol. 11, no. 1, pp. 119–171, 2019.
[15] H. Luo, X. Ding, J. Peng, R. Jiang, and D. Li, Complexity results and effective algorithms for worst-case linear optimization under uncertainties, Informs J. Comput., vol. 33, no. 1, pp. 180–197, 2021.
J. Lei, Y. Zhang, and Z. Zhang, A variance reducing stochastic proximal method with acceleration techniques, Tsinghua Science and Technology, vol. 28, no. 6, pp. 999–1008, 2023. https://doi.org/10.26599/TST.2022.9010051
The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
ADProx-SAGA algorithm
The ADProx-SAGA algorithm is presented in Algorithm 1. The algorithm maintains five sequences of iteration points. The initial point is chosen arbitrarily, each entry of the gradient table is initialized with a gradient/subgradient of the corresponding loss function at the initial point, and, in addition to the learning rate, the algorithm introduces a momentum factor.
In the k-th iteration, a loss function is chosen uniformly at random. The first variable is updated according to Eq. (4), and the next iterate is obtained using Eq. (5). According to the definition in Eq. (3) and the update in Eq. (6), the remaining quantities can be treated as intermediate variables in the update from one iterate to the next. Combining Eqs. (2) and (3), the main steps of Algorithm 1 can also be written as a single update whose search direction is an unbiased estimate of the full gradient, similar to the gradient update formula in SAGA. The gradient table is designed to reduce the computational effort compared to SVRG, and the remaining term is the gradient mapping of the sampled loss at the current point.
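To make the estimator concrete, the following is a minimal scalar sketch of a SAGA-style unbiased gradient estimate with a gradient table; the function names and the list-based table are illustrative, not the paper's implementation.

```python
def saga_gradient_estimate(grad_f, x, table, j):
    """SAGA-style unbiased estimate of the full gradient at x (scalar sketch).

    grad_f(i, x) returns the gradient of the i-th loss at x; `table` is a
    list holding the most recently computed gradient of each loss.
    """
    table_mean = sum(table) / len(table)   # average of the stored gradients
    g_new = grad_f(j, x)                   # fresh gradient of the sampled loss j
    # Over a uniformly random j, g_new - table[j] + table_mean has
    # expectation equal to the full gradient at x.
    estimate = g_new - table[j] + table_mean
    table[j] = g_new                       # refresh the table entry for loss j
    return estimate
```

Because only one fresh gradient is computed per iteration, the per-step cost matches plain SGD, while the table term drives the variance of the estimate to zero as the iterates converge.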
In each iteration of our algorithm, unlike Prox-SAGA, which uses only one proximal operator to handle the non-smooth term, we borrow the idea of Douglas–Rachford splitting: besides the proximal operator of the non-smooth regularizer, the proximal operator of the sampled loss function is used to compute the gradient mapping. Each iteration therefore performs two proximal calculations, and this design achieves fast convergence when the loss function is ill-conditioned.
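For reference, the textbook Douglas–Rachford iteration for min f(x) + g(x) also evaluates one proximal operator per term in each step. The sketch below, with illustrative scalar proximal maps, shows this standard form of the splitting, not the combined stochastic update of Algorithm 1.

```python
import math

def douglas_rachford(prox_f, prox_g, z0, n_iter=200):
    """Textbook Douglas–Rachford splitting for min_x f(x) + g(x)."""
    z = z0
    for _ in range(n_iter):
        x = prox_f(z)            # proximal step on f
        y = prox_g(2 * x - z)    # proximal step on g at the reflected point
        z = z + y - x            # fixed-point update of the driving sequence
    return prox_f(z)             # the x-sequence converges to a minimizer

# Example: f(x) = 0.5*(x - 3)^2 and g(x) = |x|, both with step size 1
prox_f = lambda v: (v + 3.0) / 2.0                        # prox of f
prox_g = lambda v: math.copysign(max(abs(v) - 1.0, 0.0), v)  # soft threshold
x_star = douglas_rachford(prox_f, prox_g, 0.0)            # converges to 2.0
```

In this example the optimality condition x - 3 + sign(x) = 0 gives the minimizer x = 2, and the iteration contracts to it linearly.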
In particular, our algorithm differs from Prox-SAGA in the definition of the gradient. In Prox-SAGA, the stored gradient is evaluated at the current iterate, while in our algorithm it is a subgradient evaluated at the point produced by the proximal operator. Point-SAGA achieves its acceleration over SAGA by involving the "future" point returned by the proximal step; combining Eq. (5) with this definition shows that Algorithm 1 also involves "future" points, with the two proximal operators combined by using DR splitting. In addition, on top of combining the DR splitting operators, adding momentum terms to the iteration points yields faster convergence than Prox-SAGA and Point-SAGA.
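The role of the "future" point can be illustrated with a scalar sketch of a Point-SAGA-style step. By the optimality condition of the proximal operator, (z - x_next) / gamma is exactly a (sub)gradient of the sampled loss at the new point x_next, so the table is refreshed with a gradient at the future point rather than the current one. The names below are illustrative, not the paper's notation.

```python
def point_saga_style_step(prox_fj, x, table, j, gamma):
    """One Point-SAGA-style proximal step (scalar sketch).

    prox_fj is the proximal operator of the sampled loss f_j with step
    gamma; `table` stores one (sub)gradient per loss.
    """
    table_mean = sum(table) / len(table)
    z = x + gamma * (table[j] - table_mean)   # variance-reduced inner point
    x_next = prox_fj(z)                       # proximal map to the "future" point
    # Prox optimality: f_j'(x_next) + (x_next - z) / gamma = 0
    table[j] = (z - x_next) / gamma           # subgradient of f_j at x_next
    return x_next
```

This is why no explicit gradient oracle is needed for the loss: the proximal evaluation itself yields the subgradient at the future point.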
We wish to speed up the proximal point gradient algorithm, so we apply Nesterov's momentum at the iteration points. For the momentum factor, Algorithm 1 follows the choice used in the FISTA algorithm. However, FISTA is not guaranteed to be a descent algorithm, so it is necessary to check the function value at each iterate; this verification step is essential in the later theoretical proofs.
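The FISTA momentum factor referred to above follows the standard recursion t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2, with extrapolation weight (t_k - 1) / t_{k+1} applied to the difference of consecutive iterates; a minimal sketch:

```python
import math

def fista_momentum(t):
    """One update of the FISTA momentum factor.

    Returns the next factor t_next and the extrapolation weight
    beta = (t - 1) / t_next used to combine consecutive iterates.
    """
    t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
    beta = (t - 1.0) / t_next
    return t_next, beta
```

Starting from t = 1 the weight is zero at the first step and then grows toward 1, which is what produces the accelerated sublinear rate in the convex case.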
In this paper, two cases, strongly convex and non-strongly convex, are considered for the objective function, and different momentum factors are selected for the two cases. These parameters are unknown for most problems, yet ADProx-SAGA works well in practice.
In this section, we have shown that ADProx-SAGA is a combination of Point-SAGA and the DR splitting operators: it generalizes Point-SAGA while establishing a connection to fast Douglas–Rachford splitting.
Fig. 1 Performance of algorithms.