Abstract
Stochastic gradient descent (SGD) is one of the most widely used optimization methods in neural network training. However, it suffers from certain limitations, such as converging to sharp minima and overfitting. The coevolutionary neural-based optimization algorithm is recognized for its robust global search capability and its effectiveness in reducing overfitting. Additionally, building on previous work exploring the relationship between loss landscape geometry and generalization, the squared gradient norm can serve as a criterion for identifying flat regions of the loss landscape, thereby improving model generalization. In this work, we propose a coevolutionary SGD (CSGD) algorithm that integrates the coevolutionary neural-based optimization approach with the squared gradient norm as a comparison criterion. The algorithm aims to minimize both the loss value and the sharpness of the loss landscape, thereby simultaneously addressing poor generalization and the tendency to settle in sharp local minima. We analyze the convergence of the proposed algorithm, and we present experimental results with multiple neural networks on benchmark datasets that demonstrate the advantages of the proposed method with respect to model generalization and the local minima phenomenon.
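The abstract does not give implementation details, but the comparison criterion it describes can be sketched as follows. This is a minimal PyTorch sketch under stated assumptions: the helper names `sharpness_score` and `select_survivor` are hypothetical, and the lexicographic loss-then-sharpness ordering is one plausible way to combine the two objectives, not necessarily the paper's.

```python
import torch

def sharpness_score(model, loss_fn, x, y):
    """Squared gradient norm of the loss at the current parameters,
    used as a proxy for loss-landscape sharpness (hypothetical helper)."""
    loss = loss_fn(model(x), y)
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params)
    sq_norm = sum(g.pow(2).sum() for g in grads)
    return loss, sq_norm

def select_survivor(candidates, loss_fn, x, y):
    """Compare candidate networks in a coevolving population:
    prefer lower loss, breaking ties by lower squared gradient norm
    (i.e., flatter minima). Ordering is an assumption for illustration."""
    scored = []
    for model in candidates:
        loss, sq_norm = sharpness_score(model, loss_fn, x, y)
        scored.append((loss.item(), sq_norm.item(), model))
    scored.sort(key=lambda t: (t[0], t[1]))  # lexicographic comparison
    return scored[0][2]
```

In a population-based training loop, each candidate would take its own SGD steps, with `select_survivor` applied periodically so that individuals in flatter regions of the landscape are retained.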