MATHEMATICAL ANALYSIS OF THE CONVERGENCE OF OPTIMIZATION ALGORITHMS IN NEURAL NETWORK TRAINING
Abstract
This paper presents a rigorous mathematical analysis of the convergence properties of the key optimization algorithms used in neural network training. The study investigates the dynamics of Gradient Descent (GD), Stochastic Gradient Descent (SGD), and Adam on non-convex loss landscapes. The analysis shows that stochastic methods hold a distinct advantage in escaping saddle points, because the noise in mini-batch gradient estimates perturbs iterates away from regions where the gradient vanishes, while adaptive methods accelerate convergence through coordinate-wise normalization of the update step. The results provide a theoretical foundation for the trade-off between optimization speed and the generalization capability of deep learning models.
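For concreteness, the coordinate-wise normalization referred to above is the mechanism of the standard Adam update of Kingma and Ba (2015). The following is a minimal sketch in conventional notation, which is an assumption of this summary rather than notation taken from the paper itself: theta_t denotes the parameters, g_t the stochastic gradient at step t, alpha the step size, beta_1 and beta_2 the moment decay rates, and epsilon a small stabilizer; all vector operations are element-wise.

% Minimal sketch of the standard Adam update (Kingma & Ba, 2015).
% All operations on vectors are element-wise; dividing by sqrt(v_hat_t)
% is the coordinate-wise normalization discussed in the abstract.
\begin{align*}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t && \text{(first-moment average)} \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^{2} && \text{(second-moment average)} \\
\hat{m}_t &= \frac{m_t}{1-\beta_1^{t}}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^{t}} && \text{(bias correction)} \\
\theta_t &= \theta_{t-1} - \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} && \text{(normalized step)}
\end{align*}

Because each coordinate of the step is divided by a running estimate of that coordinate's gradient magnitude, poorly scaled directions receive proportionally larger steps, which is the source of the acceleration claimed above.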
Keywords
Neural networks, optimization, convergence analysis, gradient descent, stochastic optimization, saddle points, L-smoothness, Adam algorithm.