Optimising neural network training efficiency through spectral parameter-based multiple adaptive learning rates

Bibliographic Details
Main Author: Koay, Yeong Lin
Format: Final Year Project / Dissertation / Thesis
Published: 2023
Subjects:
Online Access:http://eprints.utar.edu.my/6338/
http://eprints.utar.edu.my/6338/1/4._Revised_Dissertation_Koay_Yeong_Lin.pdf
_version_ 1848886650766098432
author Koay, Yeong Lin
author_facet Koay, Yeong Lin
author_sort Koay, Yeong Lin
building UTAR Institutional Repository
collection Online Access
description The process of training deep neural networks relies heavily on solving optimization problems. Finding optimal values for the various hyperparameters makes training neural networks challenging. The learning rate, or step size, is one of the most crucial hyperparameters in gradient-based optimization. A small learning rate may result in slow convergence and cause the loss function to get stuck in a local minimum, whereas a large learning rate may hinder convergence or cause divergence. Currently, most common optimization algorithms use a fixed learning rate or a simplified adaptive updating scheme in every iteration. In this project, we propose a stochastic gradient descent method with multiple adaptive learning rates (MAdaGrad) and Adam with multiple adaptive learning rates (MAdaGrad Adam). In deriving the updating formula, we minimize the log-determinant norm while requiring the approximation to satisfy the secant equation. We apply the method of Lagrange multipliers to this constrained minimization problem, and the Lagrange multiplier is approximated using the Newton-Raphson method. The proposed algorithms update the learning rate in every iteration based on the approximated spectrum of the Hessian of the loss function. The proposed methods were compared with existing optimization methods in deep learning, namely stochastic gradient descent (SGD) and Adam, on several datasets. The numerical results show that the proposed methods perform better than SGD and Adam. Hence, MAdaGrad and MAdaGrad Adam can serve as alternative optimizers in machine learning.
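As an illustration only, and not the exact MAdaGrad update derived in the dissertation, the sketch below shows the general idea of per-parameter ("multiple") adaptive learning rates driven by a secant-based estimate of the Hessian spectrum: each coordinate's step size is taken as the reciprocal of a diagonal Hessian estimate built from successive parameter and gradient differences. The function name madagrad_like_step and the safeguards eps, lr_min and lr_max are hypothetical choices for this sketch, not quantities defined in the abstract.

# Illustrative sketch only: element-wise adaptive learning rates from a
# diagonal secant (quasi-Newton) estimate of the Hessian spectrum. The
# dissertation's log-determinant-norm derivation with a Newton-Raphson-
# approximated Lagrange multiplier is not reproduced here.
import numpy as np

def madagrad_like_step(w, grad, prev_w, prev_grad,
                       eps=1e-8, lr_min=1e-4, lr_max=1.0):
    """One SGD-style step with per-parameter spectral learning rates."""
    s = w - prev_w                 # parameter displacement
    y = grad - prev_grad           # gradient displacement
    # Diagonal Hessian estimate from the secant condition H s ≈ y,
    # taken element-wise; its reciprocal acts as the learning rate.
    diag_h = y / (s + eps)
    lr = np.clip(1.0 / np.abs(diag_h + eps), lr_min, lr_max)
    return w - lr * grad

# Usage on a toy ill-conditioned quadratic loss f(w) = 0.5 * w^T A w
A = np.diag([1.0, 10.0, 100.0])
grad_f = lambda w: A @ w
w_prev = np.array([1.0, 1.0, 1.0])
w = w_prev - 0.01 * grad_f(w_prev)   # plain SGD warm-up step
for _ in range(50):
    w_next = madagrad_like_step(w, grad_f(w), w_prev, grad_f(w_prev))
    w_prev, w = w, w_next
print(w)  # approaches the minimiser at the origin

On this toy problem the element-wise step sizes compensate for the very different curvatures along each coordinate, which a single global learning rate cannot do; this is the motivation for multiple adaptive learning rates described in the abstract.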
first_indexed 2025-11-15T19:41:52Z
format Final Year Project / Dissertation / Thesis
id utar-6338
institution Universiti Tunku Abdul Rahman
institution_category Local University
last_indexed 2025-11-15T19:41:52Z
publishDate 2023
recordtype eprints
repository_type Digital Repository
spelling utar-63382024-04-14T10:51:15Z Optimising neural network training efficiency through spectral parameter-based multiple adaptive learning rates Koay, Yeong Lin Q Science (General) QA Mathematics The process of training deep neural networks relies heavily on solving optimization problems. Finding optimal values for the various hyperparameters makes training neural networks challenging. The learning rate, or step size, is one of the most crucial hyperparameters in gradient-based optimization. A small learning rate may result in slow convergence and cause the loss function to get stuck in a local minimum, whereas a large learning rate may hinder convergence or cause divergence. Currently, most common optimization algorithms use a fixed learning rate or a simplified adaptive updating scheme in every iteration. In this project, we propose a stochastic gradient descent method with multiple adaptive learning rates (MAdaGrad) and Adam with multiple adaptive learning rates (MAdaGrad Adam). In deriving the updating formula, we minimize the log-determinant norm while requiring the approximation to satisfy the secant equation. We apply the method of Lagrange multipliers to this constrained minimization problem, and the Lagrange multiplier is approximated using the Newton-Raphson method. The proposed algorithms update the learning rate in every iteration based on the approximated spectrum of the Hessian of the loss function. The proposed methods were compared with existing optimization methods in deep learning, namely stochastic gradient descent (SGD) and Adam, on several datasets. The numerical results show that the proposed methods perform better than SGD and Adam. Hence, MAdaGrad and MAdaGrad Adam can serve as alternative optimizers in machine learning. 2023 Final Year Project / Dissertation / Thesis NonPeerReviewed application/pdf http://eprints.utar.edu.my/6338/1/4._Revised_Dissertation_Koay_Yeong_Lin.pdf Koay, Yeong Lin (2023) Optimising neural network training efficiency through spectral parameter-based multiple adaptive learning rates. Master dissertation/thesis, UTAR. http://eprints.utar.edu.my/6338/
spellingShingle Q Science (General)
QA Mathematics
Koay, Yeong Lin
Optimising neural network training efficiency through spectral parameter-based multiple adaptive learning rates
title Optimising neural network training efficiency through spectral parameter-based multiple adaptive learning rates
title_full Optimising neural network training efficiency through spectral parameter-based multiple adaptive learning rates
title_fullStr Optimising neural network training efficiency through spectral parameter-based multiple adaptive learning rates
title_full_unstemmed Optimising neural network training efficiency through spectral parameter-based multiple adaptive learning rates
title_short Optimising neural network training efficiency through spectral parameter-based multiple adaptive learning rates
title_sort optimising neural network training efficiency through spectral parameter-based multiple adaptive learning rates
topic Q Science (General)
QA Mathematics
url http://eprints.utar.edu.my/6338/
http://eprints.utar.edu.my/6338/1/4._Revised_Dissertation_Koay_Yeong_Lin.pdf