Optimising neural network training efficiency through spectral parameter-based multiple adaptive learning rates

Bibliographic Details
Main Author: Koay, Yeong Lin
Format: Final Year Project / Dissertation / Thesis
Published: 2023
Subjects:
Online Access:http://eprints.utar.edu.my/6338/
http://eprints.utar.edu.my/6338/1/4._Revised_Dissertation_Koay_Yeong_Lin.pdf
_version_ 1848886650766098432
author Koay, Yeong Lin
author_facet Koay, Yeong Lin
author_sort Koay, Yeong Lin
building UTAR Institutional Repository
collection Online Access
description The process of training deep neural networks relies heavily on solving optimization problems. Finding optimal values for the various hyperparameters makes training neural networks challenging. The learning rate, or step size, is one of the most crucial hyperparameters in gradient-based optimization. A small learning rate may result in slow convergence and cause the loss function to get stuck in a local minimum, whereas a large learning rate may hinder convergence or cause divergence. Currently, most common optimization algorithms use a fixed learning rate or a simplified adaptive updating scheme in every iteration. In this project, we propose a stochastic gradient descent method with multiple adaptive learning rates (MAdaGrad) and Adam with multiple adaptive learning rates (MAdaGrad Adam). In deriving the updating formula, we minimize the log-determinant norm while requiring the approximation to satisfy the secant equation. We apply the method of Lagrange multipliers to this constrained minimization problem, and the Lagrange multiplier is approximated using the Newton-Raphson method. The proposed algorithms update the learning rate in every iteration based on the approximated spectrum of the Hessian of the loss function. The proposed methods were compared with existing optimization methods in deep learning, namely stochastic gradient descent (SGD) and Adam, on several datasets. The numerical results show that the proposed methods perform better than SGD and Adam. Hence, MAdaGrad and MAdaGrad Adam can serve as alternative optimizers in machine learning.
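As an illustration only, and not the exact MAdaGrad update derived in the dissertation, the sketch below shows the general idea of per-parameter ("multiple") adaptive learning rates driven by a secant-based estimate of the Hessian spectrum: each coordinate's step size is taken as the reciprocal of a diagonal Hessian estimate built from successive parameter and gradient differences. The function name madagrad_like_step and the safeguards eps, lr_min and lr_max are hypothetical choices for this sketch, not quantities defined in the abstract.

# Illustrative sketch only: element-wise adaptive learning rates from a
# diagonal secant (quasi-Newton) estimate of the Hessian spectrum. The
# dissertation's log-determinant-norm derivation with a Newton-Raphson-
# approximated Lagrange multiplier is not reproduced here.
import numpy as np

def madagrad_like_step(w, grad, prev_w, prev_grad,
                       eps=1e-8, lr_min=1e-4, lr_max=1.0):
    """One SGD-style step with per-parameter spectral learning rates."""
    s = w - prev_w                 # parameter displacement
    y = grad - prev_grad           # gradient displacement
    # Diagonal Hessian estimate from the secant condition H s ≈ y,
    # taken element-wise; its reciprocal acts as the learning rate.
    diag_h = y / (s + eps)
    lr = np.clip(1.0 / np.abs(diag_h + eps), lr_min, lr_max)
    return w - lr * grad

# Usage on a toy ill-conditioned quadratic loss f(w) = 0.5 * w^T A w
A = np.diag([1.0, 10.0, 100.0])
grad_f = lambda w: A @ w
w_prev = np.array([1.0, 1.0, 1.0])
w = w_prev - 0.01 * grad_f(w_prev)   # plain SGD warm-up step
for _ in range(50):
    w_next = madagrad_like_step(w, grad_f(w), w_prev, grad_f(w_prev))
    w_prev, w = w, w_next
print(w)  # approaches the minimiser at the origin

On this toy problem the element-wise step sizes compensate for the very different curvatures along each coordinate, which a single global learning rate cannot do; this is the motivation for multiple adaptive learning rates described in the abstract.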
first_indexed 2025-11-15T19:41:52Z
format Final Year Project / Dissertation / Thesis
id utar-6338
institution Universiti Tunku Abdul Rahman
institution_category Local University
last_indexed 2025-11-15T19:41:52Z
publishDate 2023
recordtype eprints
repository_type Digital Repository
spelling utar-63382024-04-14T10:51:15Z Optimising neural network training efficiency through spectral parameter-based multiple adaptive learning rates Koay, Yeong Lin Q Science (General) QA Mathematics The process of training deep neural networks relies heavily on solving optimization problems. Finding optimal values for the various hyperparameters makes training neural networks challenging. The learning rate, or step size, is one of the most crucial hyperparameters in gradient-based optimization. A small learning rate may result in slow convergence and cause the loss function to get stuck in a local minimum, whereas a large learning rate may hinder convergence or cause divergence. Currently, most common optimization algorithms use a fixed learning rate or a simplified adaptive updating scheme in every iteration. In this project, we propose a stochastic gradient descent method with multiple adaptive learning rates (MAdaGrad) and Adam with multiple adaptive learning rates (MAdaGrad Adam). In deriving the updating formula, we minimize the log-determinant norm while requiring the approximation to satisfy the secant equation. We apply the method of Lagrange multipliers to this constrained minimization problem, and the Lagrange multiplier is approximated using the Newton-Raphson method. The proposed algorithms update the learning rate in every iteration based on the approximated spectrum of the Hessian of the loss function. The proposed methods were compared with existing optimization methods in deep learning, namely stochastic gradient descent (SGD) and Adam, on several datasets. The numerical results show that the proposed methods perform better than SGD and Adam. Hence, MAdaGrad and MAdaGrad Adam can serve as alternative optimizers in machine learning. 2023 Final Year Project / Dissertation / Thesis NonPeerReviewed application/pdf http://eprints.utar.edu.my/6338/1/4._Revised_Dissertation_Koay_Yeong_Lin.pdf Koay, Yeong Lin (2023) Optimising neural network training efficiency through spectral parameter-based multiple adaptive learning rates. Master dissertation/thesis, UTAR. http://eprints.utar.edu.my/6338/
spellingShingle Q Science (General)
QA Mathematics
Koay, Yeong Lin
Optimising neural network training efficiency through spectral parameter-based multiple adaptive learning rates
title Optimising neural network training efficiency through spectral parameter-based multiple adaptive learning rates
title_full Optimising neural network training efficiency through spectral parameter-based multiple adaptive learning rates
title_fullStr Optimising neural network training efficiency through spectral parameter-based multiple adaptive learning rates
title_full_unstemmed Optimising neural network training efficiency through spectral parameter-based multiple adaptive learning rates
title_short Optimising neural network training efficiency through spectral parameter-based multiple adaptive learning rates
title_sort optimising neural network training efficiency through spectral parameter-based multiple adaptive learning rates
topic Q Science (General)
QA Mathematics
url http://eprints.utar.edu.my/6338/
http://eprints.utar.edu.my/6338/1/4._Revised_Dissertation_Koay_Yeong_Lin.pdf