Efficient Training and Implementation of Gaussian Process Potentials
Molecular simulations are a powerful tool for translating information about the intermolecular interactions within a system to thermophysical properties via statistical mechanics. However, the accuracy of any simulation is limited by the potentials that model the microscopic interactions. Most first principles methods are too computationally expensive for use at every time-step or cycle of a simulation, which typically requires thousands of energy evaluations. Meanwhile, cheaper semi-empirical potentials give rise to only qualitatively accurate simulations. Consequently, methods for efficient first principles predictions in simulations are of interest.
| Main Author: | Broad, Jack W. |
|---|---|
| Format: | Thesis (University of Nottingham only) |
| Language: | English |
| Published: | 2022 |
| Subjects: | Machine Learning; Applied Mathematics; Machine-learned potentials; Gaussian processes |
| Online Access: | https://eprints.nottingham.ac.uk/69868/ |
| author | Broad, Jack W. |
| building | Nottingham Research Data Repository |
| collection | Online Access |
| description | Molecular simulations are a powerful tool for translating information about the intermolecular interactions within a system to thermophysical properties via statistical mechanics. However, the accuracy of any simulation is limited by the potentials that model the microscopic interactions. Most first principles methods are too computationally expensive for use at every time-step or cycle of a simulation, which typically requires thousands of energy evaluations. Meanwhile, cheaper semi-empirical potentials give rise to only qualitatively accurate simulations. Consequently, methods for efficient first principles predictions in simulations are of interest.
Machine-learned potentials (MLPs) have shown promise in this area, offering first principles predictions at a fraction of the cost of ab initio calculation. Of particular interest are Gaussian process (GP) potentials, which achieve equivalent accuracy to other MLPs with smaller training sets. They therefore offer the best route to employing information from expensive ab initio calculations, for which building a large data set is time-consuming.
GP potentials, however, are among the most computationally intensive MLPs. Thus, they are far more costly to employ in simulations than semi-empirical potentials. This work addresses the computational expense of GP potentials by both reducing the training set size at a given accuracy and developing a method to invoke GP potentials efficiently for first principles prediction in simulations.
By varying the cross-over distance between the GP and a long-range function with the accuracy of the former, training by sequential design requires up to 40% fewer training points at fixed accuracy. This method was applied successfully to the CO-Ne, HF-Ne, HF-Na+, CO2-Ne, 2CO, 2HF and 2HCl systems, and can easily be extended to other interactions and methods of prediction. Meanwhile, a significant reduction in the time taken for Monte Carlo displacement and volume-change moves is achieved by parallelising the requisite GP calculations. Though this exploits in part the framework of GP regression, the distribution of the calculations themselves is general to other methods of prediction. The work also shows that current kernels and input transforms for modelling intermolecular interactions are not easily improved. |
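As a minimal illustration of the hand-over described in the abstract, the sketch below trains a 1-D Gaussian process on short-range energies and switches to an asymptotic -C6/r^6 form beyond a cross-over distance. Everything here is an assumption for illustration: the squared-exponential kernel, its hyperparameters, the toy Lennard-Jones-style stand-in for ab initio energies, and the `C6` and `r_cross` values are not the thesis's models or data.

```python
import numpy as np

def rbf_kernel(a, b, length=0.3, sigma=1.0):
    """Squared-exponential kernel between two 1-D arrays of distances."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return sigma**2 * np.exp(-0.5 * d2 / length**2)

def reference_energy(r):
    """Toy Lennard-Jones-style stand-in for expensive ab initio energies."""
    return (1.0 / r) ** 12 - 2.0 * (1.0 / r) ** 6

# Small training set of "expensive" reference energies at short range.
r_train = np.linspace(0.9, 2.5, 12)
y_train = reference_energy(r_train)

# Standard GP regression: predictive mean k(r, R) K^{-1} y,
# with a small jitter on the diagonal for numerical stability.
K = rbf_kernel(r_train, r_train) + 1e-8 * np.eye(len(r_train))
alpha = np.linalg.solve(K, y_train)

def potential(r, r_cross=2.5, C6=2.0):
    """GP prediction inside r_cross, analytic -C6/r**6 tail beyond it."""
    r = np.atleast_1d(r).astype(float)
    v = np.empty_like(r)
    short = r < r_cross
    v[short] = rbf_kernel(r[short], r_train) @ alpha
    v[~short] = -C6 / r[~short] ** 6
    return v
```

In this sketch `r_cross` is fixed; the thesis's point is precisely that letting the cross-over distance vary with the GP's accuracy reduces the number of training points the sequential design needs.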
| spelling | nottingham-69868 2022-10-15T04:40:04Z https://eprints.nottingham.ac.uk/69868/ Efficient Training and Implementation of Gaussian Process Potentials Broad, Jack W. 2022-10-15 Thesis (University of Nottingham only) NonPeerReviewed application/pdf en cc_by https://eprints.nottingham.ac.uk/69868/1/thesis.pdf Broad, Jack W. (2022) Efficient Training and Implementation of Gaussian Process Potentials. PhD thesis, University of Nottingham. Machine Learning Applied Mathematics Machine-learned potentials Gaussian processes |
| title | Efficient Training and Implementation of Gaussian Process Potentials |
| topic | Machine Learning Applied Mathematics Machine-learned potentials Gaussian processes |
| url | https://eprints.nottingham.ac.uk/69868/ |
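The abstract notes that Monte Carlo displacement and volume-change moves were accelerated by parallelising the requisite GP calculations. The reason such moves parallelise is that the pair energies entering a trial move are mutually independent, as the sketch below illustrates. All names here (`pair_energy`, `displacement_delta`) and the cheap Lennard-Jones stand-in for a GP prediction are assumptions for illustration, not the thesis's implementation.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def pair_energy(r):
    """Toy Lennard-Jones pair energy; stand-in for a costly GP prediction."""
    return 4.0 * ((1.0 / r) ** 12 - (1.0 / r) ** 6)

def displacement_delta(positions, i, trial, pool):
    """Energy change for moving particle i to `trial`.

    The per-pair energies before and after the move are independent,
    so they are farmed out to a worker pool and summed.
    """
    others = np.delete(positions, i, axis=0)
    r_old = np.linalg.norm(others - positions[i], axis=1)
    r_new = np.linalg.norm(others - trial, axis=1)
    e_old = sum(pool.map(pair_energy, r_old))
    e_new = sum(pool.map(pair_energy, r_new))
    return e_new - e_old

# Tiny 3-particle configuration and a trial displacement of particle 0.
positions = np.array([[0.0, 0.0, 0.0],
                      [1.1, 0.0, 0.0],
                      [0.0, 1.2, 0.0]])
with ThreadPoolExecutor(max_workers=4) as pool:
    dE = displacement_delta(positions, 0, np.array([0.05, 0.0, 0.0]), pool)
```

For a pair function this cheap, Python threads add overhead rather than speed (the GIL serialises the scalar work); in practice one would batch the separations into a single vectorised GP prediction or use a process pool. The sketch only shows the structural point: the evaluations are independent and so can be distributed, regardless of the prediction method.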