Artificial Intelligence for Chemical Synthesis: Improving the Workflow of Medicinal Chemists using Computer-Aided Synthesis Planning
Machine learning techniques have numerous applications in modern drug discovery. Advances in computing power, machine learning algorithms and data availability have inspired renewed interest in artificial intelligence and automation in chemical synthesis. The field of Computer-Aided Synthesis Planni...
| Main Author: | Haywood, Alexe L. |
|---|---|
| Format: | Thesis (University of Nottingham only) |
| Language: | English |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://eprints.nottingham.ac.uk/77169/ |
| _version_ | 1848800970019962880 |
|---|---|
| author | Haywood, Alexe L. |
| author_facet | Haywood, Alexe L. |
| author_sort | Haywood, Alexe L. |
| building | Nottingham Research Data Repository |
| collection | Online Access |
| description | Machine learning techniques have numerous applications in modern drug discovery. Advances in computing power, machine learning algorithms and data availability have inspired renewed interest in artificial intelligence and automation in chemical synthesis. The field of Computer-Aided Synthesis Planning (CASP) aims to improve chemists’ workflow by shortening the time required to synthesise compounds, giving them more time to analyse and design future experiments. In this thesis, we review contemporary CASP methodologies before developing machine learning models to predict reaction yield. State-of-the-art approaches to forward reaction prediction and retrosynthetic analysis tasks are outlined and compared using quantitative metrics. Predicting reaction yield is a newer aspect of CASP that has received significantly less attention than forward reaction prediction and retrosynthetic planning. This is due, in part, to a lack of curated reaction data reporting reaction yield. Using a combinatorial benchmark dataset generated by high-throughput experimentation, we evaluate machine learning models to predict reaction yield. Our research focuses on linear, tree-based, and Support Vector Regression (SVR) machine learning algorithms. Chemical reactivity regression tasks frequently use molecular descriptors based on time-consuming, computationally demanding quantum chemical calculations. Along with quantum chemical descriptors, we investigate a range of topological representations that are quicker to calculate and applicable to all molecules. SVR emerges as the most promising machine learning model across all molecular descriptors in a preliminary cross-validation test evaluating interpolation. Rigorous out-of-sample tests are designed to reliably assess the extrapolation capabilities of the most promising SVR models. The performance of SVR models built on topological representations surpasses that of models constructed on quantum chemical descriptors. The top SVR models built on each descriptor are subjected to additional validation. A collection of previously unseen prospective chemical reactions is compiled. Predictions are presented for synthetic assessment to validate and explore the extent of the generalisability of the top SVR models. |
| first_indexed | 2025-11-14T21:00:01Z |
| format | Thesis (University of Nottingham only) |
| id | nottingham-77169 |
| institution | University of Nottingham Malaysia Campus |
| institution_category | Local University |
| language | English |
| last_indexed | 2025-11-14T21:00:01Z |
| publishDate | 2024 |
| recordtype | eprints |
| repository_type | Digital Repository |
| spelling | nottingham-77169 2024-07-24T04:40:42Z https://eprints.nottingham.ac.uk/77169/ Artificial Intelligence for Chemical Synthesis: Improving the Workflow of Medicinal Chemists using Computer-Aided Synthesis Planning Haywood, Alexe L. 2024-07-24 Thesis (University of Nottingham only) NonPeerReviewed application/pdf en cc_by https://eprints.nottingham.ac.uk/77169/1/HaywoodAlexe_Thesis.pdf Haywood, Alexe L. (2024) Artificial Intelligence for Chemical Synthesis: Improving the Workflow of Medicinal Chemists using Computer-Aided Synthesis Planning. PhD thesis, University of Nottingham. machine learning Computer-Aided Synthesis Planning drug discovery chemical synthesis |
| spellingShingle | machine learning Computer-Aided Synthesis Planning drug discovery chemical synthesis Haywood, Alexe L. Artificial Intelligence for Chemical Synthesis: Improving the Workflow of Medicinal Chemists using Computer-Aided Synthesis Planning |
| title | Artificial Intelligence for Chemical Synthesis: Improving the Workflow of Medicinal Chemists using Computer-Aided Synthesis Planning |
| title_sort | artificial intelligence for chemical synthesis: improving the workflow of medicinal chemists using computer-aided synthesis planning |
| topic | machine learning Computer-Aided Synthesis Planning drug discovery chemical synthesis |
| url | https://eprints.nottingham.ac.uk/77169/ |