Data-driven total organic carbon prediction using feature selection methods incorporated in an automated machine learning framework

An accurate assessment of shale gas resources is highly important for the sustainable development of these energy resources. Total organic carbon (TOC) analysis thus becomes fundamental for understanding the distribution and quality of hydrocarbon source rocks within a shale gas reservoir. The eleva...

Bibliographic Details
Main Authors: Macêdo, Bruno da Silva, Wayo, Dennis Delali Kwesi, Campos, Deivid, De Santis, Rodrigo Barbosa, Martinho, Alfeu Dias, Yaseen, Zaher Mundher, Saporetti, Camila M., Goliatt, Leonardo
Format: Article
Language: English
Published: Nature Publishing Group 2025
Subjects:
Online Access: http://umpir.ump.edu.my/id/eprint/45116/
http://umpir.ump.edu.my/id/eprint/45116/1/Data-driven%20total%20organic%20carbon%20prediction%20using%20feature%20selection.pdf
_version_ 1848827255216668672
author Macêdo, Bruno da Silva
Wayo, Dennis Delali Kwesi
Campos, Deivid
De Santis, Rodrigo Barbosa
Martinho, Alfeu Dias
Yaseen, Zaher Mundher
Saporetti, Camila M.
Goliatt, Leonardo
building UMP Institutional Repository
collection Online Access
description An accurate assessment of shale gas resources is highly important for the sustainable development of these energy resources. Total organic carbon (TOC) analysis is therefore fundamental for understanding the distribution and quality of hydrocarbon source rocks within a shale gas reservoir. Elevated TOC is often associated with the presence of source rocks, indicating the potential for oil and gas production. TOC assessment is performed using laboratory methods, which can be time-consuming and costly. Data-driven models have been successfully applied to model the relationship between TOC and other constituents and to predict the TOC content. However, these methods depend on extensive parameter adjustments that must be carefully conducted in different sedimentary environments. In this context, Automated Machine Learning (AutoML) is an alternative for accurately predicting TOC that avoids time-consuming fine-tuning steps in model development. This study aims to develop an AutoML strategy for estimating TOC using well log data. The procedure automates preprocessing and the search for the best method parameters, reducing the execution time. Among the methods evaluated, Extremely Randomized Trees (XT) performed best (R = 0.8632, MSE = 0.1806) in the test set. The proposed strategy provides a powerful data-driven method that can be applied to real-world well data to assist in data analysis and subsequent decision-making.
first_indexed 2025-11-15T03:57:48Z
format Article
id ump-45116
institution Universiti Malaysia Pahang
institution_category Local University
language English
last_indexed 2025-11-15T03:57:48Z
publishDate 2025
publisher Nature Publishing Group
recordtype eprints
repository_type Digital Repository
spelling ump-45116 2025-07-18T07:05:20Z http://umpir.ump.edu.my/id/eprint/45116/
Nature Publishing Group 2025 Article PeerReviewed pdf en cc_by_nc_nd_4 Macêdo, Bruno da Silva and Wayo, Dennis Delali Kwesi and Campos, Deivid and De Santis, Rodrigo Barbosa and Martinho, Alfeu Dias and Yaseen, Zaher Mundher and Saporetti, Camila M. and Goliatt, Leonardo (2025) Data-driven total organic carbon prediction using feature selection methods incorporated in an automated machine learning framework. Scientific Reports, 15 (1). pp. 1-19. ISSN 2045-2322. (Published) https://doi.org/10.1038/s41598-025-91224-4
title Data-driven total organic carbon prediction using feature selection methods incorporated in an automated machine learning framework
topic QA75 Electronic computers. Computer science
TD Environmental technology. Sanitary engineering
TP Chemical technology
url http://umpir.ump.edu.my/id/eprint/45116/
http://umpir.ump.edu.my/id/eprint/45116/1/Data-driven%20total%20organic%20carbon%20prediction%20using%20feature%20selection.pdf