An imperative for soil spectroscopic modelling is to think global but fit local with transfer learning

Soil spectroscopy with machine learning (ML) can estimate soil properties. Extensive soil spectral libraries (SSLs) have been developed for this purpose. However, general models built with those SSLs do not generalize well on new ‘unseen’ local data. The main reason is the different characteristics...


Bibliographic Details
Main Authors: Viscarra Rossel, Raphael; Shen, Zefang; Ramirez Lopez, L.; Behrens, T.; Shi, Z.; Wetterlind, J.; Sudduth, K.A.; Stenberg, B.; Guerrero, C.; Gholizadeh, A.; Ben-Dor, E.; St Luce, M.; Orellano, C.
Format: Journal Article
Published: 2024
Online Access: http://purl.org/au-research/grants/arc/DP210100420
http://hdl.handle.net/20.500.11937/96051
author Viscarra Rossel, Raphael
Shen, Zefang
Ramirez Lopez, L.
Behrens, T.
Shi, Z.
Wetterlind, J.
Sudduth, K.A.
Stenberg, B.
Guerrero, C.
Gholizadeh, A.
Ben-Dor, E.
St Luce, M.
Orellano, C.
building Curtin Institutional Repository
collection Online Access
description Soil spectroscopy with machine learning (ML) can estimate soil properties. Extensive soil spectral libraries (SSLs) have been developed for this purpose. However, general models built with those SSLs do not generalize well on new ‘unseen’ local data. The main reason is the different characteristics of the observations in the SSL and the local data, which cause their conditional and marginal distributions to differ. This makes the modelling of soil properties with spectra challenging. General models developed using large ‘global’ SSLs offer broad, systematic information on the soil-spectra relationships. However, to accurately generalize in a local situation, they must be adjusted to capture the site-specific characteristics of the local observations. Most current methods for ‘localizing’ spectroscopic modelling report inconsistent results. An understanding of spectroscopic ‘localization’ is lacking, and there is no framework to guide further developments. Here, we review current localization methods and propose their reformulation as a transfer learning (TL) undertaking. We then demonstrate the implementation of instance-based TL with RS-LOCAL 2.0 for modelling the soil organic carbon (SOC) content of 12 sites representing fields, farms and regions from 10 countries on the seven continents. The method uses a small number of instances or observations (measured soil property values and corresponding spectra) from the local site to transfer relevant information from a large and diverse global SSL (GSSL 2.0) with more than 50,000 records. We found that with ≤30 local observations, RS-LOCAL 2.0 produces more accurate and stable estimates of SOC than modelling with only the local data. Using the information in the GSSL 2.0 and reducing the number of samples for laboratory analysis, the method improves the cost-efficiency and practicality of soil spectroscopy. 
We interpreted the transfer by analysing the data, models, and soil and environmental relationships of the local and the ‘transferred’ data to gain insight into the approach. Transferring instances from the GSSL 2.0 to the local sites helped to align their conditional and marginal distributions, making the spectra-SOC relationships in the models more robust. Finally, we propose directions for future research. The guiding principle for developing practical and cost-effective spectroscopy should be to think globally but fit locally. By reformulating the localization problem within a TL framework, we hope to have acquainted the soil science community with a set of methodologies that can inspire the development of new, innovative algorithms for soil spectroscopic modelling.
first_indexed 2025-11-14T11:45:30Z
format Journal Article
id curtin-20.500.11937-96051
institution Curtin University Malaysia
institution_category Local University
last_indexed 2025-11-14T11:45:30Z
publishDate 2024
recordtype eprints
repository_type Digital Repository
spelling curtin-20.500.11937-96051 2024-11-07T00:49:51Z 10.1016/j.earscirev.2024.104797 https://creativecommons.org/licenses/by/4.0/ fulltext
title An imperative for soil spectroscopic modelling is to think global but fit local with transfer learning
url http://purl.org/au-research/grants/arc/DP210100420
http://hdl.handle.net/20.500.11937/96051