Empirical comparison of tree ensemble variable importance measures

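Tree ensembles are becoming well-established as popular and powerful data modelling techniques. Tree ensemble models are essentially black box models, although their individual members may not be, and with their growing popularity, interest in the interpretation of tree ensemble models has also grown. This study presents variable importance measures associated with random forests, conditional inference forests and boosted trees, and employs a number of simulated data sets to compare these methods. Overall, variable importance indicators based on bagged conditional inference forests appear to strike a good balance between identification of significant variables and avoiding unnecessary flagging of correlated variables. Data preprocessing and interpretation by experts knowledgeable with a specific data set remain vital.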

Bibliographic Details
Main Authors: Auret, L., Aldrich, Chris
Format: Journal Article
Published: ELSEVIER 2011
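Subjects: Decision trees, Variable importance, Ensemble learning, Random forests, Fault identification, Boosted trees, Conditional inference forests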
Online Access: http://hdl.handle.net/20.500.11937/47469
author Auret, L.
Aldrich, Chris
building Curtin Institutional Repository
collection Online Access
description Tree ensembles are becoming well-established as popular and powerful data modelling techniques. Tree ensemble models are essentially black box models, although their individual members may not be, and with their growing popularity, interest in the interpretation of tree ensemble models has also grown. This study presents variable importance measures associated with random forests, conditional inference forests and boosted trees, and employs a number of simulated data sets to compare these methods. Overall, variable importance indicators based on bagged conditional inference forests appear to strike a good balance between identification of significant variables and avoiding unnecessary flagging of correlated variables. Data preprocessing and interpretation by experts knowledgeable with a specific data set remain vital.
first_indexed 2025-11-14T09:34:31Z
format Journal Article
id curtin-20.500.11937-47469
institution Curtin University Malaysia
institution_category Local University
last_indexed 2025-11-14T09:34:31Z
publishDate 2011
publisher ELSEVIER
recordtype eprints
repository_type Digital Repository
spelling curtin-20.500.11937-47469 2017-09-13T16:07:45Z 10.1016/j.chemolab.2010.12.004 restricted
title Empirical comparison of tree ensemble variable importance measures
topic Decision trees
- Variable importance
- Ensemble learning
- Random forests
- Fault identification
- Boosted trees
- Conditional inference forests
url http://hdl.handle.net/20.500.11937/47469