Statistical tests for large tree-structured data

We develop a general statistical framework for the analysis and inference of large tree-structured data, with a focus on developing asymptotic goodness-of-fit tests. We first propose a consistent statistical model for binary trees, from which we develop a class of invariant tests.


Bibliographic Details
Main Authors: Bharath, Karthik; Kambadur, Prabhanjan; Dey, Dipak K.; Rao, Arvind; Baladandayuthapani, Veerabhadran
Format: Article
Published: Taylor & Francis 2017
Online Access: https://eprints.nottingham.ac.uk/40800/
building Nottingham Research Data Repository
collection Online Access
description We develop a general statistical framework for the analysis and inference of large tree-structured data, with a focus on developing asymptotic goodness-of-fit tests. We first propose a consistent statistical model for binary trees, from which we develop a class of invariant tests. Using the model for binary trees, we then construct tests for general trees by using the distributional properties of the Continuum Random Tree, which arises as the invariant limit for a broad class of models for tree-structured data based on conditioned Galton–Watson processes. The test statistics for the goodness-of-fit tests are simple to compute and are asymptotically distributed as χ² and F random variables. We illustrate our methods on an important application of detecting tumor heterogeneity in brain cancer. We use a novel approach with tree-based representations of magnetic resonance images and employ the developed tests to ascertain tumor heterogeneity between two groups of patients.
first_indexed 2025-11-14T19:43:11Z
format Article
id nottingham-40800
institution University of Nottingham Malaysia Campus
institution_category Local University
last_indexed 2025-11-14T19:43:11Z
publishDate 2017
publisher Taylor & Francis
recordtype eprints
repository_type Digital Repository
spelling nottingham-40800 2020-05-04T18:59:39Z https://eprints.nottingham.ac.uk/40800/ Taylor & Francis 2017-08-07 Article PeerReviewed Bharath, Karthik, Kambadur, Prabhanjan, Dey, Dipak. K., Rao, Arvind and Baladandayuthapani, Veerabhadran (2017) Statistical tests for large tree-structured data. Journal of the American Statistical Association, 112 (520). pp. 1733-1743. ISSN 1537-274X http://www.tandfonline.com/doi/full/10.1080/01621459.2016.1240081 doi:10.1080/01621459.2016.1240081