IDENTIFICATION AND ANALYSIS OF CONFOUNDING VARIABLES AND SIMPSON’S PARADOX

This dissertation uses machine learning algorithms and model-agnostic tools to explore the counterintuitive relationship between protein intake from legumes and pass rate in Malawi. It is an exploratory analysis of approaches to creating sub-groups based on K-...


Bibliographic Details
Main Author: Chattopadhyay, Ishani
Format: Dissertation (University of Nottingham only)
Language: English
Published: 2022
Online Access: https://eprints.nottingham.ac.uk/70466/
building Nottingham Research Data Repository
collection Online Access
description This dissertation uses machine learning algorithms and model-agnostic tools to explore the counterintuitive relationship between protein intake from legumes and pass rate in Malawi. It is an exploratory analysis of approaches to creating sub-groups with the K-means clustering algorithm in order to identify Simpson’s Paradox. The negative relationship between protein intake from legumes and pass rate in Malawi is addressed by identifying confounders using logistic regression and chi-square tests. A Random Forest model and Partial Dependency Plots are used to study the relationship between protein intake from legumes and pass rates within sub-groups of the confounders, in order to isolate the effect of those confounders. The dissertation follows a waterfall method that goes deeper into the identification of confounders whenever a sub-group indicates a negative relationship between legumes and pass rates. It attempts to explain certain trends in the relationship and suggests possible ways to understand the problem. The analysis helps identify areas that could be explored further in order to provide better amenities and improve the standard of living in the poorer areas of Malawi. Keywords: Simpson’s Paradox, Partial Dependency Plots, Individual Conditional Expectation Plot, K-means clustering, counter-intuitive behaviour, confounder identification, confounder analysis.
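The sub-group comparison described above can be illustrated with a minimal sketch. It is not the dissertation's code or data: the six rows are made-up numbers, the two group labels (`cluster_a`, `cluster_b`) stand in for sub-groups that K-means would produce, and a plain least-squares slope replaces the Random Forest and partial dependence analysis. The function names `ols_slope` and `simpson_check` are hypothetical. The sketch only shows what a Simpson's Paradox reversal looks like: a negative pooled trend alongside positive trends inside every sub-group.

```python
from collections import defaultdict


def ols_slope(xs, ys):
    """Least-squares slope of y regressed on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def simpson_check(rows):
    """rows: iterable of (group, x, y) tuples.

    Returns the pooled slope, a dict of per-group slopes, and a flag that
    is True when every per-group slope has the opposite sign to the
    pooled slope.
    """
    rows = list(rows)
    pooled = ols_slope([x for _, x, _ in rows], [y for _, _, y in rows])

    by_group = defaultdict(list)
    for group, x, y in rows:
        by_group[group].append((x, y))

    slopes = {
        group: ols_slope([x for x, _ in pts], [y for _, y in pts])
        for group, pts in by_group.items()
    }
    reversed_in_all = all(s * pooled < 0 for s in slopes.values())
    return pooled, slopes, reversed_in_all


# Made-up illustration data (NOT from the dissertation):
# x = legume protein intake (arbitrary units), y = pass rate (%).
rows = [
    ("cluster_a", 1.0, 70.0), ("cluster_a", 2.0, 72.0), ("cluster_a", 3.0, 74.0),
    ("cluster_b", 6.0, 40.0), ("cluster_b", 7.0, 42.0), ("cluster_b", 8.0, 44.0),
]

pooled, slopes, reversed_in_all = simpson_check(rows)
print("pooled slope:", pooled)
print("per-group slopes:", slopes)
print("reversal in every sub-group:", reversed_in_all)
```

In this toy data the group label acts as the confounder: `cluster_b` has higher intake but a lower baseline pass rate, which drags the pooled trend negative even though intake and pass rate rise together inside each group.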
first_indexed 2025-11-14T20:54:26Z
format Dissertation (University of Nottingham only)
id nottingham-70466
institution University of Nottingham Malaysia Campus
institution_category Local University
language English
last_indexed 2025-11-14T20:54:26Z
publishDate 2022
recordtype eprints
repository_type Digital Repository
spelling nottingham-70466 2023-07-06T11:48:01Z https://eprints.nottingham.ac.uk/70466/ IDENTIFICATION AND ANALYSIS OF CONFOUNDING VARIABLES AND SIMPSON’S PARADOX Chattopadhyay, Ishani 2022-12-01 Dissertation (University of Nottingham only) NonPeerReviewed application/pdf en https://eprints.nottingham.ac.uk/70466/1/Identification%20and%20analysis%20of%20confounding%20variables%20and%20Simpsons%20Paradox_20399497.pdf Chattopadhyay, Ishani (2022) IDENTIFICATION AND ANALYSIS OF CONFOUNDING VARIABLES AND SIMPSON’S PARADOX. [Dissertation (University of Nottingham only)]