Effectiveness of tree-based pipeline optimization tools and grid search method in breast cancer prediction

Bibliographic Details
Main Author: Mat Radzi, Siti Fairuz
Format: Thesis
Language: English
Published: 2021
Subjects: Radiography, Medical; BRCA genes
Online Access: http://psasir.upm.edu.my/id/eprint/104309/
http://psasir.upm.edu.my/id/eprint/104309/1/SITI%20FAIRUZ%20BINTI%20MAT%20RADZI%20-%20IR.pdf
author Mat Radzi, Siti Fairuz
building UPM Institutional Repository
collection Online Access
description Breast cancer is known as the most prevalent and common cause of death among Malaysian women, especially those over the age of 40. A breast mass can usually be identified as either benign or malignant with an invasive biopsy procedure, and the treatment protocol is assigned based on whether the mass is benign or malignant. Fortunately, breast cancer, like many other cancer types, is curable, and patient survival can be improved with early diagnosis. Radiographic images contain numerous features that are useful for computer-aided diagnosis. In this thesis, the work is divided into two main phases: 1) evaluating the reproducibility of radiomics features derived from manual delineation and from semiautomatic segmentation after two different contrast enhancement techniques on masses in two-dimensional (2D) mammography images, and 2) implementing Automated Machine Learning (AutoML) to classify types of mass in mammogram images. With the introduction of ML techniques, breast cancer can be diagnosed at an early stage without any invasive and risky procedure. The methodology presented in this research consists of several stages: image acquisition, image segmentation, feature extraction/selection, and classification using AutoML. The first phase compares the reproducibility of features obtained with the Contrast Limited Adaptive Histogram Equalization (CLAHE) and Adaptive Histogram Equalization (AHE) techniques. The semiautomatic segmentation technique used in the first phase is the Active Contour Method (ACM) with 100 iterations. Three types of radiomics features were extracted: first-order, second-order, and shape features. In total, 37 features were extracted from each tumor under each of the three techniques mentioned: 9 were shape-based features and 28 were texture-based features. Notably, the CLAHE group (ICC = 0.890 ± 0.554, p < 0.05) had the highest reproducibility compared to the features extracted from the AHE group (ICC = 0.850 ± 0.933, p < 0.05) and from manual delineation (ICC = 0.673 ± 0.807, p > 0.05). Therefore, the segmentation in the second phase is based on CLAHE and the ACM. The pipeline combining Principal Component Analysis (PCA) with Random Forest (RF) classification proved to be the most reliable pipeline with the lowest complexity in this research, with 92% accuracy, 83% precision, 100% sensitivity, and 94% ROC.
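The reproducibility figures in the description are intraclass correlation coefficients (ICC). The record does not say which ICC form the thesis used, so the following is only a minimal sketch under assumptions: it computes the two-way single-measurement forms, ICC(2,1) (absolute agreement) and ICC(3,1) (consistency), from a two-way ANOVA decomposition. The function name `icc_two_way` and the small `ratings` table are illustrative and are not data from the thesis.

```python
from statistics import fmean


def icc_two_way(ratings):
    """Return (ICC(2,1), ICC(3,1)) for a complete subjects-by-raters table.

    ratings: list of rows, one row per subject (for example, one tumor),
    each row holding k repeated measurements of the same feature
    (for example, one value per segmentation run). Assumed layout,
    not taken from the thesis.
    """
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 rows and >= 2 columns")

    grand = fmean(v for row in ratings for v in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA sums of squares: subjects, raters, and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    # ICC(3,1): consistency, ignores systematic offsets between raters.
    consistency = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    # ICC(2,1): absolute agreement, penalizes systematic rater offsets.
    agreement = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
    return agreement, consistency


# Illustrative table: 6 subjects measured 4 times each (not thesis data).
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]

agreement, consistency = icc_two_way(ratings)
print(f"ICC(2,1) = {agreement:.3f}, ICC(3,1) = {consistency:.3f}")
```

In a radiomics setting, such a function would typically be applied once per feature, with the per-feature values then summarized across all 37 features. Whether the thesis summarized its ICC values this way is not stated in the record.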
first_indexed 2025-11-15T13:45:53Z
format Thesis
id upm-104309
institution Universiti Putra Malaysia
institution_category Local University
language English
last_indexed 2025-11-15T13:45:53Z
publishDate 2021
recordtype eprints
repository_type Digital Repository
spelling upm-104309 2023-07-26T02:12:20Z http://psasir.upm.edu.my/id/eprint/104309/ 2021-10 Thesis NonPeerReviewed text en Mat Radzi, Siti Fairuz (2021) Effectiveness of tree-based pipeline optimization tools and grid search method in breast cancer prediction. Masters thesis, Universiti Putra Malaysia.
title Effectiveness of tree-based pipeline optimization tools and grid search method in breast cancer prediction
topic Radiography, Medical
BRCA genes
url http://psasir.upm.edu.my/id/eprint/104309/
http://psasir.upm.edu.my/id/eprint/104309/1/SITI%20FAIRUZ%20BINTI%20MAT%20RADZI%20-%20IR.pdf