An extended tree-based keyphrase extraction technique (etket) for academic articles based on syntactic features

Recently, automatic keyphrase extraction (AKE) has faced challenges in extracting high-quality keyphrases and summarizing information at a superior level due to technological advancements and the exponential growth of digital sources and textual information. Machine learning including unsupervised AK...

Bibliographic Details
Main Author: Mohammad Badrul Alam, Miah
Format: Thesis
Language: English
Published: 2024
Subjects: QA76 Computer software; T Technology (General)
Online Access: http://umpir.ump.edu.my/id/eprint/44274/
http://umpir.ump.edu.my/id/eprint/44274/1/An%20extended%20tree-based%20keyphrase%20extraction%20technique%20%28etket%29%20for%20academic%20articles%20based%20on%20syntactic%20features.pdf
_version_ 1848827066897661952
author Mohammad Badrul Alam, Miah
building UMP Institutional Repository
collection Online Access
description Recently, automatic keyphrase extraction (AKE) has faced challenges in extracting high-quality keyphrases and summarizing information at a superior level due to technological advancements and the exponential growth of digital sources and textual information. Machine learning, including unsupervised AKE, depends on feature extraction to extract relevant features and on a ranking procedure to choose significant keyphrases. However, existing unsupervised AKE has some limitations, including the inability to recognize appropriate features that provide diversity and topical coverage, which are occasionally neglected, and misguided ranking procedures. In addition, the existing tree-based technique does not use feature extraction, which is vital for achieving good performance, and uses only term frequency (TF) as a key feature, which misguides the ranking procedure in selecting the most significant keyphrases because the TF values of irrelevant keyphrases are higher than those of relevant keyphrases. Therefore, this thesis sought to develop an extended tree-based keyphrase extraction technique (ETKET) by proposing new keyphrase features with new formulas and an extended ranking procedure to select the topmost significant keyphrases from academic articles. The proposed technique consists of five main processes: data collection and preprocessing; candidate keyphrase selection; candidate keyphrase processing to select the final candidate keyphrases using a keyphrase extraction (KePhEx) tree; feature extraction to extract new features such as keyphrase frequency, keyphrase centroid, keyphrase distance, keyphrase concentration area, keyphrase position, and keyphrase positions in different sentences; and finally, an extended ranking procedure to select the topmost significant keyphrases. The proposed technique was evaluated on five widely used long-document benchmark datasets (SemEval2010, Schutz2008, Nguyen2007, Citeulike180, and Cacic) to measure its performance and effectiveness. The obtained results were then compared with state-of-the-art techniques, showing that the proposed technique outperformed others in terms of precision, recall, and F1-score. Thus, the results proved that the ETKET was able to extract the topmost significant keyphrases.
first_indexed 2025-11-15T03:54:49Z
format Thesis
id ump-44274
institution Universiti Malaysia Pahang
institution_category Local University
language English
last_indexed 2025-11-15T03:54:49Z
publishDate 2024
recordtype eprints
repository_type Digital Repository
spelling ump-44274 2025-05-07T07:08:15Z http://umpir.ump.edu.my/id/eprint/44274/ An extended tree-based keyphrase extraction technique (etket) for academic articles based on syntactic features Mohammad Badrul Alam, Miah QA76 Computer software T Technology (General) 2024-05 Thesis NonPeerReviewed pdf en http://umpir.ump.edu.my/id/eprint/44274/1/An%20extended%20tree-based%20keyphrase%20extraction%20technique%20%28etket%29%20for%20academic%20articles%20based%20on%20syntactic%20features.pdf Mohammad Badrul Alam, Miah (2024) An extended tree-based keyphrase extraction technique (etket) for academic articles based on syntactic features. PhD thesis, Universiti Malaysia Pahang Al-Sultan Abdullah (Contributors, Thesis advisor: Suryanti, Awang).
title An extended tree-based keyphrase extraction technique (etket) for academic articles based on syntactic features
title_sort extended tree-based keyphrase extraction technique (etket) for academic articles based on syntactic features
topic QA76 Computer software
T Technology (General)
url http://umpir.ump.edu.my/id/eprint/44274/
http://umpir.ump.edu.my/id/eprint/44274/1/An%20extended%20tree-based%20keyphrase%20extraction%20technique%20%28etket%29%20for%20academic%20articles%20based%20on%20syntactic%20features.pdf