KDA: An unsupervised approach for analyzing keyphrases distance from news articles as a feature of keyphrase extraction

Automatic keyphrase extraction remains a significant and difficult issue in the current research domain because of the exponential explosion of information and internet sources. Various activities involving natural language processing and information retrieval systems greatly benefit from the use of...

Bibliographic Details
Main Authors: Alam Miah, Mohammad Badrul, Suryanti, Awang
Format: Conference or Workshop Item
Language: English
Published: 2022
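Subjects: QA75 Electronic computers. Computer science; QA76 Computer software; T Technology (General); TA Engineering (General). Civil engineering (General)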
Online Access: http://umpir.ump.edu.my/id/eprint/36844/
http://umpir.ump.edu.my/id/eprint/36844/1/KDA%20_%20An%20unsupervised%20approach%20for%20analyzing%20keyphrases%20distance%20from%20news%20articles%20as%20a%20feature%20of%20keyphrase%20extraction.pdf
_version_ 1848825100109873152
author Alam Miah, Mohammad Badrul
Suryanti, Awang
building UMP Institutional Repository
collection Online Access
description Automatic keyphrase extraction remains a significant and difficult research problem because of the explosive growth of information and internet sources. Many natural language processing and information retrieval tasks benefit greatly from keyphrases. Extracting the best keyphrases, and summarizing documents to a high standard, depends on the features extracted for those keyphrases. This paper proposes an unsupervised region-based KDA technique for analyzing the distance of keyphrases in news articles as a feature for keyphrase extraction. The technique is divided into eight phases: data collection, data pre-processing, data processing, keyphrase searching, distance calculation, distance averaging, curve plotting, and curve fitting. First, two datasets of news articles are collected and passed through a pre-processing step that applies a few preprocessing algorithms. The pre-processed data then enters the data processing stage, where it goes through the keyphrase searching, distance calculation, and distance averaging steps in turn. Curve plotting analysis is then applied, followed by curve fitting. The technique is evaluated on two of the most accessible benchmark datasets and compared with other available methods to demonstrate its efficiency, advantages, and importance. The experimental results show that the proposed approach efficiently analyzed the keyphrase distance in news articles, produced an F1-score of 96.91%, presented keyphrases of 94.55%, and greatly improved the effectiveness of current keyphrase extraction methods.
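The searching, distance, averaging, and fitting phases named in the description can be illustrated with a minimal sketch. The record does not define how KDA measures distance, how regions are drawn, or which curve family is fitted, so everything below is an assumption made for illustration: distance is taken as the relative token position of a keyphrase's first occurrence, regions are equal-width slices of the article, and the fit is a plain least-squares line. The function names `tokenize`, `keyphrase_distances`, `region_histogram`, and `fit_line` are hypothetical and do not come from the paper.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyphrase_distances(article, keyphrases):
    """Keyphrase searching + distance calculation (assumed definition).

    Returns the relative position of the first occurrence of each keyphrase
    that appears in the article: 0.0 is the first token, values approach 1.0
    toward the end. Keyphrases that never occur are skipped.
    """
    tokens = tokenize(article)
    distances = []
    for phrase in keyphrases:
        target = tokenize(phrase)
        if not target:
            continue
        for i in range(len(tokens) - len(target) + 1):
            if tokens[i:i + len(target)] == target:
                distances.append(i / len(tokens))
                break
    return distances


def region_histogram(corpus, n_regions=5):
    """Distance averaging (assumed definition).

    corpus is a list of (article_text, keyphrase_list) pairs. For each
    article, the share of found keyphrases falling in each equal-width
    region is computed; the shares are then averaged over articles.
    """
    totals = [0.0] * n_regions
    counted = 0
    for article, keyphrases in corpus:
        distances = keyphrase_distances(article, keyphrases)
        if not distances:
            continue
        counted += 1
        for d in distances:
            region = min(int(d * n_regions), n_regions - 1)
            totals[region] += 1 / len(distances)
    return [t / counted for t in totals] if counted else totals


def fit_line(ys):
    """Curve fitting stand-in: least-squares line y = slope * x + intercept.

    x is the region index 0..n-1; needs at least two points. The linear
    model is an assumption, not the curve family used in the paper.
    """
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Under these assumptions, calling `region_histogram` on a corpus of annotated articles gives one averaged value per region, and `fit_line` summarizes the trend across regions; a negative slope would indicate that keyphrases tend to appear early in the article. Plotting the histogram against the fitted line would correspond to the curve-plotting phase.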
first_indexed 2025-11-15T03:23:33Z
format Conference or Workshop Item
id ump-36844
institution Universiti Malaysia Pahang
institution_category Local University
language English
last_indexed 2025-11-15T03:23:33Z
publishDate 2022
recordtype eprints
repository_type Digital Repository
spelling ump-36844 2024-01-04T01:25:04Z http://umpir.ump.edu.my/id/eprint/36844/ 2022-11-15 Conference or Workshop Item PeerReviewed pdf en
Alam Miah, Mohammad Badrul and Suryanti, Awang (2022) KDA: An unsupervised approach for analyzing keyphrases distance from news articles as a feature of keyphrase extraction. In: The 6th National Conference for Postgraduate Research (NCON-PGR 2022), 15 November 2022, Virtual Conference, Universiti Malaysia Pahang, Malaysia. p. 83. (Published) https://ncon-pgr.ump.edu.my/index.php/en/?option=com_fileman&view=file&routed=1&name=E-BOOK%20NCON%202022%20.pdf&folder=E-BOOK%20NCON%202022&container=fileman-files
title KDA: An unsupervised approach for analyzing keyphrases distance from news articles as a feature of keyphrase extraction
topic QA75 Electronic computers. Computer science
QA76 Computer software
T Technology (General)
TA Engineering (General). Civil engineering (General)
url http://umpir.ump.edu.my/id/eprint/36844/
http://umpir.ump.edu.my/id/eprint/36844/1/KDA%20_%20An%20unsupervised%20approach%20for%20analyzing%20keyphrases%20distance%20from%20news%20articles%20as%20a%20feature%20of%20keyphrase%20extraction.pdf