TeKET: a tree-based unsupervised keyphrase extraction technique

Automatic keyphrase extraction techniques aim to extract quality keyphrases for higher-level summarization of a document. Most existing techniques are domain-specific: they require application domain knowledge, employ higher-order statistical methods, and are computationally expe...


Bibliographic Details
Main Authors: Rabby, Gollam; Md Saiful, Azad; Mufti, Mahmud; Kamal Z., Zamli; Mohammed Mostafizur, Rahman
Format: Article
Language: English
Published: Springer 2020
Subjects: QA75 Electronic computers. Computer science
Online Access: https://umpir.ump.edu.my/id/eprint/29348/
author Rabby, Gollam
Md Saiful, Azad
Mufti, Mahmud
Kamal Z., Zamli
Mohammed Mostafizur, Rahman
building UMP Institutional Repository
collection Online Access
description Automatic keyphrase extraction techniques aim to extract quality keyphrases for higher-level summarization of a document. Most existing techniques are domain-specific: they require application domain knowledge, employ higher-order statistical methods, are computationally expensive, and require large amounts of training data, which is rare for many applications. To overcome these issues, this paper proposes a new unsupervised keyphrase extraction technique. The proposed technique, named TeKET (Tree-based Keyphrase Extraction Technique), is domain-independent, employs limited statistical knowledge, and requires no training data. It also introduces a new variant of a binary tree, called the KeyPhrase Extraction (KePhEx) tree, to extract final keyphrases from candidate keyphrases. In addition, a measure called the Cohesiveness Index (CI) is derived, which denotes a given node's degree of cohesiveness with respect to the root. The CI is used to flexibly extract final keyphrases from the KePhEx tree and is also used in the ranking process. The effectiveness of the proposed technique and its domain and language independence are evaluated experimentally on available benchmark corpora, namely SemEval-2010 (a scientific articles dataset), Theses100 (a thesis dataset), and a German Research Article dataset, respectively. The results are compared with those of other relevant unsupervised techniques, both statistical and graph-based. They show that the proposed technique outperforms the compared techniques in terms of precision, recall, and F1 scores.
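The description reports results in terms of precision, recall, and F1. As a minimal sketch of how those scores are commonly computed for keyphrase extraction, the Python below compares an extracted list against a gold list. The function name `keyphrase_prf1`, the sample phrases, and the lowercase exact-match rule are assumptions made for illustration; the record does not state the paper's matching protocol (for example, whether stemming is applied).

```python
def keyphrase_prf1(extracted, gold):
    """Return (precision, recall, F1) for two collections of keyphrases.

    Assumption: phrases match when equal after trimming and lowercasing.
    This is an illustrative rule, not the protocol used in the paper.
    """
    extracted_set = {phrase.strip().lower() for phrase in extracted}
    gold_set = {phrase.strip().lower() for phrase in gold}

    # True positives: phrases present in both sets.
    true_positives = len(extracted_set & gold_set)

    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0

    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical sample data, not taken from any benchmark corpus.
extracted = ["keyphrase extraction", "binary tree", "ranking", "document"]
gold = ["Keyphrase Extraction", "binary tree", "cohesiveness index"]
precision, recall, f1 = keyphrase_prf1(extracted, gold)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Using sets means a phrase extracted twice is counted once, which keeps precision from being distorted by repeats.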
first_indexed 2025-11-15T03:58:09Z
format Article
id ump-29348
institution Universiti Malaysia Pahang
institution_category Local University
language English
last_indexed 2025-11-15T03:58:09Z
publishDate 2020
publisher Springer
recordtype eprints
repository_type Digital Repository
spelling ump-29348 2025-09-26T08:37:11Z https://umpir.ump.edu.my/id/eprint/29348/ Springer 2020 Article PeerReviewed pdf en cc_by_4 https://umpir.ump.edu.my/id/eprint/29348/1/20.%20TeKET%20-%20a%20tree%20based%20unsupervised%20keyphrase%20extraction%20technique.pdf Rabby, Gollam and Md Saiful, Azad and Mufti, Mahmud and Kamal Z., Zamli and Mohammed Mostafizur, Rahman (2020) TeKET: a tree-based unsupervised keyphrase extraction technique. Cognitive Computation, 12 (4). pp. 811-833. ISSN 1866-9956. (Published) https://doi.org/10.1007/s12559-019-09706-3
title TeKET: a tree-based unsupervised keyphrase extraction technique
topic QA75 Electronic computers. Computer science
url https://umpir.ump.edu.my/id/eprint/29348/