Arabic nested noun compound extraction based on linguistic features and statistical measures

The extraction of Arabic nested noun compounds is significant for several research areas such as sentiment analysis, text summarization, word categorization, grammar checking, and machine translation. Much research has studied the extraction of Arabic noun compounds using linguistic approaches, stat...


Bibliographic Details
Main Authors:	Nazlia Omar, Qasem Al-Tashi
Format:	Article
Language:	English
Published:	Penerbit Universiti Kebangsaan Malaysia 2018
Online Access:	http://journalarticle.ukm.my/13773/ http://journalarticle.ukm.my/13773/1/25313-76332-1-PB.pdf

building	UKM Institutional Repository
collection	Online Access
description	The extraction of Arabic nested noun compounds is significant for several research areas such as sentiment analysis, text summarization, word categorization, grammar checking, and machine translation. Much research has studied the extraction of Arabic noun compounds using linguistic approaches, statistical methods, or a hybrid of both. Many of the existing approaches concentrate on extracting bigram or trigram noun compounds. Extracting 4-gram or 5-gram nested noun compounds, however, is challenging because of morphological, orthographic, syntactic, and semantic variations. Features such as unit-hood, contextual information, and term-hood have an important effect on the efficiency of noun compound extraction. Hence, there is a need to improve the effectiveness of Arabic nested noun compound extraction. This paper therefore proposes a hybrid of a linguistic approach and a statistical method to enhance the extraction of Arabic nested noun compounds. A number of pre-processing phases are presented, including transformation, tokenization, and normalisation. The linguistic approaches used in this study consist of part-of-speech tagging and the named entities pattern, whereas the statistical methods consist of the NC-value, NTC-value, NLC-value, and the combination of these association measures. The results show that the combined association measures outperform the NLC-value, NTC-value, and NC-value in nested noun compound extraction, achieving 90%, 88%, 87%, and 81% for bigram, trigram, 4-gram, and 5-gram, respectively.
id	oai:generic.eprints.org:13773
institution	Universiti Kebangsaan Malaysia
institution_category	Local University
recordtype	eprints
repository_type	Digital Repository
citation	Nazlia Omar and Qasem Al-Tashi (2018). Arabic nested noun compound extraction based on linguistic features and statistical measures. GEMA: Online Journal of Language Studies, 18 (2), pp. 93-107. ISSN 1675-8021. Published 2018-05; peer reviewed. http://ejournal.ukm.my/gema/issue/view/1087


Similar Items