Linguistically Enhanced Collocate Words Model

Abstract

Bag-of-words (BOW) and fixed-size window approaches to word extraction from natural language text ignore text structure and context information. Similarly, word co-occurrence based on linear word proximity ignores the linguistic characteristics of words. This paper proposes a semantic window of words that provides a context for capturing the structure of a sentence and the context of each word in it for analysis. The semantic window carries linguistic elements that can be injected into collocate word identification. Selected data are used as case studies, and a quantitative analysis is also conducted. The proposed approach is evaluated against a sliding-window baseline. The semantic window is found to perform better than the sliding window for linguistically enhanced collocate word extraction.

Bibliographic Details
Main Authors: Siaw, Nyuk Hiong; Ranaivo-Malançon, Bali; Narayanan, Kulathuramaiyer; Jane, Labadin
Format: Book Chapter
Language: English
Published: Springer, 2014
Published In: Information Retrieval Technology, Lecture Notes in Computer Science (8870), pp. 230-243
ISBN: 978-3-319-12843-6
DOI: 10.1007/978-3-319-12844-3_20
Subjects: T Technology (General)
Status: Peer reviewed
Institution: Universiti Malaysia Sarawak (UNIMAS Institutional Repository)
Online Access: http://ir.unimas.my/id/eprint/16386/
http://ir.unimas.my/id/eprint/16386/1/Linguistically%20Enhanced%20Collocate%20Words%20Model%20%28abstract%29.pdf
https://link.springer.com/chapter/10.1007/978-3-319-12844-3_20
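The abstract argues that fixed-size windows based on linear word proximity ignore text structure. The minimal Python sketch below, written under our own assumptions, makes that point concrete. It implements a generic sliding-window co-occurrence count of the kind used as the baseline, next to a variant that only resets the window at sentence boundaries. The function names, the window size of 2, and the toy sentences are hypothetical. The sentence-bounded variant illustrates respecting text structure and is not the authors' semantic window, because the abstract does not specify that window's linguistic elements.

```python
from collections import Counter
from itertools import chain


def sliding_window_pairs(tokens, window=2):
    """Count ordered (left word, right word) pairs at distance 1..window.

    Hypothetical generic sliding window: it uses only linear proximity,
    so a pair can straddle a sentence boundary.
    """
    counts = Counter()
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            counts[(word, tokens[j])] += 1
    return counts


def sentence_bounded_pairs(sentences, window=2):
    """Same counting, but the window never crosses a sentence boundary."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sliding_window_pairs(sentence, window))
    return counts


# Toy, pre-tokenised input (illustrative only, not data from the chapter).
sentences = [
    ["the", "bank", "approved", "the", "loan"],
    ["rivers", "flood", "every", "spring"],
]

flat = list(chain.from_iterable(sentences))  # sentence structure discarded
naive = sliding_window_pairs(flat, window=2)
bounded = sentence_bounded_pairs(sentences, window=2)

# Pairs produced only because the fixed window crossed a sentence boundary.
spurious = set(naive) - set(bounded)
print(sorted(spurious))  # prints [('loan', 'flood'), ('loan', 'rivers'), ('the', 'rivers')]
```

The three spurious pairs, such as ("loan", "rivers"), arise only from adjacency across two unrelated sentences. This is the kind of structure-blind co-occurrence the abstract criticizes. A linguistically informed window would go further than sentence boundaries, for example by using syntactic relations.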