Linguistically Enhanced Collocate Words Model

Bibliographic Details
Main Authors: Siaw, Nyuk Hiong; Ranaivo-Malançon, Bali; Narayanan, Kulathuramaiyer; Jane, Labadin
Format: Book Chapter
Language: English
Published: Springer 2014
Online Access: http://ir.unimas.my/id/eprint/16386/
http://ir.unimas.my/id/eprint/16386/1/Linguistically%20Enhanced%20Collocate%20Words%20Model%20%28abstract%29.pdf
Description
Summary: Bag-of-words (BOW) and fixed-size window approaches to word extraction from natural language text ignore text structure and context information. Similarly, word co-occurrence based on linear word proximity ignores the linguistic criteria of words. This paper proposes a semantic window of a word, which provides a context for capturing the structure and context of a word in a sentence for analysis. The semantic window has linguistic elements that can be injected for collocate word identification. Selected data are used as case studies, and a quantitative analysis is also conducted. The proposed approach is evaluated against the sliding window, which serves as the baseline. The semantic window is found to perform better than the sliding window for linguistically enhanced collocate word extraction.
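
To make the baseline concrete, the following is a minimal Python sketch of a fixed-size sliding window collocate count, the kind of linear-proximity approach the summary says ignores linguistic criteria. It does not reproduce the chapter's semantic window, which this record does not define. The function name sliding_window_collocates, the window parameter, and the example sentence are illustrative assumptions, not taken from the chapter.

```python
from collections import Counter


def sliding_window_collocates(tokens, node, window=2):
    """Count words found within `window` positions of each occurrence of `node`.

    Hypothetical helper for illustration: it uses linear proximity only
    and applies no linguistic filtering.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)                 # left edge of the window
        hi = min(len(tokens), i + window + 1)   # right edge (exclusive)
        for j in range(lo, hi):
            if j != i:                          # skip the node word itself
                counts[tokens[j]] += 1
    return counts


# Assumed example sentence, tokenized on whitespace.
tokens = "the strong tea was served with strong coffee".split()
collocates = sliding_window_collocates(tokens, "strong", window=1)
```

With a window of one word on each side, the node "strong" collects "the", "tea", "with" and "coffee" once each. Function words such as "the" and "with" are counted only because they are adjacent, which illustrates the limitation of linear proximity that the summary describes.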