LSI-based semantic characterisation for automated text categorisation

As knowledge acquisition remains a bottleneck, incorporating human judgement within intelligent systems is still a challenge. Supervised learning methods have been shown to assist humans in automated text categorization (ATC). However, the performance of such systems is largely dependent on th...

Bibliographic Details
Main Author: Tan, Ping Ping
Format: Thesis
Language: English
Published: Faculty of Computer Science and Information Technology 2009
Subjects: QA76 Computer software
Online Access: http://ir.unimas.my/id/eprint/167/
http://ir.unimas.my/id/eprint/167/8/LSI-based%20semantic%20characterization%20for%20automated%20text%20categorization%20%28fulltext%29.pdf
author Tan, Ping Ping
building UNIMAS Institutional Repository
collection Online Access
description As knowledge acquisition remains a bottleneck, incorporating human judgement within intelligent systems is still a challenge. Supervised learning methods have been shown to assist humans in automated text categorization (ATC). However, the performance of such systems is largely dependent on the characteristics of the datasets. Without an understanding of why a classifier works well for certain datasets, it is difficult to generalise its application across domains. Furthermore, most training sets used in supervised ATC have category labels provided by human experts. The expert knowledge used in the task of categorization is often not captured by the mere process of manipulating category labels. This results in a loss of intended meaning when performing supervised ATC. In addition, large text datasets often contain a great deal of noise.
first_indexed 2025-11-15T05:52:48Z
format Thesis
id unimas-167
institution Universiti Malaysia Sarawak
institution_category Local University
language English
last_indexed 2025-11-15T05:52:48Z
publishDate 2009
publisher Faculty of Computer Science and Information Technology
recordtype eprints
repository_type Digital Repository
spelling unimas-167 2023-05-08T07:37:46Z http://ir.unimas.my/id/eprint/167/ Tan, Ping Ping (2009) LSI-based semantic characterisation for automated text categorisation. Masters thesis, Universiti Malaysia Sarawak. QA76 Computer software. NonPeerReviewed.
title LSI-based semantic characterisation for automated text categorisation
topic QA76 Computer software