Utilizing word matching for duplicate article removal : a study using Malaysian online news feed

Bibliographic Details
Main Authors: Su, Tze-Wei; Khor, Hao-Ming; Tan, Ian K. T.
Format: Conference or Workshop Item
Language: English
Published: 2011
Subjects: QA76 Computer software
Online Access: http://eprints.sunway.edu.my/116/
http://eprints.sunway.edu.my/116/1/ICS2011_17.pdf
building SU Institutional Repository
collection Online Access
description Users of feed aggregators know that duplicated articles occasionally appear in the feeds they subscribe to. Reading every article, only to stumble upon items they have already read, can be time-consuming. This work evaluates the effectiveness of using basic word matching to remove duplicated items and show only the most relevant item, thus saving readers' time. The method described in this paper removes duplicates using word matching heuristics with an appropriate matching percentage. The duplicated feeds are then ranked so that only the highest ranked article is displayed. Ranking uses the number of search items found on the titles of the news feeds: the article with the highest number returned is considered the highest ranked. Using Malaysian online news feeds, we found that a matching percentage of 40% allows the method to minimize duplicates.
first_indexed 2025-11-14T21:12:25Z
id sunway-116
institution Sunway University
institution_category Local University
last_indexed 2025-11-14T21:12:25Z
recordtype eprints
repository_type Digital Repository
datestamp 2012-10-17T03:41:36Z
date 2011-06
status Peer reviewed
citation Su, Tze-Wei and Khor, Hao-Ming and Tan, Ian K. T. (2011) Utilizing word matching for duplicate article removal : a study using Malaysian online news feed. In: Symposium on Information & Computer Sciences (1st).
topic QA76 Computer software
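The record's abstract describes the method only at a high level: titles are compared by word matching, items at or above a matching percentage (40% in the study) are treated as duplicates, and only the highest ranked item in each duplicate group is shown. The abstract does not say how the percentage is computed or how search items are counted, so the following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes the percentage is the share of distinct title words two items have in common relative to the shorter title, and it uses a hypothetical per-article `score` in place of the search-item count. The names `words`, `match_percentage`, and `dedupe` are illustrative only.

```python
import re


def words(title: str) -> set[str]:
    """Lowercase a title and split it into a set of distinct word tokens."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def match_percentage(a: str, b: str) -> float:
    """Percentage of shared distinct words, relative to the shorter title.

    ASSUMPTION: the abstract does not define the matching formula; this
    normalisation by the shorter title is one plausible choice.
    """
    wa, wb = words(a), words(b)
    if not wa or not wb:
        return 0.0
    return 100.0 * len(wa & wb) / min(len(wa), len(wb))


def dedupe(articles, threshold=40.0):
    """Group articles whose titles match at or above `threshold` percent,
    then keep only the highest-scoring article from each group.

    `articles` is a list of (title, score) pairs. ASSUMPTION: `score` is a
    hypothetical stand-in for the paper's ranking signal (the number of
    search items found on the title).
    """
    groups = []
    for article in articles:
        title = article[0]
        for group in groups:
            # Compare against the first article that opened each group.
            if match_percentage(title, group[0][0]) >= threshold:
                group.append(article)
                break
        else:
            # No sufficiently similar group exists, so start a new one.
            groups.append([article])
    # max() keeps the first article with the highest score in each group.
    return [max(group, key=lambda item: item[1]) for group in groups]
```

For example, with a hypothetical feed of three items, the two flood headlines share all five words of the shorter title (100%) and collapse into one group, while the currency headline shares none:

```python
feed = [
    ("Floods hit Kelantan, thousands evacuated", 120),
    ("Thousands evacuated as floods hit Kelantan", 340),
    ("Ringgit strengthens against US dollar", 95),
]
print(dedupe(feed))
# [('Thousands evacuated as floods hit Kelantan', 340), ('Ringgit strengthens against US dollar', 95)]
```

Comparing only against the first item of each group keeps the sketch simple; comparing against every member, or clustering transitively, would merge more aggressively and is an equally valid reading of "word matching heuristics".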