A machine learning-based approach for automated quality assessment of user generated content in web forums

Bibliographic Details
Main Author: Chai, Kevin Eng Kwong
Format: Thesis
Language: English
Published: Curtin University 2011
Online Access: http://hdl.handle.net/20.500.11937/107
Repository: Curtin Institutional Repository

Description

Web 2.0 platforms such as forums, blogs and wikis allow members of their communities to contribute content. However, users often receive little if any professional training in content creation, and content is commonly published without peer review. Excessive low-quality user contributions can lead to information overload, the situation in which a user feels overwhelmed by unwanted information. Information overload can cause users to withdraw from a website, decreasing the website's overall sustainability through the loss of users from its community.

Many Web 2.0 websites have relied on their users to manually rate the quality of User Generated Content (UGC) to deal with this problem. However, this approach has two major problems: rating is voluntary, so a large percentage of content receives no rating, and UGC is often created faster than it can be adequately rated. Therefore, automated content quality assessment models are required to address the problems caused by manual user rating.

A number of automated models have been proposed in recent years for Web 2.0 platforms. However, our literature review identified many limitations in these existing models. For example, the majority of models are only suitable for a specific language such as English, and they have not effectively considered how content is used by the user community in the assessment process. Therefore, to address these limitations, we propose a novel, language-independent model that evaluates the content, usage, reputation, temporal and structural dimensions of UGC for quality assessment.

We developed our model using Web technologies and a supervised machine learning approach. More specifically, we employed a rule learner, a fuzzy logic classifier and Support Vector Machines. We validated our model on three operational Web forums, where it outperformed existing models from the literature in our experiments. We used the Friedman test and the Nemenyi test to verify our results and found that the performance improvements generated by our model over the existing models are statistically significant.
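
The description states that the Friedman and Nemenyi tests were used to verify the results, but the record does not say how they were applied. The following is only a minimal Python sketch of the commonly used formulation, assuming that each model is ranked on every dataset (for example, each forum): the Friedman statistic tests whether the models' average ranks differ, and the Nemenyi critical difference (CD) gives the minimum gap in average rank for two models to be considered significantly different. The function names, the Iman-Davenport variant of the statistic, the tabulated q values for α = 0.05 and the example scores are all assumptions for illustration, not the author's code or results.

```python
import math

# Critical values q_alpha for the two-tailed Nemenyi test at alpha = 0.05,
# indexed by the number of compared models k (as tabulated by Demsar, 2006).
# Assumption: only 2 <= k <= 10 models are supported by this sketch.
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
               7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def rank_row(scores):
    """Rank one dataset's scores: 1 = best (highest score); ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next model has an identical score (a tie).
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for j in range(pos, end + 1):
            ranks[order[j]] = avg
        pos = end + 1
    return ranks


def friedman_nemenyi(results):
    """results[d][m] is the score of model m on dataset d (higher is better).

    Returns the average rank of each model, the Friedman chi-square statistic,
    the Iman-Davenport F statistic and the Nemenyi critical difference (alpha = 0.05).
    """
    n = len(results)     # number of datasets (e.g. forums)
    k = len(results[0])  # number of models being compared
    rank_rows = [rank_row(row) for row in results]
    avg_ranks = [sum(r[m] for r in rank_rows) / n for m in range(k)]
    # Friedman statistic: chi2_F = 12N / (k(k+1)) * (sum_j R_j^2 - k(k+1)^2 / 4)
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)
    # Iman-Davenport correction; infinite when every dataset ranks the models identically.
    denom = n * (k - 1) - chi2
    f_stat = (n - 1) * chi2 / denom if denom > 0 else math.inf
    # Nemenyi critical difference: CD = q_alpha * sqrt(k(k+1) / (6N))
    cd = Q_ALPHA_005[k] * math.sqrt(k * (k + 1) / (6 * n))
    return avg_ranks, chi2, f_stat, cd


# Hypothetical scores (e.g. F1) for three models on three forums; illustrative only.
scores = [[0.90, 0.80, 0.70],
          [0.85, 0.75, 0.80],
          [0.88, 0.70, 0.72]]
avg_ranks, chi2, f_stat, cd = friedman_nemenyi(scores)
print(avg_ranks, chi2, f_stat, cd)
```

Two models are judged significantly different when their average ranks differ by more than `cd`. With only three datasets the critical difference is large (about 1.91 for three models), so in practice the Friedman statistic is first compared against the appropriate chi-square or F distribution, and the Nemenyi comparison is run only if that test rejects the hypothesis that all models perform equally.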

Institution: Curtin University Malaysia
Institution category: Local University
Subjects: user generated content; automated quality assessment; machine learning-based approach; web forums