Effects of cognitive abilities on reliability of crowdsourced relevance judgments in information retrieval evaluation / Parnia Samimi

Bibliographic Details
Main Author: Samimi, Parnia
Format: Thesis
Published: 2016
Subjects:
Online Access:http://studentsrepo.um.edu.my/9299/
http://studentsrepo.um.edu.my/9299/1/Parnia_Samimi.pdf
http://studentsrepo.um.edu.my/9299/6/parnia.pdf
_version_ 1848773887208194048
author Samimi, Parnia
author_facet Samimi, Parnia
author_sort Samimi, Parnia
building UM Research Repository
collection Online Access
description Test collections are extensively used to evaluate information retrieval systems in laboratory-based evaluation experiments. In the classic test collection setting, human assessors provide the relevance judgments, a costly and time-consuming task that scales poorly, and researchers are still challenged to perform reliable, low-cost evaluation of information retrieval systems. Crowdsourcing, as a novel method of data acquisition, offers a cost-effective and relatively quick way to create relevance judgments. By its nature, however, crowdsourcing draws on a highly heterogeneous pool of workers, which in turn causes heterogeneity in judgment accuracy. The main concern in using crowdsourcing as a replacement for expert human assessors is therefore whether crowdsourced relevance judgments are reliable, which makes it important to identify the factors that affect their reliability. The main goal of this study is to measure various cognitive characteristics of crowdsourced workers and to explore the effects these characteristics have on judgment reliability, measured against expert human assessment as the gold standard. The reliability of the workers is compared to that of an expert assessor both directly, as the overlap between relevance assessments, and indirectly, by comparing the system effectiveness evaluations obtained from expert and worker assessments. We assess the effects of three cognitive abilities, namely verbal comprehension skill, general reasoning skill and logical reasoning skill, on the reliability of relevance judgments in three experiments. Workers also reported their knowledge of the topics, their confidence in performing the given tasks, the perceived difficulty of the tasks, and their demographics; this information is used to investigate the effect of these factors on judgment reliability. We hypothesized that workers with higher cognitive abilities outperform workers with lower cognitive abilities in providing reliable relevance judgments. All three experiments show that individual differences in verbal comprehension skill, general reasoning skill and logical reasoning skill are associated with the reliability of relevance judgments, which led us to propose two approaches for improving it. The filtering approach recruits only workers with a certain level of cognitive ability for the relevance judgment task; the judgment aggregation approach incorporates cognitive ability scores into the aggregation process. Both approaches improve the reliability of relevance judgments while having only a small effect on system rankings. The self-reported difficulty of a judgment and the worker's confidence in performing a given task correlate significantly with judgment reliability; unexpectedly, self-reported topic knowledge and demographic data show no such correlation. This study contributes to information retrieval evaluation methodology by addressing issues faced by researchers who use test collections for system evaluation, and it highlights the cognitive characteristics of crowdsourcing workers as important factors in performing relevance judgment tasks.
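The two proposed approaches lend themselves to a brief illustration. Below is a minimal sketch in Python, assuming binary relevance labels and a normalized cognitive-ability score per worker; the function names, data shapes, threshold, and ability-weighted voting scheme are illustrative assumptions, not the exact methods of the thesis.

# Hypothetical sketch of the two approaches described in the abstract:
# (1) filtering: keep only workers above a cognitive-ability threshold;
# (2) aggregation: weight each worker's vote by their ability score.
# All names, the threshold, and the weighting scheme are assumptions.

def filter_workers(judgments, min_ability=0.6):
    """Filtering approach: discard judgments from workers whose
    cognitive-ability score falls below the threshold."""
    return [j for j in judgments if j["ability"] >= min_ability]

def aggregate(judgments):
    """Judgment aggregation approach: ability-weighted majority vote
    over binary relevance labels (1 = relevant, 0 = not relevant)."""
    relevant = sum(j["ability"] for j in judgments if j["label"] == 1)
    not_relevant = sum(j["ability"] for j in judgments if j["label"] == 0)
    return 1 if relevant >= not_relevant else 0

# Three workers judge one topic-document pair.
judgments = [
    {"ability": 0.9, "label": 1},
    {"ability": 0.4, "label": 0},
    {"ability": 0.7, "label": 1},
]
print(aggregate(filter_workers(judgments)))  # -> 1: weighted vote favors relevant

Weighting each vote by the worker's ability score is one plausible reading of incorporating cognitive ability scores into the aggregation process; the thesis may use a different scheme.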
first_indexed 2025-11-14T13:49:33Z
format Thesis
id um-9299
institution University Malaya
institution_category Local University
last_indexed 2025-11-14T13:49:33Z
publishDate 2016
recordtype eprints
repository_type Digital Repository
spelling um-9299 2019-09-10T00:07:21Z Effects of cognitive abilities on reliability of crowdsourced relevance judgments in information retrieval evaluation / Parnia Samimi Samimi, Parnia QA75 Electronic computers. Computer science 2016-09 Thesis NonPeerReviewed application/pdf http://studentsrepo.um.edu.my/9299/1/Parnia_Samimi.pdf application/pdf http://studentsrepo.um.edu.my/9299/6/parnia.pdf Samimi, Parnia (2016) Effects of cognitive abilities on reliability of crowdsourced relevance judgments in information retrieval evaluation / Parnia Samimi. PhD thesis, University of Malaya. http://studentsrepo.um.edu.my/9299/
spellingShingle QA75 Electronic computers. Computer science
Samimi, Parnia
Effects of cognitive abilities on reliability of crowdsourced relevance judgments in information retrieval evaluation / Parnia Samimi
title Effects of cognitive abilities on reliability of crowdsourced relevance judgments in information retrieval evaluation / Parnia Samimi
title_full Effects of cognitive abilities on reliability of crowdsourced relevance judgments in information retrieval evaluation / Parnia Samimi
title_fullStr Effects of cognitive abilities on reliability of crowdsourced relevance judgments in information retrieval evaluation / Parnia Samimi
title_full_unstemmed Effects of cognitive abilities on reliability of crowdsourced relevance judgments in information retrieval evaluation / Parnia Samimi
title_short Effects of cognitive abilities on reliability of crowdsourced relevance judgments in information retrieval evaluation / Parnia Samimi
title_sort effects of cognitive abilities on reliability of crowdsourced relevance judgments in information retrieval evaluation / parnia samimi
topic QA75 Electronic computers. Computer science
url http://studentsrepo.um.edu.my/9299/
http://studentsrepo.um.edu.my/9299/1/Parnia_Samimi.pdf
http://studentsrepo.um.edu.my/9299/6/parnia.pdf