Effects of cognitive abilities on reliability of crowdsourced relevance judgments in information retrieval evaluation / Parnia Samimi

Bibliographic Details
Main Author: Samimi, Parnia
Format: Thesis
Published: 2016
Subjects:
Online Access:http://studentsrepo.um.edu.my/9299/
http://studentsrepo.um.edu.my/9299/1/Parnia_Samimi.pdf
http://studentsrepo.um.edu.my/9299/6/parnia.pdf
_version_ 1848773887208194048
author Samimi, Parnia
author_facet Samimi, Parnia
author_sort Samimi, Parnia
building UM Research Repository
collection Online Access
description Test collections are extensively used to evaluate information retrieval systems in laboratory-based evaluation experiments. In the classic test collection setting, human assessors provide the relevance judgments, a costly and time-consuming task that scales poorly, and researchers are still challenged to perform reliable, low-cost evaluation of information retrieval systems. Crowdsourcing, as a novel method of data acquisition, offers a cost-effective and relatively quick way to create relevance judgments. By its nature, however, crowdsourcing draws on a highly heterogeneous pool of workers, which in turn causes heterogeneity in judgment accuracy. The main concern in using crowdsourcing as a replacement for expert human assessors is therefore whether crowdsourced relevance judgments are reliable, which makes it important to identify the factors that affect their reliability. The main goal of this study is to measure various cognitive characteristics of crowdsourced workers and to explore the effects these characteristics have on judgment reliability, measured against expert human assessment as the gold standard. The reliability of the workers is compared to that of an expert assessor both directly, as the overlap between relevance assessments, and indirectly, by comparing the system effectiveness evaluations obtained from expert and worker assessments. We assess the effects of three cognitive abilities, namely verbal comprehension skill, general reasoning skill and logical reasoning skill, on the reliability of relevance judgments in three experiments. Workers also reported their knowledge of the topics, their confidence in performing the given tasks, the perceived difficulty of the tasks, and their demographics; this information is used to investigate the effect of these factors on judgment reliability. We hypothesized that workers with higher cognitive abilities outperform workers with lower cognitive abilities in providing reliable relevance judgments. All three experiments show that individual differences in verbal comprehension skill, general reasoning skill and logical reasoning skill are associated with the reliability of relevance judgments, which led us to propose two approaches for improving it. The filtering approach recruits only workers with a certain level of cognitive ability for the relevance judgment task; the judgment aggregation approach incorporates cognitive ability scores into the aggregation process. Both approaches improve the reliability of relevance judgments while having only a small effect on system rankings. The self-reported difficulty of a judgment and the worker's confidence in performing a given task correlate significantly with judgment reliability; unexpectedly, self-reported topic knowledge and demographic data show no such correlation. This study contributes to information retrieval evaluation methodology by addressing issues faced by researchers who use test collections for system evaluation, and it highlights the cognitive characteristics of crowdsourcing workers as important factors in performing relevance judgment tasks.
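The two proposed approaches lend themselves to a brief illustration. Below is a minimal sketch in Python, assuming binary relevance labels and a normalized cognitive-ability score per worker; the function names, data shapes, threshold, and ability-weighted voting scheme are illustrative assumptions, not the exact methods of the thesis.

# Hypothetical sketch of the two approaches described in the abstract:
# (1) filtering: keep only workers above a cognitive-ability threshold;
# (2) aggregation: weight each worker's vote by their ability score.
# All names, the threshold, and the weighting scheme are assumptions.

def filter_workers(judgments, min_ability=0.6):
    """Filtering approach: discard judgments from workers whose
    cognitive-ability score falls below the threshold."""
    return [j for j in judgments if j["ability"] >= min_ability]

def aggregate(judgments):
    """Judgment aggregation approach: ability-weighted majority vote
    over binary relevance labels (1 = relevant, 0 = not relevant)."""
    relevant = sum(j["ability"] for j in judgments if j["label"] == 1)
    not_relevant = sum(j["ability"] for j in judgments if j["label"] == 0)
    return 1 if relevant >= not_relevant else 0

# Three workers judge one topic-document pair.
judgments = [
    {"ability": 0.9, "label": 1},
    {"ability": 0.4, "label": 0},
    {"ability": 0.7, "label": 1},
]
print(aggregate(filter_workers(judgments)))  # -> 1: weighted vote favors relevant

Weighting each vote by the worker's ability score is one plausible reading of incorporating cognitive ability scores into the aggregation process; the thesis may use a different scheme.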
first_indexed 2025-11-14T13:49:33Z
format Thesis
id um-9299
institution University Malaya
institution_category Local University
last_indexed 2025-11-14T13:49:33Z
publishDate 2016
recordtype eprints
repository_type Digital Repository
spelling um-9299 2019-09-10T00:07:21Z Effects of cognitive abilities on reliability of crowdsourced relevance judgments in information retrieval evaluation / Parnia Samimi Samimi, Parnia QA75 Electronic computers. Computer science 2016-09 Thesis NonPeerReviewed application/pdf http://studentsrepo.um.edu.my/9299/1/Parnia_Samimi.pdf application/pdf http://studentsrepo.um.edu.my/9299/6/parnia.pdf Samimi, Parnia (2016) Effects of cognitive abilities on reliability of crowdsourced relevance judgments in information retrieval evaluation / Parnia Samimi. PhD thesis, University of Malaya. http://studentsrepo.um.edu.my/9299/
spellingShingle QA75 Electronic computers. Computer science
Samimi, Parnia
Effects of cognitive abilities on reliability of crowdsourced relevance judgments in information retrieval evaluation / Parnia Samimi
title Effects of cognitive abilities on reliability of crowdsourced relevance judgments in information retrieval evaluation / Parnia Samimi
title_full Effects of cognitive abilities on reliability of crowdsourced relevance judgments in information retrieval evaluation / Parnia Samimi
title_fullStr Effects of cognitive abilities on reliability of crowdsourced relevance judgments in information retrieval evaluation / Parnia Samimi
title_full_unstemmed Effects of cognitive abilities on reliability of crowdsourced relevance judgments in information retrieval evaluation / Parnia Samimi
title_short Effects of cognitive abilities on reliability of crowdsourced relevance judgments in information retrieval evaluation / Parnia Samimi
title_sort effects of cognitive abilities on reliability of crowdsourced relevance judgments in information retrieval evaluation / parnia samimi
topic QA75 Electronic computers. Computer science
url http://studentsrepo.um.edu.my/9299/
http://studentsrepo.um.edu.my/9299/1/Parnia_Samimi.pdf
http://studentsrepo.um.edu.my/9299/6/parnia.pdf