Prediction of high cost performance metrics in information retrieval evaluation / Muwanei Sinyinda
| Main Author: | Muwanei Sinyinda |
|---|---|
| Format: | Thesis |
| Published: | 2022 |
| Subjects: | |
| Online Access: | http://studentsrepo.um.edu.my/14576/ http://studentsrepo.um.edu.my/14576/1/Muwanei.pdf http://studentsrepo.um.edu.my/14576/2/Muwanei_Sinyinda.pdf |
| Summary: | Test collections have been used extensively to evaluate the effectiveness of
information retrieval systems in laboratory-based experiments. A typical
test collection comprises a corpus of documents, topics, and relevance judgments generated
by human assessors. A long-standing problem has been how to reduce the cost of
performing information retrieval evaluations. Therefore, in the last few decades, several
methods have been proposed to reduce the evaluation costs. Recent research has proposed to
reduce the evaluation costs by predicting performance metrics at the high evaluation depths
of documents using other performance metrics computed at the low evaluation depths. In
the above-mentioned research, the performance metrics computed or predicted at the high
evaluation depths of documents were also referred to as high-cost performance metrics.
By predicting the high-cost performance metrics, the usage of the relevance judgments is
restricted only to the computation of the performance metrics at the low evaluation depths.
However, this recent research reported low prediction accuracy for the high-cost normalized
discounted cumulative gain (nDCG) and precision metrics when using low-cost
performance metrics computed at evaluation depths of up to 30 documents. Therefore,
this thesis makes several contributions focused on predicting the high-cost
nDCG and precision performance metrics from
other performance metrics computed at low evaluation depths of up to 30 documents.
First, every test collection contains topics of varying difficulty. This research
therefore investigated the effect of topic difficulty on the predictions of
the high-cost performance metrics and showed that the high-cost performance metrics
are predicted better for more difficult topics. It suggests that this
trend could be exploited in methods for predicting the high-cost performance
metrics. The analysis also found clear evidence of extreme scores of the
performance metrics, which this research suggests should be resolved to improve predictions
of the high-cost performance metrics.
The second contribution concerns the exploration of the predictability of the performance
metrics in information retrieval evaluation. In recent research, machine learning models
were trained using performance metrics computed from a set of test collections, while
predictions were made on performance metrics from completely different sets of test collections.
This research therefore also explored how predictable the high-cost performance
metrics of a particular test collection are when the machine learning models
are trained using performance metrics computed from other test collections. It
has shown that there is a dataset shift in the topic scores of performance metrics across
different test collections and therefore suggests addressing this dataset shift when predicting
the high-cost performance metrics.
The last contribution is the proposal of two methods that predict the high-cost normalized
discounted cumulative gain and precision performance metrics using low-cost performance
metrics computed at evaluation depths of up to 30 documents. This research
has shown that the proposed methods provide better predictions than the methods in existing research.
|
|---|
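
The summary describes predicting metrics at high evaluation depths from metrics computed at low evaluation depths. The sketch below is only an illustration of that idea, not the thesis's method: it computes precision and normalized discounted cumulative gain at a given depth, then fits a one-feature least-squares line as a stand-in for the machine learning models the summary mentions. The function names, the toy judgments, and the toy depths of 3 and 8 are all assumptions made for this example.

```python
import math


def precision_at_k(gains, k):
    """Fraction of the top-k ranked documents judged relevant (gain > 0)."""
    return sum(1 for g in gains[:k] if g > 0) / k


def dcg_at_k(gains, k):
    """Discounted cumulative gain with the common log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(gains, k):
    """DCG@k divided by the DCG@k of the same gains sorted best-first.

    Simplification: the ideal ranking is built from this ranking's own
    judgments rather than from all judged documents for the topic.
    """
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical graded judgments (0 = not relevant) for three ranked lists.
toy_rankings = [
    [1, 0, 2, 0, 1, 0, 0, 1],
    [0, 0, 1, 0, 0, 0, 2, 0],
    [2, 1, 1, 0, 1, 1, 0, 0],
]
LOW_DEPTH, HIGH_DEPTH = 3, 8  # toy depths; the summary uses low depths up to 30

# Low-cost feature and high-cost target, one value per ranked list.
low = [precision_at_k(r, LOW_DEPTH) for r in toy_rankings]
high = [precision_at_k(r, HIGH_DEPTH) for r in toy_rankings]

# Stand-in predictor (an assumption, not the thesis's model).
slope, intercept = fit_line(low, high)
predicted_high = [slope * x + intercept for x in low]
```

In this toy setup only the top 3 documents of each list would need relevance judgments at prediction time, which is the cost saving the summary refers to. A realistic experiment would fit the predictor on some test collections and evaluate it on others.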