Meta-evaluation of online and offline web search evaluation metrics

Bibliographic Details
Main Authors: Chen, Ye, Zhou, Ke, Liu, Yiqun, Zhang, Min, Ma, Shaoping
Format: Conference or Workshop Item
Published: 2017
Online Access: https://eprints.nottingham.ac.uk/45048/
building Nottingham Research Data Repository
collection Online Access
description As in most information retrieval (IR) studies, evaluation plays an essential part in Web search research. Both offline and online evaluation metrics are used to measure the performance of search engines. Offline metrics are usually based on assessors' relevance judgments of query-document pairs, while online metrics exploit user behavior data, such as clicks, collected from search engines to compare search algorithms. Although both types of IR evaluation metrics have achieved success, the extent to which they can predict user satisfaction remains under-investigated. To shed light on this research question, we meta-evaluate a series of existing online and offline metrics to study how well they infer actual search user satisfaction in different search scenarios. We find that both types of evaluation metrics correlate significantly with user satisfaction, but they reflect satisfaction from different perspectives for different search tasks. Offline metrics align better with user satisfaction in homogeneous search (i.e., ten blue links), whereas online metrics perform better when vertical results are federated. Finally, we propose incorporating mouse hover information into existing online evaluation metrics, and show empirically that the resulting metrics align better with search user satisfaction than click-based online metrics.
first_indexed 2025-11-14T19:57:49Z
id nottingham-45048
institution University of Nottingham Malaysia Campus
institution_category Local University
last_indexed 2025-11-14T19:57:49Z
recordtype eprints
repository_type Digital Repository
spelling nottingham-45048 2020-05-04T18:59:42Z 2017-08-07 Conference or Workshop Item PeerReviewed
Chen, Ye, Zhou, Ke, Liu, Yiqun, Zhang, Min and Ma, Shaoping (2017) Meta-evaluation of online and offline web search evaluation metrics. In: SIGIR '17: 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 7-11 August 2017, Shinjuku, Tokyo, Japan.
https://doi.org/10.1145/3077136.3080804