Meta-evaluation of online and offline web search evaluation metrics
As in most information retrieval (IR) studies, evaluation plays an essential part in Web search research. Both offline and online evaluation metrics are adopted to measure the performance of search engines. Offline metrics are usually based on relevance judgments of query-document pairs from assessors, while online metrics exploit user behavior data, such as clicks, collected from search engines to compare search algorithms. Although both types of IR evaluation metrics have achieved success, to what extent they can predict user satisfaction remains under-investigated. To shed light on this research question, we meta-evaluate a series of existing online and offline metrics to study how well they infer actual search user satisfaction in different search scenarios. We find that both types of evaluation metrics significantly correlate with user satisfaction, and that they reflect satisfaction from different perspectives for different search tasks. Offline metrics align better with user satisfaction in homogeneous search (i.e. ten blue links), whereas online metrics outperform them when vertical results are federated. Finally, we propose incorporating mouse hover information into existing online evaluation metrics, and empirically show that the resulting metrics align better with search user satisfaction than click-based online metrics.
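The meta-evaluation setup the abstract describes can be illustrated with a minimal sketch: score each search session with an offline metric computed from relevance judgments and an online metric computed from click behavior, then correlate each metric with graded user satisfaction. The metric definitions, session data, and satisfaction scale below are hypothetical stand-ins chosen for illustration, not the paper's actual metrics or datasets.

```python
# Illustrative meta-evaluation sketch (hypothetical data and metrics):
# correlate an offline metric (DCG over relevance judgments) and a toy
# click-based online metric with per-session satisfaction ratings.
import math
from scipy.stats import pearsonr

def dcg(relevances, k=10):
    """Offline metric: Discounted Cumulative Gain over the top-k results."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def click_utility(clicked_ranks):
    """Toy online metric: rank-discounted credit for each clicked result."""
    return sum(1.0 / math.log2(rank + 2) for rank in clicked_ranks)

# Hypothetical per-session data: relevance judgments of the ranked list,
# the (0-based) ranks the user clicked, and a 1-5 satisfaction rating.
sessions = [
    {"rels": [3, 2, 0, 1], "clicks": [0, 1], "sat": 5},
    {"rels": [1, 0, 0, 2], "clicks": [3],    "sat": 3},
    {"rels": [0, 0, 1, 0], "clicks": [],     "sat": 1},
    {"rels": [2, 2, 1, 0], "clicks": [0],    "sat": 4},
]

offline = [dcg(s["rels"]) for s in sessions]
online = [click_utility(s["clicks"]) for s in sessions]
sat = [s["sat"] for s in sessions]

# Meta-evaluation: how well does each metric track user satisfaction?
for name, scores in [("offline (DCG)", offline), ("online (clicks)", online)]:
    r, p = pearsonr(scores, sat)
    print(f"{name}: Pearson r = {r:.2f} (p = {p:.2f})")
```

Per the abstract, the paper's hover-augmented online metrics would fit the same pipeline, with mouse hover events feeding the online signal alongside or instead of clicks.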
| Main Authors: | Chen, Ye; Zhou, Ke; Liu, Yiqun; Zhang, Min; Ma, Shaoping |
|---|---|
| Format: | Conference or Workshop Item (peer reviewed) |
| Published: | 2017 |
| Online Access: | https://eprints.nottingham.ac.uk/45048/ |
| Citation: | Chen, Ye, Zhou, Ke, Liu, Yiqun, Zhang, Min and Ma, Shaoping (2017) Meta-evaluation of online and offline web search evaluation metrics. In: SIGIR '17: 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 7-11 August 2017, Shinjuku, Tokyo, Japan. |
| DOI: | https://doi.org/10.1145/3077136.3080804 |