Automatically identifying coherent web sessions from browser logs

Due to the increasing diversity in both users’ behaviour and the types of web tasks performed, many studies in information retrieval (IR) are turning towards session-based retrieval rather than single URL-query pairs. However, extracting the meaningful session data from the raw discrete logs is stil...

Bibliographic Details
Main Author:	Ye, Chaoyu
Format:	Thesis (University of Nottingham only)
Language:	English
Published:	2019
Subjects:	web sessions browsers internet information retrieval
Online Access:	https://eprints.nottingham.ac.uk/59119/

_version_	1848799587881451520
author	Ye, Chaoyu
author_facet	Ye, Chaoyu
author_sort	Ye, Chaoyu
building	Nottingham Research Data Repository
collection	Online Access
description	Due to the increasing diversity in both users’ behaviour and the types of web tasks performed, many studies in information retrieval (IR) are turning towards session-based retrieval rather than single URL-query pairs. However, extracting meaningful session data from raw, discrete logs is still a significant challenge. Most prior studies have been based on datasets where the logs of each user’s web history were simply divided by fixed periods of inactivity, such as 5, 15, or 30 minutes [52,31]. There have also been some attempts to go beyond these simplistic fixed timeouts [91], but rather than covering all web activities, they focus on search-related activities only. Consequently, it is necessary to find a meaningful way to cluster all activities in a web browser, including both searching and browsing. The goal of this study is to find a better way to automatically segment users’ web activity into sessions. There are three research stages: 1) how people understand their mental model of session segmentation, 2) how these self-identified sessions look in practically implemented web logs, and 3) how we can algorithmically identify these sessions from browser activity, and how each algorithm performs. To answer these questions, a qualitative study was first conducted and a taxonomy of six factors related to user-identified sessions was generated. Then a Chrome extension was built that provided a practical reflection of user-identified sessions, with comprehensive sets of web logs including both user interaction and visit details. This helped to gather a ground-truth dataset to support further evaluation. Finally, several algorithmic approaches to automatically clustering web activities closer to user-identified sessions were evaluated.
first_indexed	2025-11-14T20:38:03Z
format	Thesis (University of Nottingham only)
id	nottingham-59119
institution	University of Nottingham Malaysia Campus
institution_category	Local University
language	English
last_indexed	2025-11-14T20:38:03Z
publishDate	2019
recordtype	eprints
repository_type	Digital Repository
spelling	nottingham-59119 2025-02-28T14:39:37Z https://eprints.nottingham.ac.uk/59119/ Automatically identifying coherent web sessions from browser logs Ye, Chaoyu 2019-12-12 Thesis (University of Nottingham only) NonPeerReviewed application/pdf en arr https://eprints.nottingham.ac.uk/59119/1/thesis_submitted_20190920.pdf Ye, Chaoyu (2019) Automatically identifying coherent web sessions from browser logs. PhD thesis, University of Nottingham. web sessions browsers internet information retrieval
spellingShingle	web sessions browsers internet information retrieval Ye, Chaoyu Automatically identifying coherent web sessions from browser logs
title	Automatically identifying coherent web sessions from browser logs
title_full	Automatically identifying coherent web sessions from browser logs
title_fullStr	Automatically identifying coherent web sessions from browser logs
title_full_unstemmed	Automatically identifying coherent web sessions from browser logs
title_short	Automatically identifying coherent web sessions from browser logs
title_sort	automatically identifying coherent web sessions from browser logs
topic	web sessions browsers internet information retrieval
url	https://eprints.nottingham.ac.uk/59119/
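
The fixed-timeout baseline mentioned in the description (dividing each user's log by fixed periods of inactivity such as 5, 15, or 30 minutes) can be sketched roughly as below. This is only an illustration under stated assumptions, not the method developed in the thesis: the function name `split_by_timeout` and the `(timestamp, url)` event shape are hypothetical and chosen for this example.

```python
from datetime import datetime, timedelta


def split_by_timeout(events, timeout_minutes=30):
    """Group (timestamp, url) events into sessions by inactivity gaps.

    Hypothetical helper for illustration: a new session starts whenever
    the gap between two consecutive events exceeds the timeout.
    """
    timeout = timedelta(minutes=timeout_minutes)
    sessions = []
    # Sort by timestamp so gaps are measured between consecutive events.
    for event in sorted(events, key=lambda e: e[0]):
        last_session = sessions[-1] if sessions else None
        if last_session and event[0] - last_session[-1][0] <= timeout:
            last_session.append(event)   # still within the same session
        else:
            sessions.append([event])     # gap too long: start a new session
    return sessions


# Made-up example log: four page visits by one user.
log = [
    (datetime(2019, 1, 1, 9, 0), "https://example.org/a"),
    (datetime(2019, 1, 1, 9, 10), "https://example.org/b"),
    (datetime(2019, 1, 1, 10, 0), "https://example.org/c"),
    (datetime(2019, 1, 1, 10, 2), "https://example.org/d"),
]
for minutes in (5, 15, 30):
    print(minutes, [len(s) for s in split_by_timeout(log, minutes)])
```

The sketch makes the limitation noted in the description visible: the resulting sessions depend only on the chosen timeout, not on what the user was actually doing.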

Similar Items