Mathematical Aspects of Word Embeddings

Word embeddings are a popular way of modelling relationships between words. Words are represented as low-dimensional vectors, such that the distances between the vectors reflect relationships between the words: words which are more similar to each other should be closer together in the embedding sp...


Bibliographic Details
Main Author: Carrington, Rachel
Format: Thesis (University of Nottingham only)
Language: English
Published: 2021
Subjects: word embedding; language analysis; natural language processing; data
Online Access: https://eprints.nottingham.ac.uk/65089/
author Carrington, Rachel
building Nottingham Research Data Repository
collection Online Access
description Word embeddings are a popular way of modelling relationships between words. Words are represented as low-dimensional vectors, such that the distances between the vectors reflect relationships between the words: words which are more similar to each other should be closer together in the embedding space. This thesis explores several different aspects of word embeddings. First, we look at the problem of non-identifiability: word embeddings are generated by optimizing an objective function, but the optimal embedding set is not unique. This has consequences for how embeddings are evaluated, and for making comparisons between different word embedding methods. We explain why this is the case and propose some solutions for dealing with it. We then explore the potential for generating semi-supervised word embeddings, with the aim being to more accurately capture the relationships between words, compared to using standard unsupervised embedding methods. We introduce three semi-supervised objective functions, derive algorithms for optimizing them, and implement them on simulated and real data. Finally, we look at the generation of time-dependent word embeddings, in particular the development of statistical tests for assessing whether certain words have changed in meaning or usage over a given time period. We introduce a time-dependent word embedding model and use it to test for change over time. However, we find that we are unable to distinguish between the presence of time dependence and a misspecified embedding dimension.
first_indexed 2025-11-14T20:47:32Z
format Thesis (University of Nottingham only)
id nottingham-65089
institution University of Nottingham Malaysia Campus
institution_category Local University
language English
last_indexed 2025-11-14T20:47:32Z
publishDate 2021
recordtype eprints
repository_type Digital Repository
spelling nottingham-65089 2024-01-18T15:17:40Z https://eprints.nottingham.ac.uk/65089/ Mathematical Aspects of Word Embeddings Carrington, Rachel 2021-08-04 Thesis (University of Nottingham only) NonPeerReviewed application/pdf en cc_by https://eprints.nottingham.ac.uk/65089/1/Rachel_Carrington_thesis.pdf Carrington, Rachel (2021) Mathematical Aspects of Word Embeddings. PhD thesis, University of Nottingham.
title Mathematical Aspects of Word Embeddings
topic word embedding
language analysis
natural language processing
data
url https://eprints.nottingham.ac.uk/65089/