Exploring teacher forcing techniques for sequence-to-sequence abstractive headline summarization

Bibliographic Details
Main Author: Albert, Corbin
Format: Dissertation (University of Nottingham only)
Language: English
Published: 2017
Online Access: https://eprints.nottingham.ac.uk/48564/
_version_ 1848797794875211776
author Albert, Corbin
author_facet Albert, Corbin
author_sort Albert, Corbin
building Nottingham Research Data Repository
collection Online Access
description Every internet user today is exposed to countless article headlines. These can range from informative, to sensationalist, to downright misleading. These snippets of information can have tremendous impacts on those exposed and can shape one's views on a subject before even reading the associated article. For these reasons and more, it is important that the Natural Language Processing community turn its attention towards this critical part of everyday life by improving current abstractive text summarization techniques. To aid in that endeavor, this project explores various methods of teacher forcing, a technique used during model training for sequence-to-sequence recurrent neural network architectures. A relatively new deep learning library called PyTorch has made experimentation with teacher forcing accessible for the first time and is utilized for this purpose in the project. Additionally, to the best of the author's knowledge, this is the first implementation of abstractive headline summarization in PyTorch. Seven different teacher forcing techniques were designed and experimented with: (1) constant levels of 0%, 25%, 50%, 75%, and 100% teacher forcing probability through the entire training cycle; and (2) two different graduated techniques: one that decreased linearly from 100% to 0% over the entire training cycle to convergence, and another that stepped down from 100% to 0% in discrete increments every 12.5% of the training cycle, often corresponding with learning rate annealing. Dozens of generative sequence-to-sequence models were trained with these various techniques to observe their differences. The seven techniques were compared to one another via two metrics: (1) ROUGE F-scores, the most common metric used in this field; and (2) average loss over time. Counter to what was expected, this project shows with statistical significance that constant 100% and 75% teacher forcing produced better ROUGE scores than any other technique. These results support the use of 100% teacher forcing, the most widely used technique today. However, they throw into question an assumption held by many leading machine learning researchers: that dynamic, graduated teacher forcing techniques should result in greater model performance. Questions of ROUGE metric validity, response to more complicated model parameters, and domain specificity are encouraged for further analysis.
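
To make the techniques in the abstract concrete, the following is a minimal PyTorch sketch of per-step teacher forcing under the three kinds of schedules described. It is an illustration only, not the dissertation's code: the decoder interface, the helper names, and the exact shape of the stepped schedule are assumptions.

```python
import random
import torch
import torch.nn as nn

# Each schedule maps training progress (fraction of the training cycle
# completed, in [0, 1]) to a teacher forcing probability. Names and exact
# shapes are assumptions, not taken from the dissertation.

def constant(p):
    """Factory for the constant techniques (p in {0.0, 0.25, 0.5, 0.75, 1.0})."""
    return lambda progress: p

def linear(progress):
    """Decrease linearly from 100% to 0% over the entire training cycle."""
    return 1.0 - progress

def stepped(progress):
    """Drop in 12.5-point steps every 12.5% of the cycle (one plausible
    reading of the stepped technique in the abstract)."""
    return max(0.0, 1.0 - 0.125 * int(progress / 0.125))

def decode_with_teacher_forcing(decoder, hidden, targets, sos_id, tf_prob):
    """One training pass over a batch of target headlines.

    `decoder` is assumed to map (input token ids of shape (batch, 1),
    hidden state) to (vocabulary logits of shape (batch, vocab), new
    hidden state); `targets` has shape (batch, seq_len).
    """
    criterion = nn.CrossEntropyLoss()
    inp = torch.full((targets.size(0), 1), sos_id, dtype=torch.long)
    loss = 0.0
    for t in range(targets.size(1)):
        logits, hidden = decoder(inp, hidden)
        loss = loss + criterion(logits, targets[:, t])
        if random.random() < tf_prob:
            # Teacher forcing: feed the gold token as the next input.
            inp = targets[:, t].unsqueeze(1)
        else:
            # Free running: feed the model's own greedy prediction.
            inp = logits.argmax(dim=1, keepdim=True).detach()
    return loss / targets.size(1)
```

In a training loop, `tf_prob` would be recomputed each batch from the chosen schedule, e.g. `schedule = constant(0.75)` or `schedule = linear`, then `tf_prob = schedule(batches_seen / total_batches)`, so the constant, linear, and stepped techniques differ only in that one call.
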
first_indexed 2025-11-14T20:09:33Z
format Dissertation (University of Nottingham only)
id nottingham-48564
institution University of Nottingham Malaysia Campus
institution_category Local University
language English
last_indexed 2025-11-14T20:09:33Z
publishDate 2017
recordtype eprints
repository_type Digital Repository
spelling nottingham-48564 2018-01-09T14:16:12Z https://eprints.nottingham.ac.uk/48564/ Exploring teacher forcing techniques for sequence-to-sequence abstractive headline summarization Albert, Corbin Every internet user today is exposed to countless article headlines. These can range from informative, to sensationalist, to downright misleading. These snippets of information can have tremendous impacts on those exposed and can shape one's views on a subject before even reading the associated article. For these reasons and more, it is important that the Natural Language Processing community turn its attention towards this critical part of everyday life by improving current abstractive text summarization techniques. To aid in that endeavor, this project explores various methods of teacher forcing, a technique used during model training for sequence-to-sequence recurrent neural network architectures. A relatively new deep learning library called PyTorch has made experimentation with teacher forcing accessible for the first time and is utilized for this purpose in the project. Additionally, to the best of the author's knowledge, this is the first implementation of abstractive headline summarization in PyTorch. Seven different teacher forcing techniques were designed and experimented with: (1) constant levels of 0%, 25%, 50%, 75%, and 100% teacher forcing probability through the entire training cycle; and (2) two different graduated techniques: one that decreased linearly from 100% to 0% over the entire training cycle to convergence, and another that stepped down from 100% to 0% in discrete increments every 12.5% of the training cycle, often corresponding with learning rate annealing. Dozens of generative sequence-to-sequence models were trained with these various techniques to observe their differences. The seven techniques were compared to one another via two metrics: (1) ROUGE F-scores, the most common metric used in this field; and (2) average loss over time. Counter to what was expected, this project shows with statistical significance that constant 100% and 75% teacher forcing produced better ROUGE scores than any other technique. These results support the use of 100% teacher forcing, the most widely used technique today. However, they throw into question an assumption held by many leading machine learning researchers: that dynamic, graduated teacher forcing techniques should result in greater model performance. Questions of ROUGE metric validity, response to more complicated model parameters, and domain specificity are encouraged for further analysis. 2017-12-14 Dissertation (University of Nottingham only) NonPeerReviewed application/pdf en https://eprints.nottingham.ac.uk/48564/1/CorbinAlbert_MScDissertation.pdf Albert, Corbin (2017) Exploring teacher forcing techniques for sequence-to-sequence abstractive headline summarization. [Dissertation (University of Nottingham only)]
spellingShingle Albert, Corbin
Exploring teacher forcing techniques for sequence-to-sequence abstractive headline summarization
title Exploring teacher forcing techniques for sequence-to-sequence abstractive headline summarization
title_full Exploring teacher forcing techniques for sequence-to-sequence abstractive headline summarization
title_fullStr Exploring teacher forcing techniques for sequence-to-sequence abstractive headline summarization
title_full_unstemmed Exploring teacher forcing techniques for sequence-to-sequence abstractive headline summarization
title_short Exploring teacher forcing techniques for sequence-to-sequence abstractive headline summarization
title_sort exploring teacher forcing techniques for sequence-to-sequence abstractive headline summarization
url https://eprints.nottingham.ac.uk/48564/