Exploring teacher forcing techniques for sequence-to-sequence abstractive headline summarization

Bibliographic Details
Main Author: Albert, Corbin
Format: Dissertation (University of Nottingham only)
Language: English
Published: 2017
Online Access: https://eprints.nottingham.ac.uk/48564/
_version_ 1848797794875211776
author Albert, Corbin
author_facet Albert, Corbin
author_sort Albert, Corbin
building Nottingham Research Data Repository
collection Online Access
description Every internet user today is exposed to countless article headlines. These can range from informative, to sensationalist, to downright misleading. These snippets of information can have tremendous impacts on those exposed and can shape one's views on a subject before even reading the associated article. For these reasons and more, it is important that the Natural Language Processing community turn its attention towards this critical part of everyday life by improving current abstractive text summarization techniques. To aid in that endeavor, this project explores various methods of teacher forcing, a technique used during model training for sequence-to-sequence recurrent neural network architectures. A relatively new deep learning library called PyTorch has made experimentation with teacher forcing accessible for the first time and is utilized for this purpose in the project. Additionally, to the best of the author's knowledge, this is the first implementation of abstractive headline summarization in PyTorch. Seven different teacher forcing techniques were designed and experimented with: (1) constant levels of 0%, 25%, 50%, 75%, and 100% teacher forcing probability through the entire training cycle; and (2) two different graduated techniques: one that decreased linearly from 100% to 0% over the entire training cycle to convergence, and another that stepped down from 100% to 0% in discrete increments every 12.5% of the training cycle, often corresponding with learning rate annealing. Dozens of generative sequence-to-sequence models were trained with these various techniques to observe their differences. The seven techniques were compared to one another via two metrics: (1) ROUGE F-scores, the most common metric used in this field; and (2) average loss over time. Counter to what was expected, this project shows with statistical significance that constant 100% and 75% teacher forcing produced better ROUGE scores than any other technique. These results support the use of 100% teacher forcing, the most widely used technique today. However, they throw into question an assumption held by many leading machine learning researchers: that dynamic, graduated teacher forcing techniques should result in greater model performance. Questions of ROUGE metric validity, response to more complicated model parameters, and domain specificity are encouraged for further analysis.
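
To make the techniques in the abstract concrete, the following is a minimal PyTorch sketch of per-step teacher forcing under the three kinds of schedules described. It is an illustration only, not the dissertation's code: the decoder interface, the helper names, and the exact shape of the stepped schedule are assumptions.

```python
import random
import torch
import torch.nn as nn

# Each schedule maps training progress (fraction of the training cycle
# completed, in [0, 1]) to a teacher forcing probability. Names and exact
# shapes are assumptions, not taken from the dissertation.

def constant(p):
    """Factory for the constant techniques (p in {0.0, 0.25, 0.5, 0.75, 1.0})."""
    return lambda progress: p

def linear(progress):
    """Decrease linearly from 100% to 0% over the entire training cycle."""
    return 1.0 - progress

def stepped(progress):
    """Drop in 12.5-point steps every 12.5% of the cycle (one plausible
    reading of the stepped technique in the abstract)."""
    return max(0.0, 1.0 - 0.125 * int(progress / 0.125))

def decode_with_teacher_forcing(decoder, hidden, targets, sos_id, tf_prob):
    """One training pass over a batch of target headlines.

    `decoder` is assumed to map (input token ids of shape (batch, 1),
    hidden state) to (vocabulary logits of shape (batch, vocab), new
    hidden state); `targets` has shape (batch, seq_len).
    """
    criterion = nn.CrossEntropyLoss()
    inp = torch.full((targets.size(0), 1), sos_id, dtype=torch.long)
    loss = 0.0
    for t in range(targets.size(1)):
        logits, hidden = decoder(inp, hidden)
        loss = loss + criterion(logits, targets[:, t])
        if random.random() < tf_prob:
            # Teacher forcing: feed the gold token as the next input.
            inp = targets[:, t].unsqueeze(1)
        else:
            # Free running: feed the model's own greedy prediction.
            inp = logits.argmax(dim=1, keepdim=True).detach()
    return loss / targets.size(1)
```

In a training loop, `tf_prob` would be recomputed each batch from the chosen schedule, e.g. `schedule = constant(0.75)` or `schedule = linear`, then `tf_prob = schedule(batches_seen / total_batches)`, so the constant, linear, and stepped techniques differ only in that one call.
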
first_indexed 2025-11-14T20:09:33Z
format Dissertation (University of Nottingham only)
id nottingham-48564
institution University of Nottingham Malaysia Campus
institution_category Local University
language English
last_indexed 2025-11-14T20:09:33Z
publishDate 2017
recordtype eprints
repository_type Digital Repository
spelling nottingham-48564 2018-01-09T14:16:12Z https://eprints.nottingham.ac.uk/48564/ Exploring teacher forcing techniques for sequence-to-sequence abstractive headline summarization Albert, Corbin Every internet user today is exposed to countless article headlines. These can range from informative, to sensationalist, to downright misleading. These snippets of information can have tremendous impacts on those exposed and can shape one's views on a subject before even reading the associated article. For these reasons and more, it is important that the Natural Language Processing community turn its attention towards this critical part of everyday life by improving current abstractive text summarization techniques. To aid in that endeavor, this project explores various methods of teacher forcing, a technique used during model training for sequence-to-sequence recurrent neural network architectures. A relatively new deep learning library called PyTorch has made experimentation with teacher forcing accessible for the first time and is utilized for this purpose in the project. Additionally, to the best of the author's knowledge, this is the first implementation of abstractive headline summarization in PyTorch. Seven different teacher forcing techniques were designed and experimented with: (1) constant levels of 0%, 25%, 50%, 75%, and 100% teacher forcing probability through the entire training cycle; and (2) two different graduated techniques: one that decreased linearly from 100% to 0% over the entire training cycle to convergence, and another that stepped down from 100% to 0% in discrete increments every 12.5% of the training cycle, often corresponding with learning rate annealing. Dozens of generative sequence-to-sequence models were trained with these various techniques to observe their differences. The seven techniques were compared to one another via two metrics: (1) ROUGE F-scores, the most common metric used in this field; and (2) average loss over time. Counter to what was expected, this project shows with statistical significance that constant 100% and 75% teacher forcing produced better ROUGE scores than any other technique. These results support the use of 100% teacher forcing, the most widely used technique today. However, they throw into question an assumption held by many leading machine learning researchers: that dynamic, graduated teacher forcing techniques should result in greater model performance. Questions of ROUGE metric validity, response to more complicated model parameters, and domain specificity are encouraged for further analysis. 2017-12-14 Dissertation (University of Nottingham only) NonPeerReviewed application/pdf en https://eprints.nottingham.ac.uk/48564/1/CorbinAlbert_MScDissertation.pdf Albert, Corbin (2017) Exploring teacher forcing techniques for sequence-to-sequence abstractive headline summarization. [Dissertation (University of Nottingham only)]
spellingShingle Albert, Corbin
Exploring teacher forcing techniques for sequence-to-sequence abstractive headline summarization
title Exploring teacher forcing techniques for sequence-to-sequence abstractive headline summarization
title_full Exploring teacher forcing techniques for sequence-to-sequence abstractive headline summarization
title_fullStr Exploring teacher forcing techniques for sequence-to-sequence abstractive headline summarization
title_full_unstemmed Exploring teacher forcing techniques for sequence-to-sequence abstractive headline summarization
title_short Exploring teacher forcing techniques for sequence-to-sequence abstractive headline summarization
title_sort exploring teacher forcing techniques for sequence-to-sequence abstractive headline summarization
url https://eprints.nottingham.ac.uk/48564/