Stylistic structures: a computational approach to text classification

Bibliographic Details
Main Author: Forsyth, Richard
Format: Thesis (University of Nottingham only)
Language: English
Published: 1996
Subjects: computational stylometry; authorship attribution; text classification; machine learning
Online Access: https://eprints.nottingham.ac.uk/13445/
building Nottingham Research Data Repository
collection Online Access
description The problem of authorship attribution has received attention both in the academic world (e.g. did Shakespeare or Marlowe write Edward III?) and outside (e.g. is this confession really the words of the accused or was it made up by someone else?). Previous studies by statisticians and literary scholars have sought "verbal habits" that characterize particular authors consistently. By and large, this has meant looking for distinctive rates of usage of specific marker words -- as in the classic study by Mosteller and Wallace of the Federalist Papers. The present study is based on the premiss that authorship attribution is just one type of text classification and that advances in this area can be made by applying and adapting techniques from the field of machine learning. Five different trainable text-classification systems are described, which differ from current stylometric practice in a number of ways, in particular by using a wider variety of marker patterns than customary and by seeking such markers automatically, without being told what to look for. A comparison of the strengths and weaknesses of these systems, when tested on a representative range of text-classification problems, confirms the importance of paying more attention than usual to alternative methods of representing distinctive differences between types of text. The thesis concludes with suggestions on how to make further progress towards the goal of a fully automatic, trainable text-classification system.
id nottingham-13445
institution University of Nottingham Malaysia Campus
institution_category Local University
recordtype eprints
repository_type Digital Repository
spelling nottingham-13445 2025-02-28T11:25:14Z https://eprints.nottingham.ac.uk/13445/ Stylistic structures: a computational approach to text classification Forsyth, Richard 1996-07 Thesis (University of Nottingham only) NonPeerReviewed application/pdf en arr https://eprints.nottingham.ac.uk/13445/1/RForsyth_PhD1995_SchoolofPsychology_UofNottm.pdf Forsyth, Richard (1996) Stylistic structures: a computational approach to text classification. PhD thesis, University of Nottingham.
title Stylistic structures: a computational approach to text classification
topic computational stylometry
authorship attribution
text classification
machine learning
url https://eprints.nottingham.ac.uk/13445/