The power of normalised word vectors for automatically grading essays

Bibliographic Details
Main Author: Williams, Robert
Format: Journal Article
Published: The Informing Science Institute 2006
Online Access: http://proceedings.informingscience.org/InSITE2006/IISITWill155.pdf
http://hdl.handle.net/20.500.11937/46415
Description
Summary: Latent Semantic Analysis, when used for automated essay grading, scores essays against domain knowledge using document word count vectors. Words in the domain knowledge documents and the essays are counted, and Singular Value Decomposition is applied to reduce the dimensions of the semantic space. Near-neighbour vector cosines and other variables are then used to calculate an essay score. This paper discusses a technique for computing word count vectors in which the words are first normalised using thesaurus concept index numbers. This approach yields a vector space of 812 dimensions, does not require Singular Value Decomposition, and reduces the computational load. The cosine between the vectors for the student essay and a model answer proves to be a very powerful independent variable when used in regression analysis to score essays. An example of its use in practice is discussed.
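
The abstract describes the core computation: map each word to a thesaurus concept index, accumulate a fixed-length (812-dimensional) concept count vector for the student essay and for a model answer, and take the cosine between the two vectors as the main predictor for scoring. The sketch below is a minimal illustration of that idea, not the author's implementation; the concept_index lookup is a tiny hypothetical stand-in for a full thesaurus, and the dimensionality constant simply echoes the 812 dimensions reported in the abstract.

```python
import math

NUM_CONCEPTS = 812  # dimensionality reported in the abstract

# Hypothetical word -> thesaurus concept index lookup. A real system would
# load a complete thesaurus; the indices below are illustrative only.
concept_index = {
    "essay": 17, "assignment": 17,            # same concept: written work
    "grade": 203, "score": 203, "mark": 203,  # same concept: assessment
    "computer": 590, "software": 590, "program": 590,
}

def concept_vector(text: str) -> list[int]:
    """Count occurrences of each thesaurus concept in the text."""
    vec = [0] * NUM_CONCEPTS
    for word in text.lower().split():
        idx = concept_index.get(word.strip(".,;:!?"))
        if idx is not None:
            vec[idx] += 1
    return vec

def cosine(a: list[int], b: list[int]) -> float:
    """Cosine of the angle between two concept count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

model_answer = "The software computes a score for each essay"
student_essay = "A computer program gives the assignment a grade"

similarity = cosine(concept_vector(student_essay), concept_vector(model_answer))
print(f"essay/model-answer cosine: {similarity:.3f}")
```

Because lexically different words such as "score" and "grade" collapse onto the same concept index, the two texts above produce identical concept vectors and a cosine of 1.0, which is the point of the normalisation step. In the paper, that cosine is then used as an independent variable in a regression fitted to human-assigned scores; the sketch stops at the similarity computation.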