Ablation study on feature group importance for automated essay scoring



Bibliographic Details
Main Authors: Tan, Jih Soong, Tan, Ian K.T.
Format: Article
Language: English
Published: Penerbit Universiti Kebangsaan Malaysia 2022
Online Access: http://journalarticle.ukm.my/19430/
http://journalarticle.ukm.my/19430/1/08.pdf
Description
Summary: Grading written academic essays by humans requires significant effort: it is time-consuming and vulnerable to human biases. Since the introduction of modern computing, essay grading has been one of the many tasks for which automation has been explored. Research in automated essay scoring is ongoing, and most recent work extracts multiple linguistic features and uses them to build a classification model that scores essays. The three main types of features used are lexical, grammatical, and semantic. In our work, we conducted an ablation study to discover which engineered features have the weakest influence. We did this using a generic feature-engineering and classification approach that was used by the winners of the Automated Student Assessment Prize (ASAP), to avoid biasing the study toward any specific feature-engineering technique or model. Our results show that a semantic feature called the prompt was the weakest feature in influencing the models. Further investigation showed that this was because the feature was over-fitted in the classification model.
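The feature-group ablation described in the summary can be illustrated with a minimal leave-one-group-out sketch: score a model with every feature, then re-score it once per group with that group's features removed; the group whose removal costs the least is the weakest. This is not the authors' pipeline. The nearest-centroid classifier, the even/odd train-test split, the feature names (`lex_vocab_richness`, `gram_complexity`, `sem_overlap`) and all the numbers are hypothetical stand-ins for the real ASAP-style features and classifier.

```python
"""Leave-one-group-out ablation: a hypothetical sketch, not the authors' code."""
import math
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]
# An evaluator trains and scores a model using only the given feature subset.
Evaluate = Callable[[Sequence[Row], Sequence[int], List[str]], float]


def ablate_feature_groups(
    rows: Sequence[Row],
    labels: Sequence[int],
    groups: Dict[str, List[str]],
    evaluate: Evaluate,
) -> Tuple[float, Dict[str, float]]:
    """Score with all features, then once per group with that group removed.

    Returns the baseline score and, per group, baseline minus ablated score.
    A small (or negative) drop marks a weakly influential group.
    """
    all_features = [f for feats in groups.values() for f in feats]
    baseline = evaluate(rows, labels, all_features)
    drops = {}
    for group, feats in groups.items():
        kept = [f for f in all_features if f not in feats]
        drops[group] = baseline - evaluate(rows, labels, kept)
    return baseline, drops


def nearest_centroid_accuracy(
    rows: Sequence[Row], labels: Sequence[int], features: List[str]
) -> float:
    """Hypothetical evaluator: even-indexed rows train a nearest-centroid
    classifier, odd-indexed rows are used to measure its accuracy."""
    data = list(zip(rows, labels))
    train, test = data[0::2], data[1::2]
    centroids = {}
    for cls in sorted({y for _, y in train}):
        members = [r for r, y in train if y == cls]
        centroids[cls] = [sum(r[f] for r in members) / len(members) for f in features]

    def predict(row: Row) -> int:
        # Assign the class whose centroid is closest in Euclidean distance.
        point = [row[f] for f in features]
        return min(centroids, key=lambda c: math.dist(point, centroids[c]))

    correct = sum(predict(r) == y for r, y in test)
    return correct / len(test)


# Made-up feature values for eight essays with binary scores (0 or 1).
ROWS = [
    {"lex_vocab_richness": 1.0, "gram_complexity": 1.0, "sem_overlap": 0.0},
    {"lex_vocab_richness": 1.2, "gram_complexity": 1.4, "sem_overlap": 1.0},
    {"lex_vocab_richness": 3.0, "gram_complexity": 2.0, "sem_overlap": 1.0},
    {"lex_vocab_richness": 2.8, "gram_complexity": 1.6, "sem_overlap": 0.0},
    {"lex_vocab_richness": 0.8, "gram_complexity": 1.0, "sem_overlap": 0.0},
    {"lex_vocab_richness": 1.0, "gram_complexity": 1.2, "sem_overlap": 1.0},
    {"lex_vocab_richness": 3.2, "gram_complexity": 2.0, "sem_overlap": 1.0},
    {"lex_vocab_richness": 3.0, "gram_complexity": 1.8, "sem_overlap": 0.0},
]
LABELS = [0, 0, 1, 1, 0, 0, 1, 1]
GROUPS = {
    "lexical": ["lex_vocab_richness"],
    "grammatical": ["gram_complexity"],
    "semantic": ["sem_overlap"],
}

if __name__ == "__main__":
    baseline, drops = ablate_feature_groups(ROWS, LABELS, GROUPS, nearest_centroid_accuracy)
    print("baseline accuracy:", baseline)
    # Weakest group first: the smallest drop when that group is removed.
    for group, drop in sorted(drops.items(), key=lambda kv: kv[1]):
        print(f"without {group}: accuracy drops by {drop:.2f}")
```

Passing the evaluator in as a callable keeps the ablation loop independent of the model, so a real classifier and metric could be swapped in. On this toy data only the lexical group changes the score; the grammatical group also separates the classes but is redundant with the lexical one, which shows that a leave-one-group-out ablation measures each group's marginal contribution, so redundant groups can each look weak on their own.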