Data annotation architecture for automatic depression detection

Bibliographic Details
Main Authors: Chang, Yun Yao; Nazlia Omar
Format: Article
Language: English
Published: Penerbit Universiti Kebangsaan Malaysia 2023
Online Access: http://journalarticle.ukm.my/22537/
http://journalarticle.ukm.my/22537/1/03%20-.pdf
Repository: UKM Institutional Repository
Description: Depression is a mood disorder that causes a person to feel sad and tired and to experience a prolonged lack of energy, irritability, and loss of interest in daily activities. Many scholars have contributed to identifying and curbing depression. One such effort is the development of models that can identify and predict depression among Twitter users. So far, however, there is no high-quality labeled dataset of depression-related tweets. The purpose of this study is therefore to propose an architecture that collects data from social media such as Twitter in order to detect depression automatically. The text analysis begins with data scraping, followed by text preprocessing, feature extraction, modeling, and evaluation, and then by document-corpus analysis using TF-IDF and bag-of-words (BOW) representations. A sentiment lexicon derived from two tools, TextBlob and VADER, was used to distinguish the emotions of words. Four machine learning classifiers, namely Logistic Regression, Decision Tree, Support Vector Machine (SVM), and K-Nearest Neighbour (KNN), were used to perform the classification. Managing the final dataset and applying Logistic Regression produced the expected high accuracy, precision, recall, and F1-score in predicting depression. As an application, local Malaysian COVID-19 tweets were scraped using TWINT, with appropriate hashtags and keywords used to retrieve tweet sentences. The results show that the proposed architecture outperforms the baseline, achieving an F1-score of 92.876% with SVM + TF-IDF. This indicates that the proposed data annotation architecture performs well in detecting depression.
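The description names TF-IDF and bag-of-words (BOW) as the document representations. As an illustration only, the sketch below computes both from scratch for a few made-up tweets. The tokenizer is a hypothetical regular expression, and the IDF formula, idf(t) = ln((1 + N) / (1 + df(t))) + 1 followed by L2 normalization of each row, is the smoothed variant that scikit-learn's TfidfVectorizer uses by default; the record does not say which variant the authors used.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """BOW: sorted vocabulary plus one raw term-count vector per document."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[term] for term in vocab] for c in counts]


def tf_idf(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """TF-IDF: BOW counts times smoothed IDF, each row scaled to unit L2 norm."""
    vocab, bow = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: how many documents contain each vocabulary term.
    df = [sum(1 for row in bow if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    vectors = []
    for row in bow:
        weighted = [tf * w for tf, w in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in weighted)) or 1.0  # avoid division by zero
        vectors.append([x / norm for x in weighted])
    return vocab, vectors


# Made-up example tweets (not from the paper's dataset).
tweets = [
    "feeling so tired and empty today",
    "great day with friends",
    "so tired of everything",
]
vocab, bow = bag_of_words(tweets)
_, tfidf = tf_idf(tweets)
```

Terms shared by several tweets, such as "so" and "tired" here, receive a lower IDF than terms that occur in only one tweet, which is what separates TF-IDF weights from raw BOW counts.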
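The description states that a sentiment lexicon from TextBlob and VADER was used to distinguish the emotions of words, but not how the two scores were turned into labels. The sketch below shows one possible annotation rule, purely as an assumption: a tweet is labeled depression-indicative (1) only when both scorers find it negative, non-indicative (0) when neither does, and left unlabeled when they disagree. To stay runnable without third-party packages, two hypothetical toy lexicons stand in for the real tools; in practice the scorers could wrap TextBlob's `TextBlob(text).sentiment.polarity` and the vaderSentiment package's `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`, both of which return values in [-1, 1]. The 0.05 threshold mirrors the cut-off commonly suggested for VADER's compound score and is likewise an assumption here.

```python
from typing import Callable, Optional

# A scorer maps a tweet to a polarity score in [-1, 1].
Scorer = Callable[[str], float]

# Hypothetical toy lexicons standing in for TextBlob and VADER.
TOY_LEXICON_A = {"sad": -0.5, "tired": -0.4, "hopeless": -0.8, "happy": 0.8, "great": 0.8}
TOY_LEXICON_B = {"sad": -0.6, "empty": -0.5, "hopeless": -0.7, "happy": 0.6, "great": 0.7}


def lexicon_scorer(lexicon: dict[str, float]) -> Scorer:
    """Build a scorer returning the mean polarity of the lexicon words in a text."""
    def score(text: str) -> float:
        hits = [lexicon[word] for word in text.lower().split() if word in lexicon]
        return sum(hits) / len(hits) if hits else 0.0  # 0.0 means no evidence
    return score


def annotate(text: str, scorer_a: Scorer, scorer_b: Scorer,
             threshold: float = 0.05) -> Optional[int]:
    """Assumed rule: 1 if both scores are <= -threshold, 0 if neither is,
    None (leave unlabeled, e.g. for manual review) when the scorers disagree."""
    a_negative = scorer_a(text) <= -threshold
    b_negative = scorer_b(text) <= -threshold
    if a_negative and b_negative:
        return 1
    if not a_negative and not b_negative:
        return 0
    return None


tweets = ["I feel sad and hopeless", "great day so happy", "so tired"]
score_a, score_b = lexicon_scorer(TOY_LEXICON_A), lexicon_scorer(TOY_LEXICON_B)
labels = [annotate(t, score_a, score_b) for t in tweets]
```

Requiring agreement between two independent lexicons trades coverage for label precision: "so tired" is left unlabeled because only one toy lexicon knows the word "tired".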
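The description reports accuracy, precision, recall, and F1-score, with a best F1-score of 92.876% for SVM + TF-IDF. F1 is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). The minimal sketch below computes all four metrics for binary labels, assuming 1 encodes the depression-indicative class; whether the paper reports positive-class, macro-averaged, or weighted F1 is not stated in this record.

```python
def binary_metrics(y_true: list[int], y_pred: list[int],
                   positive: int = 1) -> dict[str, float]:
    """Accuracy plus precision, recall and F1 for the positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall; 0.0 when both are zero.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels for illustration only (not results from the paper).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 1, 1, 0]
m = binary_metrics(y_true, y_pred)
```

With these toy labels there are 3 true positives, 2 false positives, and 1 false negative, so precision is 0.6, recall is 0.75, and F1 = 2(0.6)(0.75) / (0.6 + 0.75) ≈ 0.667, slightly below the arithmetic mean of 0.675 because the harmonic mean is pulled toward the weaker of the two.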
Identifier: oai:generic.eprints.org:22537
Institution: Universiti Kebangsaan Malaysia
Citation: Chang, Yun Yao and Nazlia Omar (2023). Data annotation architecture for automatic depression detection. Asia-Pacific Journal of Information Technology and Multimedia, 12(1), pp. 39-56. ISSN 2289-2192. https://www.ukm.my/apjitm/ (Penerbit Universiti Kebangsaan Malaysia, 2023-06; peer-reviewed article; PDF in English: http://journalarticle.ukm.my/22537/1/03%20-.pdf)