Profanity and hate speech detection

Profanity, often found in today’s online social media, has been used to detect online hate speech. The aims of this study were to investigate the profanity usage on Twitter by different groups of users, and to quantify the effectiveness of using profanity in detecting hate speech. Tweets from three...


Bibliographic Details
Main Authors: Teh, Phoey Lee *, Cheng, Chi-Bin
Format: Article
Language: English
Published: Tamkang University 2020
Subjects:
Online Access: http://eprints.sunway.edu.my/1534/
http://eprints.sunway.edu.my/1534/1/Teh%20Phoey%20Lee%20Preprint%20-%20Profanity%20and%20Hate%20Speech%20Detection.pdfx
building SU Institutional Repository
collection Online Access
description Profanity, common in today’s online social media, has been used to detect online hate speech. This study aimed to investigate profanity usage on Twitter by different groups of users and to quantify the effectiveness of using profanity to detect hate speech. Tweets from three English-speaking countries, Australia, Malaysia, and the United States, were collected for data analysis. Statistical hypothesis tests were performed to assess the differences in profanity usage among the three countries, and a probability estimation procedure based on Bayes' theorem was formulated to quantify the effectiveness of profanity-based methods in hate speech detection. Three deep learning methods, long short-term memory (LSTM), bidirectional LSTM (BLSTM), and bidirectional encoder representations from transformers (BERT), were further used to evaluate the effect of profanity screening on building classification models. The experimental results show that the effectiveness of using profanity to detect hate speech is questionable. Nevertheless, the results also show that for Australian tweets, where profanity is more closely associated with hatred, profanity-based methods could be effective, and profanity screening can address the class imbalance issue in hate speech detection. This is evidenced by the performance of the deep learning methods on the profanity-screened Australian data, which achieved a classification F1-score greater than 0.84.
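The description mentions a probability estimation procedure based on Bayes' theorem but does not give its details. As a rough illustration only, the sketch below computes P(hate | profanity) from three input probabilities. The function name, parameter names, and all numbers are hypothetical and are not taken from the article; the authors' actual procedure may differ.

```python
def posterior_hate_given_profanity(p_prof_given_hate: float,
                                   p_hate: float,
                                   p_prof_given_not_hate: float) -> float:
    """Return P(hate | profanity) using Bayes' theorem.

    All three inputs are assumed to be probabilities estimated elsewhere,
    for example from a labelled sample of tweets (hypothetical setup).
    """
    for p in (p_prof_given_hate, p_hate, p_prof_given_not_hate):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")

    # Law of total probability: P(profanity) over hate and non-hate tweets.
    p_prof = (p_prof_given_hate * p_hate
              + p_prof_given_not_hate * (1.0 - p_hate))
    if p_prof == 0.0:
        raise ValueError("P(profanity) is zero; the posterior is undefined")

    # Bayes' theorem: P(hate | prof) = P(prof | hate) * P(hate) / P(prof).
    return p_prof_given_hate * p_hate / p_prof


# Hypothetical inputs: 5% of tweets are hateful, 60% of hateful tweets and
# 10% of non-hateful tweets contain profanity.
posterior = posterior_hate_given_profanity(0.6, 0.05, 0.1)
print(round(posterior, 2))
```

With inputs like these, a low posterior would mean that most profane tweets are not hateful, which is the kind of outcome that would make a profanity-based detector questionable; a high posterior would support using profanity as a screening step.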
first_indexed 2025-11-14T21:17:39Z
format Article
id sunway-1534
institution Sunway University
institution_category Local University
language English
last_indexed 2025-11-14T21:17:39Z
publishDate 2020
publisher Tamkang University
recordtype eprints
repository_type Digital Repository
spelling sunway-1534 2021-07-30T08:18:19Z http://eprints.sunway.edu.my/1534/ Tamkang University 2020 Article PeerReviewed text en cc_by_nc_4
Teh, Phoey Lee * and Cheng, Chi-Bin (2020) Profanity and hate speech detection. International Journal of Information and Management Sciences, 31 (3). pp. 227-246. ISSN 1017-1819 https://www.airitilibrary.com/Publication/alPublicationJournal?PublicationID=10171819&IssueID=202010270001
topic HM Sociology
QA75 Electronic computers. Computer science