Evaluating Effectiveness of Feature Extraction Technique in Gujarati Hate Speech Detection

Abhilasha Vadesara
Purna Tanna

Abstract

Natural language processing (NLP) has gained considerable importance in recent years. Combined with machine learning, NLP can identify hidden features in large volumes of text data for tasks such as text classification and sentiment analysis. Twitter has become one of the most popular microblogging services for sharing and receiving ideas and views worldwide. However, users sometimes post aggressive content and engage in related behavior such as trolling, cyberbullying, flaming, and spreading hate. For that reason, many social media services require hate speech detection.
In this paper, we experimented with different feature extraction approaches, such as BoW (Bag of Words) and TF-IDF, to extract features from Gujarati hate speech. The experiment was conducted on 12K tweets. We applied preprocessing techniques, such as removing unnecessary symbols, URLs, characters, and stop words, to improve the classification accuracy of the machine learning model. Fleiss's Kappa was used to measure inter-annotator agreement among the 25 annotators who annotated the whole corpus, yielding an agreement score of 0.87.
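
The pipeline outlined in the abstract (preprocessing, BoW and TF-IDF feature extraction, and Fleiss's Kappa for annotator agreement) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the stop-word list, the function names, and the sample tweets are hypothetical, and the TF-IDF weighting shown is the plain textbook form (term frequency times log(N / document frequency)), which may differ from the variant used in the paper or in libraries that apply smoothing and normalization.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the paper's actual Gujarati list is not given.
STOP_WORDS = {"છે", "અને", "આ"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Keep the Gujarati Unicode block (U+0A80 to U+0AFF), ASCII letters/digits, whitespace.
NON_TEXT_RE = re.compile(r"[^\u0A80-\u0AFFA-Za-z0-9\s]")


def preprocess(tweet):
    """Remove URLs and symbols, lowercase, split on whitespace, drop stop words."""
    text = URL_RE.sub(" ", tweet)
    text = NON_TEXT_RE.sub(" ", text)
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for a list of token lists."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok, n in Counter(doc).items():
            row[index[tok]] = n
        vectors.append(row)
    return vocab, vectors


def tf_idf(counts):
    """Weight BoW counts by tf * log(N / df), the unsmoothed textbook form."""
    n_docs = len(counts)
    n_terms = len(counts[0]) if counts else 0
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    weighted = []
    for row in counts:
        total = sum(row) or 1  # avoid dividing by zero for empty documents
        weighted.append(
            [(c / total) * math.log(n_docs / df[j]) if c else 0.0
             for j, c in enumerate(row)]
        )
    return weighted


def fleiss_kappa(ratings):
    """ratings[i][k] = number of annotators who put item i in category k.

    Every row must sum to the same number of annotators (at least 2).
    Undefined (division by zero) if all ratings fall in a single category.
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_cats = len(ratings[0])
    # Share of all assignments that went to each category.
    p_cat = [sum(row[k] for row in ratings) / (n_items * n_raters)
             for k in range(n_cats)]
    # Observed agreement per item, then averaged over items.
    p_item = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
              for row in ratings]
    p_bar = sum(p_item) / n_items
    p_exp = sum(p * p for p in p_cat)  # agreement expected by chance
    return (p_bar - p_exp) / (1 - p_exp)


if __name__ == "__main__":
    # Made-up, neutral example tweets used only to exercise the functions.
    tweets = ["આ ફિલ્મ સારી છે! https://example.com/x", "ફિલ્મ અને ગીત #music"]
    docs = [preprocess(t) for t in tweets]
    vocab, counts = bag_of_words(docs)
    print(vocab)
    print(counts)
    print(tf_idf(counts))
```

In this sketch a term that occurs in every document receives a TF-IDF weight of zero, which is the main practical difference from raw BoW counts: frequent but uninformative terms are discounted. For the agreement measure, each tweet would contribute one row of per-category annotator counts (for example, hate versus non-hate), with 25 annotations per row in the setting the abstract describes.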

Article Details

How to Cite
1.
Vadesara A, Tanna P. Evaluating Effectiveness of Feature Extraction Technique in Gujarati Hate Speech Detection. sms [Internet]. 29Dec.2022 [cited 25Apr.2025];14(04):147-52. Available from: https://smsjournals.com/index.php/SAMRIDDHI/article/view/3112
Section
Research Article