A/2, Jahurul Islam Avenue
Jahurul Islam City, Aftabnagar
Dhaka-1212, Bangladesh
Puja Chakraborty completed her Bachelor's degree in Computer Science and Engineering at the University of Chittagong in 2019. She was subsequently awarded the prestigious Erasmus Mundus Scholarship to pursue a Master's degree in Language and Communication Technologies, spending the first year at the University of Trento (Italy) and the second year at the University of Groningen (The Netherlands). Her research interests encompass Natural Language Processing, Explainable AI, and Computational Linguistics. Through her work, she aspires to advance language technology and improve AI interpretability, contributing to more transparent and effective AI systems.
Lecturer
Department of Computer Science and Engineering
Premier University, Chattogram, Bangladesh
Period: Feb 2020 - Aug 2022
Natural Language Processing, Explainable AI, Computational Linguistics
-Multimodal Idiomaticity Representation
-Text Domain Identification
-Machine Translation
Abstract:
Threatening and abusive language spreads quickly through social media, but it can be controlled if it is detected and removed. Given the many platforms, such as Facebook, Twitter, and Instagram, and their huge number of users, a robust and effective automatic system is needed to identify such language. Our proposed system applies Machine Learning and Natural Language Processing techniques to this task. Previous research on Bengali abusive language detection used the Multinomial Naïve Bayes (MNB) and Support Vector Machine (SVM) algorithms and considered Bengali Unicode characters. We accept both Unicode emoticons and Unicode Bengali characters as valid input. In addition to MNB and SVM, we implemented a Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM). Among the three algorithms, SVM with a linear kernel performed best, with 78% accuracy.
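The input handling described in this abstract (accepting Bengali characters and emoticons) can be pictured with a minimal sketch. The code point ranges below are the standard Unicode "Bengali" and "Emoticons" blocks; the function name and the decision to drop all other characters are assumptions made for illustration, not the preprocessing used in the paper.

```python
# Illustrative only: keep_valid_chars and its filtering policy are hypothetical,
# not taken from the paper.
BENGALI = (0x0980, 0x09FF)      # Unicode "Bengali" block
EMOTICONS = (0x1F600, 0x1F64F)  # Unicode "Emoticons" block


def keep_valid_chars(text: str) -> str:
    """Keep Bengali characters, emoticons, and whitespace; drop the rest."""
    kept = []
    for ch in text:
        cp = ord(ch)
        in_bengali = BENGALI[0] <= cp <= BENGALI[1]
        in_emoticons = EMOTICONS[0] <= cp <= EMOTICONS[1]
        if ch.isspace() or in_bengali or in_emoticons:
            kept.append(ch)
    # Collapse runs of whitespace left behind by removed characters.
    return " ".join("".join(kept).split())


cleaned = keep_valid_chars("তুমি খারাপ 😡 lol!!! 123")
print(cleaned)
```

The cleaned string could then be tokenized and passed to any of the classifiers named in the abstract.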
Abstract:
Objective: This paper reviews studies on the detection of five variants of cyberbullying text (abusive, hateful, aggressive, bullying, and toxic comments) in Bengali, taken as an example of a low-resource language, to gain a comprehensive understanding of the challenges and state-of-the-art approaches used to identify these types of text. Materials: We searched for articles on cyberbullying text detection in the Bengali language published from 2017 to July 2021, since we found no research in this domain before 2017. We then screened the candidates at successive levels, inspecting the title, abstract, and full text, to select the studies included in this review. Results: After applying the different levels of filtering to the initial search results, 28 domain-centric papers were retained out of 2,745 documents. We analyze the context of each study in depth, present a comparative review of research challenges and approaches, and provide directions for future work on cyberbullying text detection for the Bengali language. Conclusion: We discuss five variants of cyberbullying text (abusive text, hateful speech, aggressive text, bullying text, and toxic comments on the web) and their detection by studying the existing literature in this domain. We offer advice on dataset preparation, preprocessing and feature extraction, and classifier selection that may aid comprehensive research toward better detection.
Link: https://ph01.tci-thaijo.org/index.php/ecticit/article/view/248039
Abstract:
This paper presents experiments on sentiment polarity detection over Stanford's IMDB movie review dataset using a Support Vector Machine (SVM) classifier. Our main motivation was to find the best combinations of classic features and preprocessing techniques for classifying positive and negative opinions. We also explored two kernel variants with numerous parameter settings in search of the best SVM model. Our best model achieved an accuracy of 85.45%. The results indicate that a model with a non-linear Radial Basis Function (RBF) kernel yields the highest accuracy. The features that contributed the most are stemmed word n-grams.
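As a rough picture of what "stemmed word n-grams" means, the sketch below stems each token and then emits unigrams and bigrams. The suffix-stripping `toy_stem` function is a deliberately crude, hypothetical stand-in; the abstract does not say which stemmer or n-gram range the paper used.

```python
# Illustrative only: toy_stem is a hypothetical stand-in for a real stemmer,
# and n_max=2 is an assumed n-gram range.
def toy_stem(word: str) -> str:
    """Strip one common English suffix if at least three letters remain."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stemmed_word_ngrams(text: str, n_max: int = 2) -> list[str]:
    """Lowercase, split on whitespace, stem, and build word n-grams up to n_max."""
    stems = [toy_stem(tok) for tok in text.lower().split()]
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(stems) - n + 1):
            grams.append(" ".join(stems[i : i + n]))
    return grams


features = stemmed_word_ngrams("Loved the acting")
print(features)
```

Features like these would typically be counted or weighted into a vector before being passed to an SVM.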