Can a Simple Approach Perform Better for Cross-Project Defect Prediction?
We introduce correlation alignment, a transfer learning technique, to software defect prediction.
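Correlation alignment (CORAL) is usually described as re-coloring source-project features so that their second-order statistics (covariance) match the target project's. The standard recipe whitens with the source covariance and then re-colors with the target covariance. The entry does not say which variant was used, so the NumPy sketch below follows that general recipe under our own assumptions. Rows are software modules and columns are metrics. `lam` is a ridge term added to both covariances for numerical stability, and its default of 1.0 is an assumption to tune. The names `coral_align` and `_sym_matrix_power` are hypothetical helpers.

```python
import numpy as np


def _sym_matrix_power(mat: np.ndarray, power: float) -> np.ndarray:
    """Raise a symmetric positive-definite matrix to a real power via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    return eigvecs @ np.diag(eigvals ** power) @ eigvecs.T


def coral_align(X_source: np.ndarray, X_target: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Re-color source features so their covariance approximates the target's.

    Rows are samples (e.g. modules), columns are features (e.g. code metrics).
    `lam` regularizes both covariance matrices (assumed default; tune as needed).
    """
    n_features = X_source.shape[1]
    cov_s = np.cov(X_source, rowvar=False) + lam * np.eye(n_features)
    cov_t = np.cov(X_target, rowvar=False) + lam * np.eye(n_features)
    # Whiten with the source covariance, then re-color with the target covariance.
    transform = _sym_matrix_power(cov_s, -0.5) @ _sym_matrix_power(cov_t, 0.5)
    return X_source @ transform
```

A defect classifier would then be trained on `coral_align(X_src, X_tgt)` with the source project's labels and applied to the target project's features. Only the covariance is aligned. Feature means are left unchanged by this transform.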
Automatic Regression Parameter Selection: A Divide and Conquer based Approach
Manually selecting the optimal hyperparameters for regularized regression (Lasso, Ridge, Elastic Net) is time-consuming as well as error-prone. In this work, we introduce a divide-and-conquer-based approach to select hyperparameters automatically and efficiently.
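The entry does not describe how the search space is divided, so the following is only a hypothetical illustration of a divide-and-conquer style (interval-halving) search over the regularization strength of ridge regression. It assumes that validation error is roughly unimodal in log10(alpha). At each step the interval is split in half, each half is probed at its own midpoint, and the search recurses into the half with the lower validation error. Ridge is fitted in closed form with NumPy. The function names, bounds, and depth are our own choices. Lasso and Elastic Net have no closed form, but the same search could wrap an iterative solver.

```python
import numpy as np


def ridge_fit(X: np.ndarray, y: np.ndarray, alpha: float) -> np.ndarray:
    """Closed-form ridge regression weights (no intercept, for brevity)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


def validation_mse(X_tr, y_tr, X_val, y_val, alpha: float) -> float:
    """Mean squared error on the validation split for a given alpha."""
    w = ridge_fit(X_tr, y_tr, alpha)
    return float(np.mean((X_val @ w - y_val) ** 2))


def divide_and_conquer_alpha(X_tr, y_tr, X_val, y_val,
                             log_lo: float = -4.0, log_hi: float = 4.0,
                             depth: int = 12) -> float:
    """Recursively halve the log10(alpha) interval, keeping the better half.

    Assumes validation error is (roughly) unimodal in log10(alpha).
    """
    if depth == 0:
        return 10 ** ((log_lo + log_hi) / 2)
    mid = (log_lo + log_hi) / 2
    # Probe the midpoint of each half.
    left_err = validation_mse(X_tr, y_tr, X_val, y_val, 10 ** ((log_lo + mid) / 2))
    right_err = validation_mse(X_tr, y_tr, X_val, y_val, 10 ** ((mid + log_hi) / 2))
    if left_err <= right_err:
        return divide_and_conquer_alpha(X_tr, y_tr, X_val, y_val, log_lo, mid, depth - 1)
    return divide_and_conquer_alpha(X_tr, y_tr, X_val, y_val, mid, log_hi, depth - 1)
```

Each level costs two model fits, so a depth of 12 needs 24 fits instead of the hundreds a fine grid over the same range would require. The trade-off is that a multimodal error curve can mislead the search.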
Threat and abusive language detection on social media in Bengali language
Abstract:
Threatening and abusive language spreads quickly through social media, but its spread can be controlled if such content is detected and removed. Because there are many social media platforms, such as Facebook, Twitter, and Instagram, and a huge number of social media users, a robust and effective automatic system is needed to identify threatening and abusive language. In our proposed system, Machine Learning and Natural Language Processing techniques are used to build such an automatic system. Previous research on Bengali abusive language detection used Multinomial Naïve Bayes (MNB) and Support Vector Machine (SVM) algorithms and considered Bengali Unicode characters to build their systems. We considered both Unicode emoticons and Unicode Bengali characters as valid input in our proposed system. Besides MNB and SVM, we implemented a Convolutional Neural Network (CNN) combined with Long Short-Term Memory (LSTM). Among the three algorithms, SVM with a linear kernel performed best, with 78% accuracy.
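The abstract says that both Bengali characters and Unicode emoticons are accepted as input. Below is a minimal sketch of such an input filter. It assumes the relevant ranges are the standard Unicode Bengali block (U+0980 to U+09FF) and the Emoticons block (U+1F600 to U+1F64F). The paper's exact ranges are not stated, and emoji outside that block (for example hand gestures) would need extra ranges. The resulting tokens would then be vectorized for the MNB, SVM, or CNN-LSTM models. `tokenize` is a hypothetical helper.

```python
import re

# Bengali Unicode block (U+0980 to U+09FF) plus zero-width (non-)joiners used in Bengali text.
BENGALI_WORD = r"[\u0980-\u09FF\u200C\u200D]+"
# Unicode "Emoticons" block (U+1F600 to U+1F64F); each emoji becomes its own token.
EMOTICON = r"[\U0001F600-\U0001F64F]"
TOKEN_PATTERN = re.compile(f"{BENGALI_WORD}|{EMOTICON}")


def tokenize(text: str) -> list[str]:
    """Keep Bengali words and emoticons; drop Latin letters, ASCII digits, and punctuation."""
    return TOKEN_PATTERN.findall(text)
```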
The Challenges and Approaches during the Detection of Cyberbullying Text for Low-resource Language: A Literature Review
Abstract:
Objective: The primary aim of this paper is to review studies on detecting five variants of cyberbullying text in Bengali, taken as a sample low-resource language: abusive, hateful, aggressive, bullying, and toxic comments or texts. The goal is to gain a comprehensive understanding of the challenges and the state-of-the-art approaches used to identify these types of text.
Materials: We searched for articles on cyberbullying text detection in Bengali published from 2017 to July 2021, since no research on this specific problem was found before 2017. We then screened the candidates at several levels, inspecting the title, the abstract, and the full text, to select the studies included in this review.
Results: After these levels of filtering, 28 domain-specific papers were selected from the 2,745 documents returned by the initial search. We analyze the context of each study in depth, present a comparative review of the research challenges and approaches, and outline directions for future work on cyberbullying text detection in Bengali.
Conclusion: In this paper, we discuss five variants of cyberbullying text on the web, namely abusive text, hateful speech, aggressive text, bullying text, and toxic comments, and how they are detected, based on the existing literature in this domain. We offer advice on dataset preparation, preprocessing and feature extraction, and classifier selection that may support more comprehensive research toward better detection.
Link: https://ph01.tci-thaijo.org/index.php/ecticit/article/view/248039
Opinion Mining: Is Feature Engineering Still Relevant?
Abstract:
This paper presents experiments on sentiment polarity detection over Stanford's IMDB movie review dataset using a Support Vector Machine (SVM) classifier. Our primary motivation was to find the best combination of classic features and preprocessing techniques for classifying positive and negative opinions. We also explored two kernel variants with numerous parameter settings in search of the best SVM model. Our best model achieved an accuracy of 85.45%. The results indicate that a model with a non-linear Radial Basis Function (RBF) kernel achieves the highest accuracy. The features that contributed most are stemmed word n-grams.
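For reference, the RBF kernel compares two feature vectors as K(x, z) = exp(-gamma * ||x - z||^2), where gamma > 0 is tuned alongside the SVM's C. The small NumPy sketch below computes the Gram matrix that an RBF-kernel SVM works with. `rbf_kernel` is our own illustrative helper, not the implementation used in the paper, and real n-gram features would usually be sparse rather than dense arrays.

```python
import numpy as np


def rbf_kernel(X: np.ndarray, Z: np.ndarray, gamma: float) -> np.ndarray:
    """Gram matrix K[i, j] = exp(-gamma * ||X[i] - Z[j]||^2)."""
    # ||x - z||^2 = ||x||^2 + ||z||^2 - 2 x.z, computed for all pairs at once.
    sq_dists = (
        np.sum(X ** 2, axis=1)[:, None]
        + np.sum(Z ** 2, axis=1)[None, :]
        - 2.0 * X @ Z.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))
```

A small gamma gives smooth decision boundaries. A large gamma makes each training example influence only nearby points. This is why gamma and C are typically searched together.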
Education Certification and Verified Documents Sharing System by Blockchain
New and improved technologies have created serious security problems for educational certification systems. In this paper, we propose a way to improve their security. Blockchain technology is introduced as reliable, secure storage for the educational certification system, and it gives users an additional facility: validation and authentication of students' academic records. Moreover, for security purposes, blockchain technology can replace the traditional academic certification system and contribute to a new model for sharing student information. After the data have been added and hashed, the blocks are inserted into the blockchain network. The proposed model enhances document security, reduces fraud, and significantly shortens authentication time, nearly doubling the speed of the current process. With this system, the certification process becomes fully digital, with all data secured in a tamper-resistant database with proper authentication and noticeably better time efficiency.
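The hashing-and-linking step described above can be illustrated with a minimal in-memory hash chain. This is a sketch under stated assumptions, not the proposed system. It has no peer-to-peer network, consensus, or persistence, and the certificate fields (`student_id`, `degree`, `year`) are a hypothetical schema. Each block stores the SHA-256 hash of its contents together with the previous block's hash. Editing any stored record therefore invalidates the chain.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class Block:
    index: int
    record: dict       # certificate fields; the schema here is hypothetical
    prev_hash: str
    block_hash: str = ""

    def compute_hash(self) -> str:
        """SHA-256 of the block contents; sort_keys gives a stable JSON encoding."""
        payload = json.dumps(
            {"index": self.index, "record": self.record, "prev_hash": self.prev_hash},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CertificateChain:
    """In-memory hash chain; no networking, consensus, or persistence."""

    def __init__(self) -> None:
        genesis = Block(index=0, record={}, prev_hash="0" * 64)
        genesis.block_hash = genesis.compute_hash()
        self.blocks = [genesis]

    def add_certificate(self, record: dict) -> Block:
        """Hash a certificate record and link it to the previous block."""
        block = Block(index=len(self.blocks), record=record, prev_hash=self.blocks[-1].block_hash)
        block.block_hash = block.compute_hash()
        self.blocks.append(block)
        return block

    def is_valid(self) -> bool:
        """Recompute every hash and check every link; any edited record fails."""
        hashes_ok = all(b.block_hash == b.compute_hash() for b in self.blocks)
        links_ok = all(
            curr.prev_hash == prev.block_hash
            for prev, curr in zip(self.blocks, self.blocks[1:])
        )
        return hashes_ok and links_ok
```

Verifying a shared certificate then amounts to recomputing its block's hash and comparing it with the stored value. This check is what makes authentication fast compared with contacting the issuing institution.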
An ML-based decision support system for reliable diagnosis of ovarian cancer by leveraging explainable AI
Ovarian cancer (OC) is one of the most prevalent types of cancer in women. Early and accurate diagnosis is crucial for patient survival. However, most women are diagnosed at advanced stages because effective biomarkers and accurate screening tools are lacking. While previous studies sought a common biomarker, our study suggests different biomarkers for the premenopausal and postmenopausal populations. This can provide a new perspective in the search for novel predictors for the effective diagnosis of OC. A genetic algorithm is used to identify the most significant biomarkers. An XGBoost classifier is then trained on the selected features, achieving high ROC-AUC scores of 0.864 and 0.911 for the premenopausal and postmenopausal populations, respectively. Lack of explainability is a major limitation of current AI systems. The stochastic nature of ML algorithms raises concerns about the reliability of such systems, as it is difficult to interpret the reasons behind their decisions. To increase the trustworthiness and accountability of the diagnostic system, and to provide transparency and explanations for its predictions, explainable AI has been incorporated into the ML framework. SHAP is employed to quantify the contributions of the selected biomarkers and determine the most discriminative features. Integrating SHAP with the ML models enables clinicians to investigate individual decisions made by the model and gain insight into the factors behind each prediction. The result is a hybrid decision support system that removes the bottlenecks caused by the black-box nature of ML algorithms, providing a safe and trustworthy AI tool. The diagnostic accuracy of the proposed system exceeds that of existing methods, as well as the state-of-the-art ROMA algorithm, by a substantial margin, which signifies its potential as an effective tool in the differential diagnosis of OC.
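The abstract does not give the genetic algorithm's settings. As a hedged sketch, a GA for feature selection can encode each candidate biomarker subset as a bit mask and evolve the masks toward higher fitness. In a setting like the one described, the fitness might be a cross-validated ROC-AUC of the classifier trained on the selected biomarkers. Here, `fitness` is any user-supplied callable. The operators (tournament selection, uniform crossover, bit-flip mutation, elitism), all defaults, and the function names are our assumptions.

```python
import random
from typing import Callable, List, Tuple

Mask = List[int]  # 1 = feature (biomarker) selected, 0 = dropped


def _tournament(scored: List[Tuple[float, Mask]], rng: random.Random) -> Mask:
    """Pick two random candidates and return the fitter one's mask."""
    (fa, ma), (fb, mb) = rng.sample(scored, 2)
    return ma if fa >= fb else mb


def genetic_feature_selection(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 30,
    generations: int = 50,
    mutation_rate: float = 0.1,
    seed: int = 0,
) -> Tuple[Mask, float, List[float]]:
    """Evolve feature masks; returns (best mask, its fitness, best-so-far per generation)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best_mask: Mask = population[0][:]
    best_score = float("-inf")
    history: List[float] = []
    for _ in range(generations):
        scored = [(fitness(mask), mask) for mask in population]
        for score, mask in scored:
            if score > best_score:
                best_score, best_mask = score, mask[:]
        history.append(best_score)
        next_population = [best_mask[:]]  # elitism: always keep the best mask found so far
        while len(next_population) < pop_size:
            parent_a, parent_b = _tournament(scored, rng), _tournament(scored, rng)
            # Uniform crossover: each gene comes from either parent with equal probability.
            child = [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]
            # Bit-flip mutation.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_population.append(child)
        population = next_population
    return best_mask, best_score, history
```

In practice, the fitness function would also need to handle the empty mask, for example by returning a very low score, and the selected mask would then be used to train the final classifier.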
A CNN Based Model for Plant Disease Classification using Transfer Learning
Plant diseases seriously threaten global food security, causing large losses in agricultural productivity every year. Early diagnosis and accurate classification of plant diseases are required for disease management programs to be implemented promptly and efficiently. In recent years, Convolutional Neural Networks (CNNs) have shown encouraging results in plant disease classification. In this study, we propose a CNN-based approach to plant disease classification that uses a MobileNetV2-based model and transfer learning. The proposed model builds on the MobileNetV2 architecture, known for its lightweight and efficient design, which makes it well suited to resource-constrained environments. The pre-trained MobileNetV2 model is adapted to plant disease classification through transfer learning. By starting from pre-trained weights, the model benefits from features learned on a large-scale dataset, leading to improved generalization and reduced training time. To assess the efficiency of the proposed approach, we run extensive experiments on a standard plant disease dataset, applying a filtering method as a preprocessing step. The model's performance is compared with several state-of-the-art models, including VGG16, AlexNet, and InceptionV3. The experimental results show that the proposed model performs competitively in classifying plant diseases, surpassing the other approaches with an accuracy of 98.56%.
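The abstract mentions a filtering method applied as preprocessing but does not name the filter. Purely as a hypothetical example, the NumPy sketch below shows a median filter, a common choice for removing salt-and-pepper noise from leaf images before they are resized and passed to the network. The filter type, the 3 x 3 window, and the edge padding are our assumptions. `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter(image: np.ndarray, size: int = 3) -> np.ndarray:
    """Apply a size x size median filter to a grayscale (H, W) or color (H, W, C) image.

    Borders are handled by repeating edge pixels; each channel is filtered independently.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd")
    r = size // 2
    pad = [(r, r), (r, r)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad, mode="edge")
    # windows has shape (H, W[, C], size, size): one neighbourhood per pixel (and channel).
    windows = sliding_window_view(padded, (size, size), axis=(0, 1))
    return np.median(windows, axis=(-2, -1)).astype(image.dtype)
```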
A Transformer Based Model for Twitter Sentiment Analysis using RoBERTa
In recent years, social media platforms, particularly Twitter, have emerged as crucial sources of public opinion and sentiment. Analyzing sentiment in Twitter data is challenging because of the platform's inherent characteristics, such as brevity, informality, and the prevalence of slang and emojis. This paper proposes a method for Twitter sentiment analysis that leverages RoBERTa, a transformer-based model chosen for its strong performance across a wide range of natural language processing tasks. Our system captures intricate contextual information and semantic nuances in tweets, making it well suited to sentiment analysis on this challenging platform. To build an effective sentiment analysis system, the model is fine-tuned on a large corpus of Twitter data annotated with sentiment labels. Additionally, we explore strategies for handling the unique characteristics of Twitter data, including tokenization, hashtags, user mentions, and URLs, as well as the incorporation of emojis and emoticons. We compare our model with three standard machine learning and deep learning models, namely Decision Tree (DT), Support Vector Machine (SVM), and Long Short-Term Memory (LSTM), to show that it analyzes Twitter sentiment more accurately. The model achieves an accuracy of 96.78%, highlighting its effectiveness in understanding and classifying sentiment in tweets.
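The abstract lists hashtags, user mentions, and URLs among the Twitter-specific cases handled before fine-tuning but does not give the rules. Below is one plausible normalization, sketched under our own assumptions. Every mention is replaced with a generic `@user` token, every URL is replaced with `http`, and the `#` is stripped from hashtags while the word itself is kept. Emojis are left untouched so the RoBERTa tokenizer can still see them. The placeholder tokens and the helper `preprocess_tweet` are illustrative choices, not the paper's.

```python
import re

# Placeholder conventions below are assumptions, not taken from the paper.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"(?<!\w)@\w+")   # skip the "@" inside e-mail addresses
HASHTAG_RE = re.compile(r"#(\w+)")


def preprocess_tweet(text: str) -> str:
    """Normalize Twitter-specific tokens before RoBERTa tokenization."""
    text = URL_RE.sub("http", text)        # every link becomes the same token
    text = MENTION_RE.sub("@user", text)   # anonymize user names
    text = HASHTAG_RE.sub(r"\1", text)     # keep the hashtag's word, drop "#"
    return " ".join(text.split())          # collapse runs of whitespace
```

URLs are replaced first so that `#` fragments inside links are not mistaken for hashtags.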
Enhancing E-Commerce Text Classification: A GRU-Based Approach for Improved Product Understanding
As e-commerce grows, accurately classifying product texts is essential for enhancing user experience and driving business success. Traditional approaches to text classification often struggle with the nuances and complexities of e-commerce product descriptions. In this paper, we propose a novel approach based on the Gated Recurrent Unit (GRU) to address these challenges and improve product understanding in e-commerce text classification. Our model exploits the sequential nature of product descriptions, capturing long-range dependencies and semantic relationships within the text. Extensive experiments on a standard dataset compare the model with several established techniques, including Support Vector Machine (SVM), Random Forest (RF), and Long Short-Term Memory (LSTM). The results show that the GRU-based approach outperforms these methods in classification accuracy and robustness across diverse product categories, reaching an accuracy of 98.35%. We also conduct comprehensive analyses to gain insight into the inner workings of the model and its ability to learn meaningful representations of e-commerce text data. Our findings underscore the potential of GRU-based approaches to advance the state of the art in e-commerce text classification, offering promising avenues for future research and practical applications in the domain.
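The GRU's behaviour comes from its update gate z and reset gate r. The update gate decides how much of the previous hidden state to keep. The reset gate decides how much of that state to use when forming the candidate state. The NumPy sketch below encodes a sequence of (assumed, pre-computed) word embeddings into a final hidden state, which a softmax layer over product categories would then classify. Weight shapes, initialization, and helper names are illustrative assumptions. Libraries also differ in whether z weights the previous state or the candidate state; PyTorch, for instance, uses the opposite convention to the one below.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def init_gru_params(input_dim: int, hidden_dim: int, seed: int = 0) -> dict:
    """Small random weights for the update (z), reset (r) and candidate (h) paths."""
    rng = np.random.default_rng(seed)
    params = {}
    for gate in ("z", "r", "h"):
        params[f"W_{gate}"] = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
        params[f"U_{gate}"] = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
        params[f"b_{gate}"] = np.zeros(hidden_dim)
    return params


def gru_step(x: np.ndarray, h_prev: np.ndarray, p: dict) -> np.ndarray:
    """One GRU time step."""
    z = sigmoid(p["W_z"] @ x + p["U_z"] @ h_prev + p["b_z"])              # update gate
    r = sigmoid(p["W_r"] @ x + p["U_r"] @ h_prev + p["b_r"])              # reset gate
    h_tilde = np.tanh(p["W_h"] @ x + p["U_h"] @ (r * h_prev) + p["b_h"])  # candidate state
    return (1.0 - z) * h_prev + z * h_tilde                               # blend old and new


def gru_encode(embeddings: np.ndarray, p: dict) -> np.ndarray:
    """Run the GRU over a (T, input_dim) sequence and return the final hidden state."""
    h = np.zeros(p["U_z"].shape[0])
    for x in embeddings:
        h = gru_step(x, h, p)
    return h
```

Because each new state is a gated blend of the old state and a tanh candidate, the hidden values stay within (-1, 1). Compared with an LSTM, the GRU has one fewer gate and no separate cell state, so it has fewer parameters for the same hidden size.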
Conference Papers
Nishat Tasnim, Asraf Ullah Rahat, Dr. Md. Musfique Anwar “Retrieving Top K% Relevant Patterns for Relation Extraction in Bangla using Distant Supervision”, International Conference on Signal Processing, Information, Communication and System 2024. [Accepted]
Journal Publication
Nishat Tasnim, Asraf Ullah Rahat, Tanjim Taharat Aurpa, Dr. Md. Musfique Anwar “Bangla-REX: A Distinct Dataset for Bangla Relation Extraction”, Data in Brief, 2025. [Accepted]
Conference Proceedings
- M. A. K. Rifat, A. Kabir, and A. Huq, “An Explainable Machine Learning Approach to Traffic Accident Fatality Prediction,” Procedia Computer Science, vol. 246, pp. 1905–1914, 2024, doi: 10.1016/j.procs.2024.09.704. [Presented at the 28th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems (KES 2024), as part of a special issue.]
Prevalence and User Perception of Dark Patterns: A Case Study on E-Commerce Websites of Bangladesh
Y. Sazid and K. Sakib
19th International Conference on Evaluation of Novel Approaches to Software Engineering | ENASE 2024
Commit Classification into Maintenance Activities Using In-Context Learning Capabilities of LLMs
Y. Sazid, S. Kuri, K. S. Ahmed, and A. Satter
19th International Conference on Evaluation of Novel Approaches to Software Engineering | ENASE 2024
Automated Detection of Dark Patterns Using In-Context Learning Capabilities of GPT-3
Y. Sazid, M. M. N. Fuad, and K. Sakib
30th Asia-Pacific Software Engineering Conference | APSEC 2023