Publications: Department of Computer Science & Engineering

Journal Publications

1. "Convolutional Neural Networks (CNN) for Detecting Fruit Information using Machine Learning Techniques", In: IOSR Journal of Computer Engineering, Volume 22, Issue 02 (Mar-Apr 2020), pp. 01-13. Authors: Fouzia Risdin, Pronab Kumar Mondal, Kazi Mahmudul Hassan

2. "Extracting Text Information from Digital Images", In: International Journal of Science & Engineering Research, Volume 10, Issue 06, 2019. Authors: Md. Mijanur Rahman, Mahnuma Rahman Rinty, and Fouzia Risdin

   

Can a Simple Approach Perform Better for Cross-Project Defect Prediction?

We introduce a transfer learning technique, correlation alignment, in software defect prediction.
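As a rough illustration of what correlation alignment does, the sketch below whitens source-project features with the source covariance and re-colours them with the target covariance. It assumes NumPy is available; the function names, the small ridge value `eps`, and the synthetic "metrics" are all hypothetical and are not taken from the paper.

```python
import numpy as np


def _sym_matrix_power(mat, power):
    """Raise a symmetric positive-definite matrix to a real power via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    return (eigvecs * eigvals ** power) @ eigvecs.T


def coral_align(source, target, eps=1e-6):
    """Re-colour source features so their covariance matches the target's.

    source, target: arrays of shape (n_samples, n_features).
    eps: small ridge (an assumed value) that keeps both covariances invertible.
    """
    n_features = source.shape[1]
    cov_s = np.cov(source, rowvar=False) + eps * np.eye(n_features)
    cov_t = np.cov(target, rowvar=False) + eps * np.eye(n_features)
    whiten = _sym_matrix_power(cov_s, -0.5)   # remove source correlations
    recolour = _sym_matrix_power(cov_t, 0.5)  # impose target correlations
    return source @ whiten @ recolour


# Synthetic stand-ins for software metrics from two different projects.
rng = np.random.default_rng(0)
source_metrics = rng.normal(size=(500, 3)) @ np.array(
    [[2.0, 0.5, 0.0], [0.0, 1.0, 0.3], [0.0, 0.0, 0.5]]
)
target_metrics = rng.normal(size=(400, 3)) @ np.array(
    [[1.0, 0.0, 0.0], [0.8, 1.5, 0.0], [0.0, 0.2, 3.0]]
)
aligned_metrics = coral_align(source_metrics, target_metrics)
```

A defect classifier would then be trained on `aligned_metrics` with the source labels and applied to the target project. Only second-order statistics are matched here; feature means are left untouched.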

Automatic Regression Parameter Selection: A Divide and Conquer based Approach

Manual selection of optimal hyperparameters in regression (Lasso, Ridge, Elastic Net) is time-consuming as well as error-prone. In this work, we introduce a divide-and-conquer based approach to select hyperparameters automatically and efficiently.
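The entry does not describe the algorithm itself, so the following is only one possible divide-and-conquer scheme, not necessarily the authors' method: a ternary search over log10 of the regularisation strength that repeatedly discards the worse third of the interval. The function names, the search bounds, and the toy one-feature ridge problem are assumptions made for illustration.

```python
def divide_and_conquer_search(score, low, high, tolerance=1e-3):
    """Narrow [low, high] (log10 of alpha) around the lowest validation error.

    score: callable mapping a regularisation strength alpha to an error.
    Assumes the error is unimodal over the interval; otherwise the search
    may settle on a local minimum.
    """
    while high - low > tolerance:
        third = (high - low) / 3.0
        left, right = low + third, high - third
        # Discard the third of the interval on the side with the worse error.
        if score(10 ** left) <= score(10 ** right):
            high = right
        else:
            low = left
    return 10 ** ((low + high) / 2.0)


def ridge_validation_error(alpha):
    """Validation MSE of a one-feature, no-intercept ridge fit (toy data)."""
    train_x, train_y = [1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 4.5, 5.5]
    valid_x, valid_y = [1.0, 2.0, 3.0], [1.2, 2.4, 3.6]
    # Closed-form ridge weight for a single feature: Sxy / (Sxx + alpha).
    weight = sum(x * y for x, y in zip(train_x, train_y)) / (
        sum(x * x for x in train_x) + alpha
    )
    return sum((y - weight * x) ** 2 for x, y in zip(valid_x, valid_y)) / len(valid_x)


best_alpha = divide_and_conquer_search(ridge_validation_error, low=-3.0, high=3.0)
```

For this toy data the validation error is minimised at alpha = 5 (the training weight 42 / (30 + alpha) then equals the validation slope 1.2), and the search settles close to that value after roughly twenty evaluations per side instead of scanning a full grid.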

Threat and abusive language detection on social media in Bengali language

Abstract:
Threatening and abusive language spreads quickly through social media, but it can be controlled if we can detect and remove it. Since there are many social media platforms, such as Facebook, Twitter, and Instagram, and a huge number of users, we need a robust and effective automatic system to identify threatening and abusive language. In our proposed system, Machine Learning and Natural Language Processing techniques have been implemented to build such a system. Previous research on Bengali abusive language detection used Multinomial Naïve Bayes (MNB) and Support Vector Machine (SVM) algorithms and considered Bengali Unicode characters to build their system. We considered both Unicode emoticons and Unicode Bengali characters as valid input in our proposed system. Besides the MNB and SVM algorithms, we implemented a Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM). Among the three algorithms, SVM with a linear kernel performed best, with 78% accuracy.

Link: https://ieeexplore.ieee.org/document/8934609

The Challenges and Approaches during the Detection of Cyberbullying Text for Low-resource Language: A Literature Review

Abstract:

Objective: The primary intent of this paper is to review studies relevant to the detection of five variants of cyberbullying text (abusive, hateful, aggressive, bullying, and toxic comments or texts) in Bengali, taken as a sample low-resource language, to gain a comprehensive understanding of the challenges and state-of-the-art approaches used to identify these types of text. Materials: We searched for articles on cyberbullying text detection in the Bengali language published from 2017 to July 2021, since no research in this domain was found before 2017. We then screened candidates by inspecting the title, abstract, and full text to select the studies included in this review. Results: After applying the different levels of filtering to the initial search results, 28 domain-centric papers were retained out of 2,745 documents. We first analyze the context of each study in depth and then present a comparative review of research challenges and approaches, as well as directions for future work on detecting cyberbullying text in Bengali. Conclusion: In this paper, we discuss five variants of cyberbullying text (abusive text, hateful speech, aggressive text, bullying text, and toxic comments over the web) and their detection process by studying the existing literature in this domain. We present advice on dataset preparation, preprocessing and feature extraction tasks, and classifier selection that may aid comprehensive research toward better detection.

Link: https://ph01.tci-thaijo.org/index.php/ecticit/article/view/248039

Opinion Mining: Is Feature Engineering Still Relevant?

Abstract:

This paper manifests the experimentation with sentiment polarity detection over Stanford's IMDB movie review dataset using a Support Vector Machine classifier (SVM). Our prime motivation was to find out the best possible combinations of classic features and preprocessing techniques for the classification of positive and negative opinions. We also explored two variants of kernels with numerous parameter settings for the classifier in the hope of getting the best SVM model. Our best model achieved an accuracy score of 85.45%. The results indicate that a model with a non-linear Radial Basis Function (RBF) kernel leads to the highest accuracy. The features that contributed the most are stemmed word n-grams.

Link: https://ieeexplore.ieee.org/document/9396874
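To make the "stemmed word n-grams" feature mentioned in the abstract above concrete, here is a minimal sketch using only the Python standard library. The suffix-stripping `crude_stem` is a hypothetical stand-in for a real stemmer, and the tokenization rule is an assumption; neither is taken from the paper.

```python
import re


def crude_stem(word):
    """Strip a few common English suffixes (hypothetical stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "ly", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stemmed_ngrams(text, max_n=2):
    """Return all stemmed word n-grams of length 1..max_n, in order."""
    tokens = [crude_stem(tok) for tok in re.findall(r"[a-z']+", text.lower())]
    grams = []
    for n in range(1, max_n + 1):
        grams.extend(
            " ".join(tokens[i : i + n]) for i in range(len(tokens) - n + 1)
        )
    return grams


review_features = stemmed_ngrams("The acting was amazing")
# review_features == ['the', 'act', 'was', 'amaz', 'the act', 'act was', 'was amaz']
```

Counts or TF-IDF weights of such n-grams would typically form the input vectors for an SVM classifier.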

Education Certification and Verified Documents Sharing System by Blockchain

The emergence of new and improved technological advances has created severe security problems for the educational certification system. This paper proposes a way to improve that security. Blockchain technology is introduced as reliable, secure storage for the educational certification system, providing users with an additional facility: the validation and authentication of a student’s academic records. Moreover, for security purposes, Blockchain technology can replace the traditional academic certification system and contribute to a new model for sharing student information. After data inclusion and hashing are complete, the blocks are inserted into the Blockchain network. The proposed model enhances document security, reduces fraud, and significantly reduces authentication time, making verification almost twice as fast as the current process. With this system, we get a certification process in which all data is digitized and secured in an unbreakable database with proper authentication and a noticeable gain in time efficiency.
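The hashing and block insertion step can be sketched as a simple hash chain using only the Python standard library. This is an illustrative toy, not the system described in the paper: the block layout, the field names, and the sample records are all hypothetical, and there is no network or consensus layer.

```python
import hashlib
import json


def hash_block(block):
    """SHA-256 digest of a block's canonical JSON form."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_record(chain, record):
    """Append a certificate record, linking it to the previous block's hash."""
    previous_hash = hash_block(chain[-1]) if chain else "0" * 64
    chain.append(
        {"index": len(chain), "record": record, "previous_hash": previous_hash}
    )
    return chain


def is_chain_valid(chain):
    """Check that every block still points at the true hash of its predecessor."""
    return all(
        chain[i]["previous_hash"] == hash_block(chain[i - 1])
        for i in range(1, len(chain))
    )


ledger = []
append_record(ledger, {"student_id": "S-001", "degree": "BSc in CSE"})
append_record(ledger, {"student_id": "S-002", "degree": "MSc in CSE"})
```

Altering an earlier record changes its hash, so the link stored in the following block no longer matches and `is_chain_valid` returns `False`. In this sketch the most recent block is protected only once another block is appended after it.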

An ML-based decision support system for reliable diagnosis of ovarian cancer by leveraging explainable AI

Ovarian cancer (OC) is one of the most prevalent types of cancer in women. Early and accurate diagnosis is crucial for the survival of the patients. However, the majority of women are diagnosed in advanced stages due to the lack of effective biomarkers and accurate screening tools. While previous studies sought a common biomarker, our study suggests different biomarkers for the premenopausal and postmenopausal populations. This can provide a new perspective in the search for novel predictors for the effective diagnosis of OC. A genetic algorithm has been utilized to identify the most significant biomarkers. The XGBoost classifier is then trained on the selected features, and high ROC-AUC scores of 0.864 and 0.911 have been obtained for the premenopausal and postmenopausal populations, respectively. Lack of explainability is one major limitation of current AI systems. The stochastic nature of the ML algorithms raises concerns about the reliability of the system, as it is difficult to interpret the reasons behind the decisions. To increase the trustworthiness and accountability of the diagnostic system as well as to provide transparency and explanations behind the predictions, explainable AI has been incorporated into the ML framework. SHAP is employed to quantify the contributions of the selected biomarkers and determine the most discriminative features. Merging SHAP with the ML models enables clinicians to investigate individual decisions made by the model and gain insights into the factors leading to that prediction. Thus, a hybrid decision support system has been established that can eliminate the bottlenecks caused by the black-box nature of the ML algorithms, providing a safe and trustworthy AI tool. The diagnostic accuracy obtained from the proposed system outperforms the existing methods as well as the state-of-the-art ROMA algorithm by a substantial margin, which signifies its potential to be an effective tool in the differential diagnosis of OC.

A CNN Based Model for Plant Disease Classification using Transfer Learning

Global food security is seriously threatened by plant diseases, which annually cause large losses in agricultural productivity. Early diagnosis and accurate classification of plant diseases are required for disease management programs to be implemented promptly and efficiently. In the area of plant disease classification, Convolutional Neural Networks (CNN) have demonstrated encouraging results in recent years. In this study, we propose a CNN-based approach for plant disease classification using a MobileNetV2-based model and transfer learning. The proposed model leverages the MobileNetV2 architecture, known for its lightweight and efficient design, making it well-suited for resource-constrained environments. The pre-trained MobileNetV2 model is modified using transfer learning to accommodate the goal of classifying plant diseases. Through the use of pre-trained weights, the model benefits from features learned on a large-scale dataset, leading to improved generalization and reduced training time. We use a standard plant disease dataset with a filtering method as a preprocessing strategy in extended trials to assess the efficiency of the proposed approach. The performance of the model is compared against several cutting-edge techniques, including VGG16, AlexNet and InceptionV3. The experimental findings show that the suggested model performs competitively in classifying plant diseases, surpassing other approaches with an accuracy of 98.56%.

A Transformer Based Model for Twitter Sentiment Analysis using RoBERTa

In recent years, social media platforms, particularly Twitter, have emerged as crucial sources of public opinion and sentiment. Analyzing sentiment in Twitter data presents a significant challenge due to the platform's inherent characteristics, such as brevity, informality, and the prevalence of slang and emojis. This research paper proposes a method for Twitter sentiment analysis by leveraging the power of a transformer-based model called RoBERTa. The proposed strategy employs RoBERTa due to its exceptional performance in various natural language processing tasks. Our system captures intricate contextual information and semantic nuances in tweets, making it well-suited for sentiment analysis on this challenging platform. To build an effective sentiment analysis system, the architecture is fine-tuned using a large corpus of Twitter data annotated with sentiment labels. Additionally, we explore various strategies to handle the unique characteristics of Twitter data, including tokenization, handling hashtags, user mentions, and URLs, as well as the incorporation of emojis and emoticons. We compare the performance of our model with three other standard machine learning and deep learning models, namely Decision Tree (DT), Support Vector Machine (SVM), and Long Short-Term Memory (LSTM), in order to show that our model is superior at correctly analyzing Twitter sentiment. The model achieves an accuracy of 96.78%, highlighting its effectiveness in understanding and classifying sentiment within the context of tweets.
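The handling of hashtags, user mentions, and URLs mentioned above could look roughly like the following standard-library sketch. The placeholder tokens `<url>` and `<user>`, the regular expressions, and the choice to keep hashtag words and emojis are assumptions for illustration, not the paper's actual preprocessing.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")


def normalize_tweet(text):
    """Replace URLs and mentions with placeholders and unwrap hashtags."""
    text = URL_RE.sub("<url>", text)        # links carry little sentiment
    text = MENTION_RE.sub("<user>", text)   # anonymise user handles
    text = HASHTAG_RE.sub(r"\1", text)      # keep the hashtag word, drop '#'
    return " ".join(text.split())           # collapse repeated whitespace


cleaned = normalize_tweet("@alice loved it!  #GreatMovie https://t.co/abc 😀")
# cleaned == '<user> loved it! GreatMovie <url> 😀'
```

Emojis are deliberately left in place so that a subword tokenizer can still see them as sentiment cues.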

Conference Papers

Nishat Tasnim, Asraf Ullah Rahat, and Dr. Md. Musfique Anwar, “Retrieving Top K% Relevant Patterns for Relation Extraction in Bangla using Distant Supervision”, International Conference on Signal Processing, Information, Communication and System 2024. [Accepted]

Journal Publication

Nishat Tasnim, Asraf Ullah Rahat, and Dr. Md. Musfique Anwar, “Bangla-REX: A Distinct Dataset for Bangla Relation Extraction”, Data in Brief, 2024. [Under Review]

Conference proceedings
  1. M. A. K. Rifat, A. Kabir, and A. Huq, “An Explainable Machine Learning Approach to Traffic Accident Fatality Prediction,” Procedia Computer Science, vol. 246, pp. 1905–1914, 2024, doi: https://doi.org/10.1016/j.procs.2024.09.704. [Presented at the 28th International Conference on Knowledge Based and Intelligent Information and Engineering Systems (KES 2024), as part of a special issue.]