Abstract:
The primary goal of every educational institution is to deliver the best educational experience
and knowledge to students. Achieving this goal involves recognizing students in need of extra
support and implementing measures to enhance their academic performance. This study
evaluates five machine learning algorithms for building a classification model that
predicts students’ academic performance.
The machine learning algorithms utilized in this study include Naïve Bayes, Decision Tree,
Logistic Regression, Random Forest, and Linear Regression. The models are constructed using
three machine-learning platforms: WEKA, RapidMiner Studio, and Python. The
dataset used to construct the models was gathered directly from students by questionnaire.
Initially, data was collected from 3,620 students. After preprocessing,
the dataset was reduced to 3,001 participants, comprising 916 females and 2,085 males. The
key stages of data preprocessing applied in this study include data cleaning, data reduction, and
data transformation. Subsequently, the dataset was divided, allocating 80% for training
purposes and 20% for testing.
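
The 80/20 division described above can be illustrated with a minimal sketch. It assumes a simple shuffled (non-stratified) split and a round-to-nearest rule for the test-set size; the abstract specifies neither, and the function name, seed, and rounding rule are hypothetical choices made only for illustration.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=42):
    """Shuffle row indices and split them into train and test index lists.

    Assumption: a plain random split; the study may have used a
    stratified split or a tool-specific procedure instead.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Rounding rule is an assumption; libraries differ (some round up).
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


# 3,001 is the number of participants remaining after preprocessing.
train_idx, test_idx = train_test_split_indices(3001)
print(len(train_idx), len(test_idx))  # 2401 600
```

Under these assumptions the 3,001 records yield 2,401 training rows and 600 test rows; a different rounding rule would shift one record between the two sets.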
The study adopts an experimental research methodology, constructing models with the chosen
machine-learning algorithms and tools. Each model is trained on the training set and
evaluated on the test set using precision, recall, and accuracy. The experimental results indicate
that the random forest algorithm, implemented in Python, achieved
an accuracy of 95.00%, a precision of 95.03%, and a recall of 95.01%.
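
To make the three evaluation metrics concrete, the following sketch computes accuracy, precision, and recall from true and predicted labels. It assumes macro averaging over classes, which the abstract does not state; the class labels and the function name are hypothetical and the toy data is not from the study.

```python
def classification_metrics(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall).

    Assumption: macro averaging (unweighted mean over classes);
    the study may have used a weighted or binary variant.
    """
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls = [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Guard against division by zero for classes never predicted/seen.
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)

    return (accuracy,
            sum(precisions) / len(labels),
            sum(recalls) / len(labels))


# Hypothetical labels, for illustration only.
y_true = ["high", "high", "low", "low", "medium", "medium"]
y_pred = ["high", "low", "low", "low", "medium", "high"]
accuracy, precision, recall = classification_metrics(y_true, y_pred)
print(f"{accuracy:.4f} {precision:.4f} {recall:.4f}")  # 0.6667 0.7222 0.6667
```

Precision and recall can diverge noticeably when classes are imbalanced, so reporting all three metrics, as the study does, gives a fuller picture than accuracy alone.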
These findings are promising and could serve as a starting point for further
research in this area. The study identified a clear link between
academic ranking and various factors such as socio-demographic characteristics, economic
background, and educational practices. These factors encompass the student's place of origin
(be it urban, rural, or emerging regions), family background (including parents' education and
economic standing), previous academic performance, time allocated for studying, materials
used for examination preparation, and hours spent with peers. This research used only
student data gathered from Haramaya University. Future researchers are therefore
encouraged to develop a more general model by collecting data from a diverse range of
Ethiopian universities.