PHRASE BASED AUTOMATIC AFAAN OROMO NEWS TEXT CLASSIFICATION: A MACHINE LEARNING APPROACH

dc.contributor.author Bekele Lemm, Meseret
dc.contributor.author Mulugeta, (PhD) Wondwossen
dc.contributor.author G/Michael, (MSc) Kibru
dc.date.accessioned 2022-02-16T06:44:52Z
dc.date.available 2022-02-16T06:44:52Z
dc.date.issued 2021-06
dc.identifier.uri http://ir.haramaya.edu.et//hru/handle/123456789/4762
dc.description 116p. en_US
dc.description.abstract Text classification is the assignment of text documents to one or more predefined classes based on their content and topic. Classification is usually performed on the basis of significant phrases or features extracted from the text document using an N-gram phrase extraction method. In this study, a design science approach is adopted to explore the possibility of using varying N-gram models for Afaan Oromo news text classification. Three supervised machine learning methods, Naïve Bayes (NB), Support Vector Machine (SVM) and Decision Tree (DTree), were used to explore the efficiency of the classification task. For experimentation, 1007 news documents were collected from Afaan Oromo news portals and annotated. From the literature, nine predefined classes were identified and used: adaa fi turiziimii (culture and tourism), barnoota (education), dinagde (business), fayyaa (health), ispoortii (sport), qonna (agriculture), raayyaa ittisa fi nagenya (defence force and peace) and kanbiraa (others). We applied the following preprocessing steps: tokenization (with number, symbol and punctuation removal), normalization, stopword removal and stemming. After cleaning the noisy data, we extracted N-gram and hybrid N-gram features using an N-gram feature extraction method. The classification algorithms were then trained on 75% of the dataset and tested on the remaining 25%. The selected classification algorithms are used to assign a new news document to one of the predefined news classes. Classifier performance was evaluated using precision, recall, F-measure and accuracy. The evaluation shows that SVM with the hybrid uni-bi-gram (1,2) model achieved the best accuracy, 92%. Based on this result, we selected the SVM learning model with the hybrid uni-bi-gram (1,2) model for future work. The result is encouraging, and we expect that a larger and better balanced dataset would yield better results.
For future work, we recommend classifying news texts using the semantic relationships between phrases, for example by taking synonymous phrases into account. en_US
dc.description.sponsorship Haramaya University en_US
dc.language.iso en en_US
dc.publisher Haramaya University en_US
dc.subject News text classification, automatic text classification, phrase-based, keyword-based, N-gram models, Hybrid N-gram, Machine Learning, Evaluation Metrics en_US
dc.title PHRASE BASED AUTOMATIC AFAAN OROMO NEWS TEXT CLASSIFICATION: A MACHINE LEARNING APPROACH en_US
dc.type Thesis en_US
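
The abstract describes hybrid uni-bi-gram (1,2) features and evaluation by precision, recall, F-measure and accuracy. The record does not say which tools the thesis used, so the following is only a minimal Python sketch of those two ideas, not the authors' implementation. The function names, the placeholder tokens `w1`..`w3`, and the toy label lists are assumptions made for illustration; only the class names are taken from the abstract.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-token phrases found in `tokens`."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def hybrid_ngram_counts(tokens, n_min=1, n_max=2):
    """Count every n-gram with n_min <= n <= n_max.

    The default (1, 2) corresponds to a hybrid uni-bi-gram feature set.
    """
    counts = Counter()
    for n in range(n_min, n_max + 1):
        counts.update(ngrams(tokens, n))
    return counts


def per_class_scores(y_true, y_pred, label):
    """Precision, recall and F-measure for a single class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Placeholder tokens standing in for a preprocessed (tokenized, normalized,
# stopword-filtered, stemmed) news document. Not real data.
tokens = ["w1", "w2", "w3"]
features = hybrid_ngram_counts(tokens)

# Toy true/predicted labels, invented for illustration only.
y_true = ["fayyaa", "fayyaa", "ispoortii", "barnoota"]
y_pred = ["fayyaa", "ispoortii", "ispoortii", "barnoota"]
toy_accuracy = accuracy(y_true, y_pred)
fayyaa_scores = per_class_scores(y_true, y_pred, "fayyaa")
```

In this sketch, `features` holds three unigrams and two bigrams, each counted once. A classifier such as SVM would then be trained on such counts (or a weighting of them) over the 75% training portion and scored on the remaining 25%.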

