MACHINE LEARNING BASED MULTI-SCALE SENTIMENT ANALYSIS FOR AFAAN OROMO POSTS

Kebede, Tadesse

URI: http://localhost:8080/xmlui/handle/123456789/3577

Date: 2019-08

Abstract:

In daily decision-making, individuals and organizations regularly treat other people’s sentiments or opinions as a source of information. To capture those opinions more precisely, it is crucial to consider the strength of a sentiment's polarity. The proliferation of websites, blogs, social networks, online portals and content-sharing services now contributes an enormous amount of user-generated Afaan Oromo text. Despite the rising use of the Afaan Oromo language on the internet, there is no adequate method for classifying and cataloging sentiment in the language. A multi-scale sentiment analysis task that automatically extracts sentiments while taking their strength into account is therefore desirable in various applications, and this work can play a significant role in meeting that need. In this study, a multi-scale sentiment analysis model for Afaan Oromo text is proposed, using a bag-of-words feature representation with three supervised machine-learning algorithms: Naïve Bayes (NB), Support Vector Machine (SVM), and Maximum Entropy (MaxEnt). The process categorizes a sentence into one of five predefined classes: strong positive (+2), positive (+1), neutral (0), negative (-1) and strong negative (-2). The proposed system comprises data collection, preprocessing (tokenization, normalization, stop word removal), morphological analysis (part-of-speech tagging, stemming), sentiment annotation, feature extraction and selection, training of the machine-learning algorithms, classification, and evaluation of the results using accuracy, precision, recall and F-measure. For the experiments, 1000 Afaan Oromo sentences expressing sentiment were collected from different sources.
In addition, a list of 350 stop words, 254 suffixes, a gazetteer of 740 Afaan Oromo adjectives and 125 intensifiers (adverbs) were prepared with the assistance of Afaan Oromo language experts. The experimental results show that the performance of the model is encouraging: the NB classifier achieved an accuracy of 74.6% using 1000 sentences, and the SVM classifier achieved an accuracy of 83% using 1200 sentences. However, further research on named entity recognition, word position and negation features, explicit and comparative sentiment analysis, standard corpus preparation, co-reference resolution, and feature- or aspect-level sentiment analysis is needed to develop a full-fledged and more efficient multi-scale sentiment analysis system for Afaan Oromo.
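
The classification step described in the abstract (bag-of-words features fed to a Naïve Bayes classifier over the five-point scale) can be illustrated with a minimal sketch. It is not the thesis implementation: the class name `NaiveBayes`, the helper `tokenize`, and the toy training sentences are all assumptions made for illustration, and the sentences are English glosses standing in for annotated Afaan Oromo text. Stemming, part-of-speech tagging and feature selection are omitted.

```python
import math
from collections import Counter, defaultdict

# Five-point scale taken from the abstract.
LABELS = {2: "strong positive", 1: "positive", 0: "neutral",
          -1: "negative", -2: "strong negative"}


def tokenize(sentence, stop_words=frozenset()):
    """Lowercase, split on whitespace and drop stop words.

    A stand-in for the preprocessing stage; the real system also
    normalizes and stems the tokens.
    """
    return [tok for tok in sentence.lower().split() if tok not in stop_words]


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts (hypothetical helper)."""

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)      # sentences per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for sentence, label in zip(sentences, labels):
            tokens = tokenize(sentence)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total = len(labels)
        return self

    def predict(self, sentence):
        # Tokens never seen in training carry no evidence and are skipped.
        tokens = [t for t in tokenize(sentence) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / self.total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                # Add-one (Laplace) smoothing avoids log(0).
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data: English glosses, NOT drawn from the thesis corpus.
train = [
    ("very good service", 2),
    ("good service", 1),
    ("the service opened today", 0),
    ("bad service", -1),
    ("very bad service", -2),
]
model = NaiveBayes().fit([s for s, _ in train], [y for _, y in train])

for text in ("very good", "bad"):
    label = model.predict(text)
    print(text, "->", label, LABELS[label])
```

In this toy setting the intensifier "very" is what separates the strong classes from the plain ones, which suggests why a list of intensifiers is useful for multi-scale (rather than binary) sentiment classification.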