A MACHINE LEARNING APPROACH TO MULTI SCALE SENTIMENT ANALYSIS OF TIGRIGNA ONLINE POSTS

Samuel, Hagazi; Assabie, (Ph.D.) Yaregal; Gebremariam, (MSc) Akubazgi

URI: http://ir.haramaya.edu.et//hru/handle/123456789/4761

Date: 2021-09

Abstract:

With the rapid growth of web technologies, individuals and organizations increasingly use blogs, forums, review sites, social networks, and similar platforms to express their views and opinions. These reviews are very useful to service providers, manufacturers and organizations in making informed decisions and improving their services. However, the volume of reviews on social media is growing so rapidly that it is becoming increasingly difficult for users to analyze them and extract relevant information. Automated sentiment analysis is therefore needed. In this research, we present a multiscale sentence-level sentiment analysis for Tigrigna online posts using a supervised machine learning approach. The multiscale Tigrigna sentiment analysis model classifies a given sentence into one of five predefined classes: very positive (2), positive (1), neutral (0), negative (-1) and very negative (-2). We used three supervised machine learning algorithms, Naïve Bayes (NB), Maximum Entropy (MaxEnt) and Support Vector Machine (SVM), with four N-gram variants as features: unigram, bigram, trigram, and a hybrid of unigram and bigram. The proposed model consists of several components: preprocessing (tokenization, normalization, stop word removal), morphological analysis (lemmatization), feature extraction, training of the machine learning algorithms, classification, and evaluation of the results using evaluation metrics. For the experiments, 1500 Tigrigna sentences were collected from different sources. Because of the morphological complexity of the language, preprocessing techniques were applied to clean noisy data and to reduce the sparseness and dimensionality of the dataset. After preprocessing, the dataset was lemmatized before being passed to the training phase of the experiment. The experimental results show that the SVM algorithm with the unigram model outperforms the other algorithms, with 71% accuracy.
In conclusion, despite the language's morphological complexity and the lack of effective morphological analysis tools, the experimental results achieved are promising. However, we are convinced that the results could be improved further with a larger, pre-annotated and cleaned corpus.
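
The feature extraction and classification steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`ngrams`, `extract_features`), the `NaiveBayes` class, the add-one smoothing and the toy English tokens standing in for lemmatized Tigrigna words are all assumptions made for illustration. Only the five-point label scale and the four N-gram variants come from the abstract.

```python
import math
from collections import Counter

# Five-point label scale as given in the abstract.
LABELS = {2: "very positive", 1: "positive", 0: "neutral",
          -1: "negative", -2: "very negative"}

# The four N-gram variants named in the abstract, as tuples of n values.
VARIANTS = {"unigram": (1,), "bigram": (2,), "trigram": (3,), "hybrid": (1, 2)}


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(tokens, variant="unigram"):
    """Count the n-grams of one (already preprocessed) sentence."""
    feats = Counter()
    for n in VARIANTS[variant]:
        feats.update(ngrams(tokens, n))
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (an assumed setup)."""

    def fit(self, feature_counts, labels):
        self.class_docs = Counter(labels)
        self.feat_counts = {c: Counter() for c in self.class_docs}
        for feats, label in zip(feature_counts, labels):
            self.feat_counts[label].update(feats)
        self.vocab = set().union(*self.feat_counts.values())
        self.total = {c: sum(fc.values()) for c, fc in self.feat_counts.items()}
        return self

    def predict(self, feats):
        n_docs = sum(self.class_docs.values())

        def score(c):
            # Log prior plus smoothed log likelihood of each known feature.
            s = math.log(self.class_docs[c] / n_docs)
            denom = self.total[c] + len(self.vocab)
            for f, k in feats.items():
                if f in self.vocab:  # features unseen in training are ignored
                    s += k * math.log((self.feat_counts[c][f] + 1) / denom)
            return s

        return max(self.class_docs, key=score)


# Hypothetical toy data: English placeholders, not sentences from the corpus.
train = [
    (["very", "good", "service"], 2),
    (["good", "service"], 1),
    (["service", "opened", "today"], 0),
    (["bad", "service"], -1),
    (["very", "bad", "service"], -2),
]
model = NaiveBayes().fit([extract_features(t) for t, _ in train],
                         [y for _, y in train])
prediction = model.predict(extract_features(["very", "bad"]))
print(prediction, LABELS[prediction])
```

Passing `variant="hybrid"` to `extract_features` counts unigrams and bigrams together, which corresponds to the hybrid feature set mentioned in the abstract. The MaxEnt and SVM classifiers are not sketched here.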