A MACHINE LEARNING APPROACH TO MULTI SCALE SENTIMENT ANALYSIS OF TIGRIGNA ONLINE POSTS


dc.contributor.author Samuel, Hagazi
dc.contributor.author Assabie, (Ph.D.) Yaregal
dc.contributor.author Gebremariam, (MSc) Akubazgi
dc.date.accessioned 2022-02-16T06:38:04Z
dc.date.available 2022-02-16T06:38:04Z
dc.date.issued 2021-09
dc.identifier.uri http://ir.haramaya.edu.et//hru/handle/123456789/4761
dc.description 114p. en_US
dc.description.abstract With the rapid growth of web technologies, individuals and organizations increasingly use blogs, forums, review sites, social networks, etc. to express their views and opinions. These reviews are very useful to service providers, manufacturers and organizations for making informed decisions and improving their services. However, the volume of reviews on social media is growing so rapidly that it is becoming increasingly difficult for users to analyze them and extract relevant information. Automated sentiment analysis is therefore needed. In this research, we present a multiscale, sentence-level sentiment analysis model for Tigrigna online posts using a supervised machine learning approach. The model classifies a given sentence into one of five predefined classes: very positive (2), positive (1), neutral (0), negative (-1) and very negative (-2). We used three supervised machine learning algorithms, Naïve Bayes (NB), Maximum Entropy (MaxEnt) and Support Vector Machine (SVM), with unigram, bigram, trigram and hybrid (unigram plus bigram) N-gram features. The proposed model consists of preprocessing (tokenization, normalization and stop-word removal), morphological analysis (lemmatization), feature extraction, training of the machine learning algorithms, classification, and evaluation of the results using evaluation metrics. For the experiments, 1500 Tigrigna sentences were collected from different sources. Because of the morphological complexity of the language, preprocessing was applied to clean noisy data and to reduce the sparseness and dimensionality of the dataset. The preprocessed dataset was then lemmatized before being passed to the training phase. The experimental results show that the SVM algorithm with the unigram model outperforms all other configurations, achieving 71% accuracy.
In conclusion, despite the morphological complexity of the language and the lack of effective morphological analysis tools, the experimental results are promising. However, we believe the results could be improved further with a larger, pre-annotated and cleaned corpus. en_US
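The abstract describes N-gram feature extraction (unigram, bigram, trigram and a hybrid of unigram and bigram) feeding supervised classifiers over a five-point label scale. The thesis record does not state how this was implemented, so the following is only a minimal, stdlib-only sketch of that idea and not the author's code. It shows the four feature schemes and one of the three named algorithms, Naïve Bayes, written as a multinomial model with add-one (Laplace) smoothing; MaxEnt and SVM are not shown. All function and class names, and the English placeholder tokens standing in for already preprocessed, lemmatized Tigrigna sentences, are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Five-point scale from the abstract: very positive (2) ... very negative (-2).
LABELS = (2, 1, 0, -1, -2)

def ngrams(tokens, n):
    """Return contiguous n-grams as space-joined strings (empty if too short)."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def extract_features(tokens, scheme="unigram"):
    """Bag-of-n-grams features; 'hybrid' concatenates unigrams and bigrams."""
    if scheme == "unigram":
        return ngrams(tokens, 1)
    if scheme == "bigram":
        return ngrams(tokens, 2)
    if scheme == "trigram":
        return ngrams(tokens, 3)
    if scheme == "hybrid":
        return ngrams(tokens, 1) + ngrams(tokens, 2)
    raise ValueError(f"unknown scheme: {scheme}")

class MultinomialNB:
    """Hypothetical multinomial Naive Bayes with add-one smoothing."""

    def fit(self, docs, labels, scheme="unigram"):
        self.scheme = scheme
        self.class_counts = Counter(labels)          # label -> number of sentences
        self.feature_counts = defaultdict(Counter)   # label -> feature -> count
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            feats = extract_features(tokens, scheme)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        # Total feature occurrences per class, used in the smoothed denominator.
        self.total = {c: sum(self.feature_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior P(class)
            for f in extract_features(tokens, self.scheme):
                if f in self.vocab:  # skip features never seen in training
                    score += math.log((self.feature_counts[label][f] + 1)
                                      / (self.total[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

def accuracy(model, docs, labels):
    """Fraction of sentences whose predicted label matches the gold label."""
    correct = sum(model.predict(d) == y for d, y in zip(docs, labels))
    return correct / len(labels)

# Toy training data: placeholder tokens, one sentence per class.
docs = [["very", "good", "service"], ["good", "service"], ["service", "today"],
        ["bad", "service"], ["very", "bad", "service"]]
labels = [2, 1, 0, -1, -2]

model = MultinomialNB().fit(docs, labels, scheme="unigram")
print(model.predict(["very", "good"]))  # prints 2
print(model.predict(["bad"]))           # prints -1
```

Switching `scheme` to `"bigram"`, `"trigram"` or `"hybrid"` reproduces the other feature variants listed in the abstract; comparing `accuracy` on a held-out split across schemes and algorithms is the kind of comparison the reported 71% SVM-unigram result comes from.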
dc.description.sponsorship Haramaya University en_US
dc.language.iso en en_US
dc.publisher Haramaya University en_US
dc.subject Tigrigna Language; N-gram model; Multi-scale Sentiment Analysis; Maximum Entropy; Support Vector Machine; Naive Bayes en_US
dc.title A MACHINE LEARNING APPROACH TO MULTI SCALE SENTIMENT ANALYSIS OF TIGRIGNA ONLINE POSTS en_US
dc.type Thesis en_US