AMHARIC ANAPHORA RESOLUTION USING MACHINE LEARNING APPROACH: THE CASE OF INDEPENDENT AND HIDDEN ANAPHORA RESOLUTION

dc.contributor.author KEDIR MOHAMMED TIGABU
dc.contributor.author Wondwossen Mulugeta (PhD)
dc.contributor.author Elias Debelo (MSc)
dc.date.accessioned 2023-11-02T06:45:17Z
dc.date.available 2023-11-02T06:45:17Z
dc.date.issued 2023-03
dc.identifier.uri http://ir.haramaya.edu.et//hru/handle/123456789/6722
dc.description 115 en_US
dc.description.abstract Anaphora resolution is a fundamental task in natural language processing (NLP) that involves identifying the antecedent of an anaphoric expression in a text. The task is critical for several NLP applications, such as machine translation, text summarization, question answering, information extraction, natural language generation, discourse analysis, and sentiment analysis. This research focuses on developing an effective anaphora resolution model using a machine learning approach. Our model for resolving anaphora in Amharic has two main phases: training and testing. The model is made up of five components, including pre-processing (tokenization, normalization, part-of-speech tagging, and chunking), identifying independent anaphora, identifying hidden anaphora (extracting pronouns from Amharic verbs using a morphological analyzer), identifying candidate antecedents (identifying noun phrases), feature selection (applying resolution factors, namely the constraint rules and the preference rules), and the actual anaphora resolver model. We used HornMorpho for morphological analysis and the INCEpTION text annotation tool, customized for our annotation scheme, to prepare the corpus. We developed a supervised machine learning model for Amharic anaphora resolution. This study focuses only on pronominal and reflexive pronouns. We used three machine learning algorithms for classification: Support Vector Machine (SVM), Naïve Bayes (NB), and Random Forest (RF). To evaluate the model, we collected 575 sentences containing 211 independent pronouns and 537 hidden pronouns from Amharic news portals, the Quran, and the Bible. The model was evaluated in different scenarios: first by dataset type, then by the location of the anaphor and antecedent, and finally for hidden and independent anaphora detection.
In the first case, the model achieved an accuracy of 59.38% for SVM, 57.47% for NB, and 57.91% for RF on the compiled dataset. In the second case, for inter-sentential anaphora the model scored 43.87% for SVM, 43.20% for NB, and 44.52% for RF; for intra-sentential anaphora it scored 62.50% for SVM, 59.44% for NB, and 57.39% for RF. For anaphora detection, SVM achieved an accuracy of 66.82% for independent anaphora and 66.11% for hidden anaphora. en_US
dc.description.sponsorship Haramaya University en_US
dc.language.iso en en_US
dc.publisher Haramaya University en_US
dc.subject Amharic Language, Amharic Anaphora Resolution, Machine Learning, NLP en_US
dc.title AMHARIC ANAPHORA RESOLUTION USING MACHINE LEARNING APPROACH: THE CASE OF INDEPENDENT AND HIDDEN ANAPHORA RESOLUTION en_US
dc.type Thesis en_US
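
The abstract describes filtering candidate antecedents (noun phrases) with constraint rules and ranking them with preference rules before classification. The sketch below is only an illustration of that general idea, not the thesis's implementation: the `Mention` fields, the agreement check, the two-sentence window, and the "most recent candidate" preference are all assumptions made for this example, and the toy data uses English glosses rather than real Amharic annotations. The recency preference stands in for the trained SVM, NB, or RF classifier, which is not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Mention:
    """A pronoun or noun phrase (hypothetical schema for this sketch)."""
    text: str
    sentence: int            # index of the sentence containing the mention
    position: int            # token index inside that sentence
    person: int              # 1, 2 or 3
    number: str              # "sg" or "pl"
    gender: Optional[str]    # "m", "f", or None when unknown
    hidden: bool = False     # True if recovered from verb morphology


def agrees(anaphor: Mention, candidate: Mention) -> bool:
    """Constraint rule: person, number and (known) gender must match."""
    if anaphor.person != candidate.person:
        return False
    if anaphor.number != candidate.number:
        return False
    if anaphor.gender and candidate.gender and anaphor.gender != candidate.gender:
        return False
    return True


def candidate_antecedents(anaphor: Mention, noun_phrases: List[Mention],
                          window: int = 2) -> List[Mention]:
    """Keep noun phrases that precede the anaphor, lie within `window`
    sentences of it, and pass the agreement constraint."""
    kept = []
    for cand in noun_phrases:
        precedes = (cand.sentence, cand.position) < (anaphor.sentence, anaphor.position)
        near = anaphor.sentence - cand.sentence <= window
        if precedes and near and agrees(anaphor, cand):
            kept.append(cand)
    return kept


def pair_features(anaphor: Mention, candidate: Mention) -> dict:
    """Features for one (anaphor, candidate) pair, as a classifier might see them."""
    return {
        "same_sentence": anaphor.sentence == candidate.sentence,
        "sentence_distance": anaphor.sentence - candidate.sentence,
        "agreement": agrees(anaphor, candidate),
        "anaphor_hidden": anaphor.hidden,
    }


def pick_most_recent(anaphor: Mention, noun_phrases: List[Mention]) -> Optional[Mention]:
    """Preference rule (assumed baseline): choose the closest preceding candidate."""
    cands = candidate_antecedents(anaphor, noun_phrases)
    if not cands:
        return None
    return max(cands, key=lambda m: (m.sentence, m.position))


# Toy data, invented for illustration only.
noun_phrases = [
    Mention("Kebede", sentence=0, position=0, person=3, number="sg", gender="m"),
    Mention("Almaz", sentence=0, position=2, person=3, number="sg", gender="f"),
]
pronoun = Mention("she", sentence=1, position=0, person=3, number="sg",
                  gender="f", hidden=True)

best = pick_most_recent(pronoun, noun_phrases)
features = pair_features(pronoun, best)
```

In a supervised setup like the one the abstract describes, dictionaries such as `features` would be built for every anaphor and candidate pair and labelled from the annotated corpus; the classifier would then replace `pick_most_recent`.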

