AMHARIC ANAPHORA RESOLUTION USING MACHINE LEARNING APPROACH: THE CASE OF INDEPENDENT AND HIDDEN ANAPHORA RESOLUTION

KEDIR MOHAMMED TIGABU; Wondwossen Mulugeta (PhD); Elias Debelo (MSc)

URI: http://ir.haramaya.edu.et//hru/handle/123456789/6722

Date: 2023-03

Abstract:

Anaphora resolution is a fundamental task in natural language processing (NLP) that involves identifying the antecedent of an anaphoric expression in a text. The task is critical for several NLP applications, such as machine translation, text summarization, question answering, information extraction, natural language generation, discourse analysis and sentiment analysis. This research focuses on developing an effective anaphora resolution model using a machine learning approach. Our model for resolving anaphora in Amharic has two main phases: training and testing. The model is made up of five components, including pre-processing (tokenization, normalization, part-of-speech tagging and chunking), identification of independent anaphora, identification of hidden anaphora (extracting pronouns from Amharic verbs using a morphological analyzer), identification of candidate antecedents (noun phrases), feature selection (applying the resolution factors, namely the constraint rules and the preference rules), and the actual anaphora resolver model. We used HornMorpho for morphological analysis and the INCEpTION text annotation tool, customized for our annotation scheme, to prepare the corpus. We developed a supervised machine learning model for Amharic anaphora resolution. This study focuses only on pronominal and reflexive pronouns. We used three machine learning algorithms for classification: Support Vector Machine (SVM), Naïve Bayes (NB), and Random Forest (RF). To evaluate the performance of the model, we collected 575 sentences, containing 211 independent pronouns and 537 hidden pronouns, from Amharic news portals, the Quran and the Bible. The model was evaluated in different scenarios: first by dataset type, then by the location of the anaphor and antecedent (inter-sentential or intra-sentential), and finally for hidden and independent anaphora detection.
In the first case, the model achieved an accuracy of 59.38% for SVM, 57.47% for NB and 57.91% for RF on the compiled dataset. In the second case, for inter-sentential anaphora the model scored 43.87% for SVM, 43.20% for NB and 44.52% for RF; for intra-sentential anaphora it scored 62.50% for SVM, 59.44% for NB and 57.39% for RF. For anaphora detection, SVM achieved an accuracy of 66.82% for independent anaphora and 66.11% for hidden anaphora.
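The candidate-identification and feature-selection steps named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the `Mention` fields, the agreement check used as a constraint rule, the distance features used as preference rules, and the two-sentence search window are all assumptions made for illustration, and the example mentions are English glosses rather than real corpus data.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Mention:
    """A pronoun or noun phrase (hypothetical schema, not from the thesis)."""
    text: str             # surface form, given here as an English gloss
    sent_idx: int         # index of the sentence containing the mention
    position: int         # token offset within the document
    number: str           # "sg" or "pl"
    gender: str           # "m", "f", or "n" when unknown
    hidden: bool = False  # True if the pronoun was recovered from verb morphology


def agrees(anaphor: Mention, candidate: Mention) -> bool:
    """Assumed constraint rule: number must match; gender must match unless unknown."""
    if anaphor.number != candidate.number:
        return False
    if "n" in (anaphor.gender, candidate.gender):
        return True
    return anaphor.gender == candidate.gender


def pair_features(anaphor: Mention, candidate: Mention) -> Dict[str, object]:
    """Assumed preference features that a classifier (SVM, NB, RF) could consume."""
    return {
        "sentence_distance": anaphor.sent_idx - candidate.sent_idx,
        "intra_sentential": anaphor.sent_idx == candidate.sent_idx,
        "token_distance": anaphor.position - candidate.position,
        "hidden_anaphor": anaphor.hidden,
    }


def build_pairs(
    anaphor: Mention, candidates: List[Mention], max_sent_back: int = 2
) -> List[Tuple[Mention, Dict[str, object]]]:
    """Pair the anaphor with each preceding, nearby, agreeing noun phrase."""
    pairs = []
    for cand in candidates:
        if cand.position >= anaphor.position:
            continue  # an antecedent must precede the anaphor
        if anaphor.sent_idx - cand.sent_idx > max_sent_back:
            continue  # outside the assumed search window
        if not agrees(anaphor, cand):
            continue  # filtered out by the constraint rule
        pairs.append((cand, pair_features(anaphor, cand)))
    return pairs


# Illustrative input: two sentences with three noun phrases and one hidden pronoun.
candidates = [
    Mention("Kebede", sent_idx=0, position=0, number="sg", gender="m"),
    Mention("Almaz", sent_idx=0, position=3, number="sg", gender="f"),
    Mention("students", sent_idx=1, position=7, number="pl", gender="n"),
]
anaphor = Mention("he", sent_idx=1, position=10, number="sg", gender="m", hidden=True)

pairs = build_pairs(anaphor, candidates)
for cand, feats in pairs:
    print(cand.text, feats)
```

Each surviving pair would become one training or test instance, labeled as coreferent or not from the annotated corpus. The `intra_sentential` flag mirrors the inter-sentential and intra-sentential split used in the evaluation.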