Abstract:
Semantic role labeling (SRL) is a task in Natural Language Processing that identifies the
semantic arguments of predicates in a given sentence and labels them with their semantic roles.
Considerable research has been done on semantic role labeling, especially for well-resourced
languages such as English and Chinese. These studies have used different algorithms and different
corpora as datasets for evaluating those algorithms. A well-developed semantic role labeler improves
the performance and effectiveness of natural language processing applications such as
machine translation, information extraction, and question answering. Even though Afaan
Oromoo is spoken by a large population, no research work has been
done on semantic role labeling for the language. For a given predicate, there are different
participants in the sentence that make the predicate more meaningful. There is also a relationship
between the predicate of a sentence and its arguments. This relationship should
persist even when the order of the arguments and the predicate changes. The sequence
of a predicate and its arguments is easily understood by human beings, but it is difficult for
machines to capture the similarities and differences in verb meaning that are reflected in the
arguments. The goal of this study is to perform semantic role labeling using a supervised machine
learning system that learns from role-annotated (labeled) training data. The proposed
system comprises several tasks: data preprocessing, morphological analysis, semantic role
annotation, feature extraction, training, and classification. To verify that the developed model
classifies the arguments of predicates adequately, standard evaluation metrics, namely
accuracy, precision, recall, and F-measure, were used in the study. The experiments
were conducted using the Support Vector Machine (SVM), Decision Tree (DT), and Naïve Bayes (NB) classification
algorithms. The Afaan Oromoo propositional bank (AOPropBank), which records each
argument of a sentence together with its role, was developed from 1400 Afaan Oromoo sentences
collected from newspapers, social media, and blogs. The experimental results
show that the models built with the NB, SVM, and DT classification
algorithms achieved accuracies of 75%, 76.63%, and 70.25%, respectively. To
improve on these results, further research on supporting tasks such as named entity recognition and word sense
disambiguation needs to be conducted.
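
The evaluation metrics named above can be illustrated with a minimal sketch. It assumes that each argument receives exactly one gold role label and one predicted role label; the function name evaluate and the role labels ARG0, ARG1, and ARGM-LOC are hypothetical, chosen for illustration only, and are not taken from AOPropBank or from the study's implementation.

```python
from collections import Counter


def evaluate(gold, predicted):
    """Return overall accuracy and per-label precision, recall, and F-measure.

    `gold` and `predicted` are equal-length lists of role labels,
    one entry per argument (an assumption made for this sketch).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1      # correct label
        else:
            fp[p] += 1      # predicted label was wrong
            fn[g] += 1      # gold label was missed

    accuracy = sum(tp.values()) / len(gold) if gold else 0.0

    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        prec_den = tp[label] + fp[label]
        rec_den = tp[label] + fn[label]
        precision = tp[label] / prec_den if prec_den else 0.0
        recall = tp[label] / rec_den if rec_den else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        scores[label] = {
            "precision": precision,
            "recall": recall,
            "f_measure": f_measure,
        }
    return accuracy, scores


# Hypothetical labels for five arguments (not real AOPropBank data).
gold = ["ARG0", "ARG1", "ARGM-LOC", "ARG0", "ARG1"]
predicted = ["ARG0", "ARG1", "ARG1", "ARG0", "ARGM-LOC"]

accuracy, scores = evaluate(gold, predicted)
print("accuracy:", accuracy)
for label, s in scores.items():
    print(label, s)
```

Reporting precision, recall, and F-measure per role alongside overall accuracy shows whether a classifier does well only on frequent roles, which a single accuracy figure can hide.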