Abstract:
The development of a language is closely linked to the development of technology. Information
Retrieval (IR) is therefore no longer an optional technology; it is important to everybody and
has become a necessity. In this information age, information is needed more than anything
else, but finding relevant information requires system support (Toffler, 1980).
Users who search for Hadiyyisa documents with Hadiyyisa queries may not find a system that
retrieves relevant documents satisfying their information need. It is therefore essential to
have a retrieval system that works for the Hadiyyisa language. Such a system helps Hadiyyisa
users meet their information needs without much difficulty, and it supports the development
of Hadiyyisa alongside current information technology.
The aim of this study is to design and develop a prototype Hadiyyisa text retrieval system
that organizes documents through indexing and searches for documents relevant to a user's
query based on the Vector Space Model (VSM). Other types of documents, such as pictures, video
and audio, are not included in the research. The system was developed in the Python 2.7.2
programming language in a Windows environment. In both indexing and searching, the same text
pre-processing techniques (tokenization, normalization, stop word removal and stemming) are
applied. A similarity measure is then used to retrieve and rank relevant documents. The
documents, all written in Latin script, were collected from the Wachamo University Hadiyyisa
Department and from Hossana Teachers Education College, and they cover different subjects. For
the experiment, 250 text documents were used and 10 queries were prepared to evaluate the
performance of the system. The queries were prepared subjectively by manually reviewing the
contents of each selected document.
Relevance judgments were made to evaluate the effectiveness of the system by calculating
precision and recall. Overall, the prototype system achieved an average precision of 0.629 and
an average recall of 0.777. The performance is limited by the morphological inflection of
Hadiyyisa words, by synonymy and polysemy, and by some terms in the documents that are
misspelled.
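
The retrieval pipeline summarized in this abstract (pre-processing, VSM indexing, similarity
ranking, and precision/recall evaluation) can be illustrated with a minimal sketch. It is not
the prototype's code. It is written for Python 3 rather than Python 2.7.2, and it assumes
TF-IDF term weights with cosine similarity, which is a common VSM choice but is not stated in
the abstract. The stop word list, the suffix list, the tokenization pattern, and all function
names are hypothetical placeholders, not the Hadiyyisa resources used in the study.

```python
import math
import re
from collections import Counter

# Hypothetical placeholders for illustration only; the study's Hadiyyisa
# stop word list and stemming rules are not given in the abstract.
STOP_WORDS = {"the", "a", "of"}
SUFFIXES = ("s",)


def preprocess(text):
    """Tokenize, normalize (lowercase), remove stop words, strip suffixes."""
    # Assumption: tokens are runs of Latin letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    terms = []
    for tok in tokens:
        if tok in STOP_WORDS:
            continue
        for suf in SUFFIXES:
            # Strip at most one suffix, keeping a stem of at least 3 characters.
            if tok.endswith(suf) and len(tok) > len(suf) + 2:
                tok = tok[:-len(suf)]
                break
        terms.append(tok)
    return terms


def build_index(docs):
    """Return (list of TF-IDF vectors, idf table) for a list of texts."""
    tfs = [Counter(preprocess(d)) for d in docs]
    n = len(docs)
    df = Counter(t for tf in tfs for t in tf)  # document frequency per term
    # Assumption: smoothed idf, so terms found in every document keep weight > 0.
    idf = {t: math.log(1.0 + n / df[t]) for t in df}
    vectors = [{t: f * idf[t] for t, f in tf.items()} for tf in tfs]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(query, vectors, idf):
    """Return ids of documents with non-zero similarity, best match first."""
    q_tf = Counter(preprocess(query))
    q_vec = {t: f * idf[t] for t, f in q_tf.items() if t in idf}
    scored = [(cosine(q_vec, v), i) for i, v in enumerate(vectors)]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in ranked if score > 0.0]


def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved list against relevance judgments."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

In this sketch, a collection would be indexed once with build_index, each query would be
passed to search, and the returned document ids would be compared with the manually judged
relevant ids through precision_recall. Averaging those values over all queries gives figures
of the same kind as the average precision and recall reported above.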