ENHANCING EFFECTIVENESS OF AFAAN OROMO INFORMATION RETRIEVAL USING LATENT SEMANTIC INDEXING AND DOCUMENT CLUSTERING BASED SEARCHING


dc.contributor.author Bogale, Belete
dc.date.accessioned 2021-06-11T03:55:36Z
dc.date.available 2021-06-11T03:55:36Z
dc.date.issued 2020-09
dc.identifier.uri http://localhost:8080/xmlui/handle/123456789/3750
dc.description 108p. en_US
dc.description.abstract This research proposes Latent Semantic Indexing (LSI) and document clustering based searching for Afaan Oromo documents. It applies LSI and K-means clustering to handle the semantic structure of words in documents. The system consists of three components: indexing, clustering, and searching. The LSI model is a concept-based retrieval method that builds on the vector space model (VSM) and singular value decomposition (SVD). Document clustering, which measures the similarity between documents and groups similar documents together, was investigated as a way to improve the performance of the information retrieval system. K-means clustering was used to cluster the documents using the SVD matrix. The retrieval process is then further refined by measuring the similarity between the query vector and the cluster centroid vectors. IR pre-processing steps (tokenization, normalization, stop-word removal, and stemming) were used to select index and query terms. Finally, the SVD model with K-means clustering is compared against the VSM and the plain SVD model. The system was evaluated using a selected set of documents and queries. The experimental results showed that the proposed prototype achieved, on average, 70% recall, 80% precision, and a 72% F-measure. This indicates that the proposed method (SVD with K-means) achieved a significant improvement over the VSM and the SVD model. Nevertheless, the performance of the system is greatly affected by the statistical extraction of synonyms and polysemy, mis-clustering, the availability of a standard corpus, and stemming, all of which need further research. en_US
dc.description.sponsorship Haramaya University en_US
dc.language.iso en en_US
dc.publisher Haramaya University en_US
dc.title ENHANCING EFFECTIVENESS OF AFAAN OROMO INFORMATION RETRIEVAL USING LATENT SEMANTIC INDEXING AND DOCUMENT CLUSTERING BASED SEARCHING en_US
dc.type Thesis en_US
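
The searching step described in the abstract (comparing the query vector with cluster centroid vectors, then ranking documents) might be sketched roughly as follows. This is a minimal illustration and not the thesis's implementation: it assumes the document vectors have already been reduced by SVD and labelled by K-means, and the names `cosine`, `centroid`, `cluster_search`, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cluster_search(query_vec, doc_vecs, labels, top_n=3):
    """Pick the cluster whose centroid is closest to the query,
    then rank only that cluster's documents by cosine similarity."""
    # Group document ids by their (assumed, precomputed) cluster label.
    clusters = {}
    for doc_id, label in labels.items():
        clusters.setdefault(label, []).append(doc_id)

    # One centroid per cluster.
    centroids = {
        label: centroid([doc_vecs[d] for d in ids])
        for label, ids in clusters.items()
    }

    # Route the query to the most similar centroid.
    best = max(centroids, key=lambda c: cosine(query_vec, centroids[c]))

    # Rank the documents inside the chosen cluster.
    ranked = sorted(
        clusters[best],
        key=lambda d: cosine(query_vec, doc_vecs[d]),
        reverse=True,
    )
    return best, ranked[:top_n]


# Toy 2-dimensional "concept space" vectors (assumed, for illustration only).
doc_vecs = {
    "d1": [0.9, 0.1],
    "d2": [0.8, 0.3],
    "d3": [0.1, 0.9],
    "d4": [0.2, 0.7],
}
labels = {"d1": 0, "d2": 0, "d3": 1, "d4": 1}  # hypothetical K-means output
query_vec = [1.0, 0.2]                         # query folded into the same space

best_cluster, results = cluster_search(query_vec, doc_vecs, labels)
print(best_cluster, results)
```

Restricting the ranking to a single cluster keeps the comparison set small, but it also means a mis-clustered relevant document is never scored, which is consistent with the abstract's remark that mis-clustering affects performance.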

