The Impact of Online Indexing in Improving Arabic Information Retrieval Systems

Tahar Dilekh, Saber Benharzallah, Ali Behloul

Abstract


This paper proposes a new method of indexing Arabic-language text that contributes to improving the quality of information retrieval systems (IRS). The proposed method belongs to the semi-automatic category of indexing and consists of two types. The first type is online indexing, in which a single document is the indexing unit. Here the indexing process begins directly after the writing of each unit ends, which makes it possible to assist the human expert (the author of the text) in selecting appropriate Arabic descriptors to improve the search results. The output of this process is a Partial index. The second type is offline indexing, in which indexing is based on the collection of textual documents available from different corpora. The output of this process is a General index. We illustrate the application and performance of this new method using an Arabic text editor designed to support online semi-automatic indexing and an information retrieval tool that contains an offline automatic indexing system. We also illustrate the process of building a new form of Arabic corpus suitable for conducting the necessary experiments. Our findings show that the online indexing model successfully identifies the descriptors most relevant to the document, primarily because of the human expert's intervention in the descriptor identification process. In addition, this model is more efficient, as it helps to minimize index storage size and consequently improves the response time of requests. Finally, the paper proposes a solution to the issues and deficiencies that Arabic language processing suffers from, especially regarding corpus building and information retrieval evaluation systems. The latter enables researchers to test their indexing and retrieval algorithms.
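
The distinction between the two index types can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`candidate_descriptors`, `online_index`, `offline_index`), the whitespace tokenizer, and the frequency-based ranking are all assumptions made for illustration, and a real system would use proper Arabic morphological analysis. The `author_select` callback stands in for the human expert who confirms or rejects suggested descriptors.

```python
from collections import Counter, defaultdict

def candidate_descriptors(text, top_k=5):
    # Hypothetical ranking: most frequent whitespace-separated tokens.
    # A real Arabic indexer would apply stemming and stop-word removal.
    counts = Counter(text.split())
    return [term for term, _ in counts.most_common(top_k)]

def online_index(doc_id, text, author_select, top_k=5):
    # Partial index for ONE document, built once writing is finished.
    # The author filters the suggested descriptors (semi-automatic step).
    suggested = candidate_descriptors(text, top_k)
    chosen = author_select(suggested)
    return {term: {doc_id} for term in chosen}

def offline_index(corpus):
    # General index over a whole collection, with no human intervention:
    # every distinct token of every document is indexed.
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for term in set(text.split()):
            index[term].add(doc_id)
    return dict(index)

# Example document (five tokens, four of them distinct).
text = "فهرسة النصوص العربية فهرسة آلية"

# The simulated author keeps only two of the three suggested descriptors.
keep = {"فهرسة", "العربية"}
partial = online_index("d1", text, lambda terms: [t for t in terms if t in keep], top_k=3)

general = offline_index({"d1": text})
```

In this toy case the Partial index holds fewer entries than the General index for the same document, which is the kind of storage reduction the abstract attributes to author-selected descriptors.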


DOI: https://doi.org/10.31449/inf.v42i4.2297

This work is licensed under a Creative Commons Attribution 3.0 License.