Automatic Selection of Bitmap Join Indexes in Data Warehouses Using CFPGrowth++ Algorithm
Abstract
In complex data warehousing, analysis and decision-making over data warehouses modeled as a relational star schema are typically carried out through OLAP (On-Line Analytical Processing) queries. These queries are generally complex, combining selections, joins, groupings and aggregations over voluminous tables. This requires considerable computing time and therefore leads to very high response times. Reducing the cost of running OLAP decision queries on large tables is essential to let decision-makers interact with the system within a reasonable time frame. The objective of this study is to improve system performance by minimizing the response time of OLAP decision queries. The proposed approach mines frequent patterns in the query workload to automatically select the bitmap join indexes used to reduce the execution cost of these queries. To automatically generate the bitmap join index configuration that minimizes response time, we implemented the CFPGrowth++ frequent pattern mining algorithm and applied it to a query workload on a test data warehouse built with the Analytical Processing Benchmark 1 (APB-1) in order to validate our approach. The experimental results indicate that the index configuration produced by the proposed approach yields a significant performance improvement of approximately 75%. For a large portion of the workload, execution time improves significantly after applying our approach, and the overall execution time of the workload decreased from 20,032.57 seconds before applying the approach to 5,388.49 seconds after. These experiments show that the index configuration generated by the proposed approach provides a substantial performance gain.
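The abstract describes mining frequent attribute patterns in a query workload to pick bitmap join indexes. The sketch below illustrates that pipeline under stated assumptions; it is not the authors' implementation. Each query is reduced to the set of dimension attributes in its selection predicates. Frequent itemsets are found under the multiple-minimum-support (MIS) model that CFPGrowth++ targets: an itemset is frequent when its support reaches the smallest MIS of its items. Brute-force enumeration stands in for CFPGrowth++ itself. The workload, MIS values, table names (loosely modeled on APB-1 dimensions) and join keys (`prod_key`, `time_key`, ...) are all hypothetical. Keeping only maximal frequent itemsets and rendering them as Oracle-style bitmap join index DDL are illustrative choices.

```python
from itertools import combinations

# Hypothetical workload: each query reduced to the dimension attributes used
# in its selection predicates. Names are illustrative; queries are invented.
WORKLOAD = [
    {"ProdLevel.Class_level", "TimeLevel.Year_level"},
    {"ProdLevel.Class_level", "TimeLevel.Year_level", "ChanLevel.All_level"},
    {"ProdLevel.Class_level", "CustLevel.Retailer_level"},
    {"TimeLevel.Year_level", "ChanLevel.All_level"},
    {"ProdLevel.Class_level", "TimeLevel.Year_level"},
]

# Hypothetical minimum item supports (MIS), expressed as query counts.
MIS = {
    "ProdLevel.Class_level": 3,
    "TimeLevel.Year_level": 3,
    "ChanLevel.All_level": 2,
    "CustLevel.Retailer_level": 2,
}

FACT_TABLE = "ActVars"
# Hypothetical join key per dimension (same column name on both sides).
JOIN_KEYS = {
    "ProdLevel": "prod_key",
    "TimeLevel": "time_key",
    "ChanLevel": "chan_key",
    "CustLevel": "cust_key",
}


def support_count(itemset, workload):
    """Number of queries that reference every attribute in itemset."""
    return sum(1 for query in workload if itemset <= query)


def mine_mis(workload, mis):
    """Brute-force mining under the multiple-minimum-support model: an
    itemset is frequent if its support count reaches the lowest MIS among
    its items. Exponential in the number of attributes; a stand-in for
    CFPGrowth++, which is designed to find such itemsets far faster."""
    items = sorted(set().union(*workload))
    frequent = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            count = support_count(set(combo), workload)
            if count >= min(mis[item] for item in combo):
                frequent[frozenset(combo)] = count
    return frequent


def maximal_itemsets(frequent):
    """Keep frequent itemsets not strictly contained in another frequent
    itemset, returned in a deterministic (sorted) order."""
    maximal = [s for s in frequent if not any(s < t for t in frequent)]
    return sorted(maximal, key=sorted)


def bitmap_join_index_ddl(itemset, name):
    """Render an Oracle-style CREATE BITMAP INDEX ... FROM ... WHERE ..."""
    attrs = sorted(itemset)
    dims = sorted({attr.split(".")[0] for attr in attrs})
    joins = " AND ".join(
        f"{FACT_TABLE}.{JOIN_KEYS[d]} = {d}.{JOIN_KEYS[d]}" for d in dims
    )
    return (
        f"CREATE BITMAP INDEX {name} ON {FACT_TABLE}({', '.join(attrs)}) "
        f"FROM {', '.join([FACT_TABLE] + dims)} WHERE {joins};"
    )


if __name__ == "__main__":
    frequent = mine_mis(WORKLOAD, MIS)
    for i, itemset in enumerate(maximal_itemsets(frequent), start=1):
        print(bitmap_join_index_ddl(itemset, f"bji_{i}"))
```

With this toy workload, two candidate multi-attribute indexes survive: one on `(ChanLevel.All_level, TimeLevel.Year_level)` and one on `(ProdLevel.Class_level, TimeLevel.Year_level)`. The retailer attribute is dropped because it appears in only one query. A complete selection method would typically also weigh index storage size and estimated query cost before materializing candidates. In Oracle, the dimension join columns must also be primary keys or carry unique constraints. Both steps are omitted here.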
DOI: https://doi.org/10.31449/inf.v49i27.7807
License
I assign to Informatica, An International Journal of Computing and Informatics ("Journal") the copyright in the manuscript identified above and any additional material (figures, tables, illustrations, software or other information intended for publication) submitted as part of or as a supplement to the manuscript ("Paper") in all forms and media throughout the world, in all languages, for the full term of copyright, effective when and if the article is accepted for publication. This transfer includes the right to reproduce and/or to distribute the Paper to other journals or digital libraries in electronic and online forms and systems.
I understand that I retain the rights to use the pre-prints, off-prints, accepted manuscript and published journal Paper for personal use, scholarly purposes and internal institutional use.
In certain cases, I can ask for retaining the publishing rights of the Paper. The Journal can permit or deny the request for publishing rights, to which I fully agree.
I declare that the submitted Paper is original, has been written by the stated authors and has not been published elsewhere nor is currently being considered for publication by any other journal and will not be submitted for such review while under review by this Journal. The Paper contains no material that violates proprietary rights of any other person or entity. I have obtained written permission from copyright owners for any excerpts from copyrighted works that are included and have credited the sources in my article. I have informed the co-author(s) of the terms of this publishing agreement.
Copyright © Slovenian Society Informatika







