A CRISP-DM and Predictive Analytics Framework for Enhanced Decision-Making in Research Information Management Systems

Otmane Azeroual, Radka Nacheva, Anastasija Nikiforova, Uta Störl

Abstract


The age of digitization has led to a significant increase in the amount and variety of data, particularly within the research domain, where data previously stored in paper form has now been digitized and integrated into research management processes. The rapid growth of Big Data, driven by technologies like the Internet of Things, presents challenges for conventional data processing methods. However, data alone, stored in silos, lacks value. To unlock its potential, data must be analysed and processed to generate insights and predictions that enable evidence-based decision-making. Predictive Analytics (PA) is a powerful tool for this purpose. By leveraging PA and advanced statistical methods, predictive models for research management can be developed that help forecast research trends and outcomes, which in turn provides decision-makers with a reliable, forward-looking basis for strategic decisions. This paper explores the application of PA in Current Research Information Systems (CRIS) to enhance decision-making. A case study using metadata from 20,000 publications indexed in Scopus demonstrates how PA can identify emerging research topics and predict future trends. Machine learning algorithms such as Support Vector Machine (SVM), k-Nearest Neighbor (kNN), Random Forest, and Tree classifiers were employed and evaluated with the metrics Area Under the ROC Curve (AUC), classification accuracy (CA), F1-score, precision, and recall. The results indicate that the kNN algorithm provided the highest performance, with an AUC of 0.451 and a classification accuracy of 87.4%. These results show that predictive models can reveal significant patterns in research data, supporting data-driven decision-making for research management.
Additionally, the study applied Latent Semantic Indexing (LSI) and clustering techniques to identify and categorize key topics within the data, showing a thematic focus on areas such as smart cities and urban intelligence before predictions, and on CRIS applications after predictions. The findings illustrate how PA can optimize research management by identifying gaps in research and forecasting emerging topics, thereby aiding institutions in making more informed, evidence-based decisions.
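
As an illustration of the evaluation metrics named in the abstract (CA, precision, recall, F1-score, and AUC), the following minimal sketch computes them for a binary classifier in plain Python. It is not the authors' pipeline: the function names `classification_metrics` and `auc`, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for this example.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return CA (accuracy), precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "CA": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "F1": f1,
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true labels and classifier scores for eight records.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = classification_metrics(y_true, y_pred)
metrics["AUC"] = auc(y_true, scores)
print(metrics)
```

Because AUC is computed from the ranking of scores rather than from thresholded predictions, it can diverge from classification accuracy; an AUC of 0.5 corresponds to a random ranking of positives against negatives.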

DOI: https://doi.org/10.31449/inf.v49i18.5613

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.