Smart Task Scheduling for Cloud-Based Big Data Systems
Abstract
This paper presents a hybrid task scheduling framework for cloud-based big data systems with three objectives: improving system performance, reducing cost, and increasing energy efficiency. The system combines a rule-based decision engine with a Long Short-Term Memory (LSTM)-based resource prediction model, enabling real-time job assignment based on task urgency, data locality, and system state. The framework is built on top of Apache YARN and is therefore compatible with batch jobs (via Hadoop/Spark) as well as streaming tasks (via Kafka/Flink). We ran the experiments on a 50-node cluster (n2-standard-16 instances, 16 vCPUs, 64 GB RAM), using real workloads of 100 GB–1 TB batch jobs and 1K–5K events/sec streams. The evaluation metrics include job completion time, throughput, cost per TB processed, and energy consumption (Joules/TB). The results indicate a 32–50% improvement in performance, up to 54% savings in cost when using spot instances, and a 25% reduction in energy consumption compared to baseline schedulers such as YARN, Kubernetes, and Spark.
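The abstract does not give the scheduling rules themselves, so the following is only a minimal sketch of how a rule-based engine might combine task urgency, data locality, and predicted node load. Every name here (`Task`, `Node`, `predict_load`, `score`, `schedule`), the weights, and the 0-to-1 urgency scale are hypothetical, and `predict_load` is a plain moving average standing in for the LSTM predictor, not the paper's model.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Sequence


@dataclass
class Task:
    name: str
    urgency: float               # hypothetical scale: 0.0 (can wait) .. 1.0 (critical)
    input_nodes: FrozenSet[str]  # nodes that hold this task's input data
    cpu_demand: float            # fraction of one node's CPU the task needs


@dataclass
class Node:
    name: str
    load_history: List[float]    # recent CPU utilisation samples in [0, 1]
    reserved: float = 0.0        # capacity already promised in this scheduling round


def predict_load(history: Sequence[float], window: int = 3) -> float:
    """Stand-in for the LSTM predictor (assumption): mean of the last samples."""
    recent = list(history)[-window:]
    return sum(recent) / len(recent) if recent else 0.0


def score(task: Task, node: Node, w_local: float = 0.5,
          w_headroom: float = 0.5) -> Optional[float]:
    """Return a placement score, or None if the node is predicted to be too busy."""
    headroom = 1.0 - (predict_load(node.load_history) + node.reserved)
    if headroom < task.cpu_demand:
        return None  # rule: skip nodes without enough predicted headroom
    locality = 1.0 if node.name in task.input_nodes else 0.0
    return w_local * locality + w_headroom * headroom


def schedule(tasks: Sequence[Task], nodes: Sequence[Node]) -> Dict[str, Optional[str]]:
    """Place tasks in descending urgency; map task name to node name (or None)."""
    assignment: Dict[str, Optional[str]] = {}
    for task in sorted(tasks, key=lambda t: t.urgency, reverse=True):
        best, best_score = None, None
        for node in nodes:
            s = score(task, node)
            if s is not None and (best_score is None or s > best_score):
                best, best_score = node, s
        if best is not None:
            best.reserved += task.cpu_demand  # reserve capacity for later tasks
        assignment[task.name] = best.name if best is not None else None
    return assignment
```

In this sketch urgency only decides the order in which tasks are considered, while locality and predicted headroom decide where each task lands; a real deployment on YARN would express such rules through the scheduler's own interfaces rather than a standalone loop.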
DOI: https://doi.org/10.31449/inf.v49i28.10530
Copyright © Slovenian Society Informatika







