Smart Task Scheduling for Cloud-Based Big Data Systems
Abstract
This paper presents a hybrid task scheduling framework for cloud-based big data systems with three objectives: improving performance, reducing cost, and increasing energy efficiency. The framework combines a rule-based decision engine with a Long Short-Term Memory (LSTM)-based resource prediction model, enabling real-time job assignment based on task urgency, data locality, and system state. Because it is built on top of Apache YARN, it supports both batch jobs (via Hadoop/Spark) and streaming tasks (via Kafka/Flink). We conducted the experiments on a 50-node cluster (n2-standard-16 instances, each with 16 vCPUs and 64 GB RAM), using real workloads of 100 GB–1 TB batch jobs and 1K–5K events/sec streams. Evaluation metrics include job completion time, throughput, cost per TB processed, and energy consumption (Joules/TB). The results indicate a 32–50% improvement in performance, up to 54% cost savings when using spot instances, and a 25% reduction in energy consumption compared with baseline schedulers such as YARN, Kubernetes, and Spark.
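The abstract does not specify the decision rules, the LSTM architecture, or any scoring weights, so the following is only a minimal sketch of how a rule-based engine might combine a load prediction with urgency, data-locality, and cost signals when placing a single task. An exponentially weighted moving average stands in for the LSTM predictor. The names `Task`, `Node`, `predict_utilization`, and `choose_node`, the urgency scale, the `overload=0.85` threshold, and the score weights are all hypothetical assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Task:
    task_id: str
    urgency: int                 # hypothetical scale: 0 = best effort, 2 = critical
    vcores: int                  # vCPUs the task requests
    input_nodes: Set[str] = field(default_factory=set)  # nodes holding its input blocks


@dataclass
class Node:
    name: str
    free_vcores: int
    cpu_history: List[float]     # recent CPU utilization samples in [0, 1]
    is_spot: bool = False


def predict_utilization(history: List[float], alpha: float = 0.5) -> float:
    """Stand-in for the LSTM predictor (assumption): exponentially weighted moving average."""
    if not history:
        return 0.0
    estimate = history[0]
    for sample in history[1:]:
        estimate = alpha * sample + (1 - alpha) * estimate
    return estimate


def choose_node(task: Task, nodes: List[Node], overload: float = 0.85) -> Optional[str]:
    """Apply hard rules as filters, then rank the remaining nodes by a soft score."""
    best_name: Optional[str] = None
    best_score = float("-inf")
    for node in nodes:
        # Rule 1: the node must have enough free vCPUs right now.
        if node.free_vcores < task.vcores:
            continue
        predicted = predict_utilization(node.cpu_history)
        # Rule 2: skip nodes whose predicted load exceeds the overload threshold.
        if predicted > overload:
            continue
        # Rule 3: critical tasks avoid spot instances, which can be reclaimed.
        if task.urgency >= 2 and node.is_spot:
            continue
        score = 1.0 - predicted          # prefer predicted headroom
        if node.name in task.input_nodes:
            score += 1.0                 # prefer data-local placement
        if node.is_spot:
            score += 0.25                # prefer cheaper capacity for non-critical work
        if score > best_score:
            best_name, best_score = node.name, score
    return best_name


if __name__ == "__main__":
    cluster = [
        Node("n1", 8, [0.9, 0.9, 0.9]),
        Node("n2", 8, [0.2, 0.4]),
        Node("n3", 8, [0.1, 0.1], is_spot=True),
    ]
    batch = Task("etl-1", urgency=1, vcores=4, input_nodes={"n1", "n2"})
    alert = Task("stream-7", urgency=2, vcores=4, input_nodes={"n3"})
    print(choose_node(batch, cluster))  # n2: data-local with headroom; n1 is predicted overloaded
    print(choose_node(alert, cluster))  # n2: n3 is a spot node, so the critical task skips it
```

Treating hard constraints as filters and soft preferences as additive score terms keeps each rule easy to audit; a real deployment would presumably swap `predict_utilization` for a trained sequence model and tune the weights against the cost and energy metrics listed above.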
DOI: https://doi.org/10.31449/inf.v49i28.10530
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.