HyScaleFlow: An ML-Driven DAG-Based Orchestration Framework for Real-Time Stream Processing in Hybrid Cloud Environments
Abstract
The increasing complexity of real-time data processing across hybrid cloud and edge environments has exposed significant limitations in existing distributed stream processing systems. While frameworks such as Apache Spark and Apache Flink offer strong scalability and performance, they lack the orchestration intelligence required to adapt to dynamic workloads, anticipate failures, and optimize resource usage in heterogeneous environments. Traditional rule-based or reactive orchestration approaches fail to deliver the responsiveness and fault resilience needed for mission-critical applications in domains such as IoT analytics, smart infrastructure, and cyber-physical systems. To address these challenges, this paper presents HyScaleFlow, a scalable and modular framework that integrates real-time stream processing with machine learning–driven orchestration. The architecture combines Apache Spark (at the edge) and Apache Flink (in the cloud) with a hybrid DAG-based orchestration strategy using Apache Airflow and Dagster. A key innovation is the FlowGuard module, which uses two XGBoost models, a classifier to predict node failures and a regressor to forecast resource load, both trained on Prometheus-exported telemetry metrics. These predictions dynamically inform DAG execution, enabling preemptive scaling, container migration, and workload-aware task routing. Evaluations were conducted using the NYC Taxi Trip dataset (over 1.1 billion records) on this hybrid edge-cloud testbed, orchestrated via Docker/Kubernetes. Results show that HyScaleFlow improves DAG completion rates by 16.8%, reduces task retry rates by over 60%, and shortens fault recovery times by up to 40%. Additionally, the framework achieves a 19.5% reduction in cloud execution cost and a 35.9% gain in resource efficiency. HyScaleFlow demonstrates strong utility for real-time, data-intensive applications by unifying predictive intelligence with stream processing.
It provides a replicable, cost-effective, and resilient solution for hybrid cloud data engineering, advancing the state of intelligent orchestration.
DOI: https://doi.org/10.31449/inf.v49i9.9498
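The abstract says that failure and load predictions inform DAG execution through preemptive scaling, container migration, and workload-aware task routing. The following is a minimal sketch of how such a decision step might look, not the FlowGuard implementation itself. The names `NodePrediction`, `decide`, and `route_task`, the two thresholds, and the rule ordering are all hypothetical and are not taken from the paper. The predictions are plain numbers here, standing in for the outputs of the XGBoost classifier and regressor.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    RUN = "run"              # node looks fine, schedule normally
    SCALE_OUT = "scale_out"  # node is healthy but forecast to be overloaded
    MIGRATE = "migrate"      # node is likely to fail, move its containers


@dataclass(frozen=True)
class NodePrediction:
    node: str
    failure_prob: float   # stand-in for the classifier output, in [0, 1]
    forecast_load: float  # stand-in for the regressor output, fraction of capacity


# Hypothetical thresholds chosen for illustration only.
FAILURE_THRESHOLD = 0.7
LOAD_THRESHOLD = 0.85


def decide(pred: NodePrediction) -> Action:
    """Map one node's predictions to an orchestration action."""
    # A likely failure takes priority over a load problem.
    if pred.failure_prob >= FAILURE_THRESHOLD:
        return Action.MIGRATE
    if pred.forecast_load >= LOAD_THRESHOLD:
        return Action.SCALE_OUT
    return Action.RUN


def route_task(preds: list[NodePrediction]) -> str:
    """Pick the least-loaded node among those not flagged for migration."""
    candidates = [p for p in preds if decide(p) is not Action.MIGRATE]
    if not candidates:
        raise RuntimeError("no node is currently safe to schedule on")
    return min(candidates, key=lambda p: p.forecast_load).node
```

In a DAG-based orchestrator, a function like `decide` could sit in a branching step that selects the downstream task, while `route_task` would choose the target for the next unit of work. How HyScaleFlow actually wires predictions into Airflow and Dagster is described in the full paper, not in this sketch.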
License
I assign to Informatica, An International Journal of Computing and Informatics ("Journal") the copyright in the manuscript identified above and any additional material (figures, tables, illustrations, software or other information intended for publication) submitted as part of or as a supplement to the manuscript ("Paper") in all forms and media throughout the world, in all languages, for the full term of copyright, effective when and if the article is accepted for publication. This transfer includes the right to reproduce and/or to distribute the Paper to other journals or digital libraries in electronic and online forms and systems.
I understand that I retain the rights to use the pre-prints, off-prints, accepted manuscript and published journal Paper for personal use, scholarly purposes and internal institutional use.
In certain cases, I can ask for retaining the publishing rights of the Paper. The Journal can permit or deny the request for publishing rights, to which I fully agree.
I declare that the submitted Paper is original, has been written by the stated authors and has not been published elsewhere nor is currently being considered for publication by any other journal and will not be submitted for such review while under review by this Journal. The Paper contains no material that violates proprietary rights of any other person or entity. I have obtained written permission from copyright owners for any excerpts from copyrighted works that are included and have credited the sources in my article. I have informed the co-author(s) of the terms of this publishing agreement.
Copyright © Slovenian Society Informatika