A Novel Agent Based Load Balancing Model for Maximizing Resource Utilization in Grid Computing

A grid is a collection of geographically distributed computing resources. To manage these resources effectively, the resource manager must maximize their utilization, which can be achieved with an efficient load balancing algorithm. The objective of load balancing algorithms is to assign the load to resources so as to optimize resource use while reducing total job execution time. The proposed agent-based load balancing model aims to take advantage of agent characteristics to build an autonomous system. It also addresses drawbacks of similar systems, such as instability, limited scalability and poor adaptability. The performance of the proposed algorithms was tested in the Alea 2 simulator using different parameters such as response time, resource utilization and overall queue time. The performance evaluation suggests that the proposed algorithm can enhance the overall performance of grid computing.


Introduction
Due to the emergence of grid computing on the Internet, a hybrid load balancing algorithm, which takes into account various factors such as grid architecture, computer heterogeneity, communication delays, network bandwidth, resource availability, unpredictability and job characteristics, is now required.
For grids, scalability and adaptability are two major issues. With centralized resource scheduling, limitations in scalability and computational performance are inevitable. Moreover, grid environments are dynamic due to resource heterogeneity, resource variations and application diversity. Therefore, adaptive and robust scheduling techniques are preferred [1] [2].
Multi-agent systems offer promising features for resource managers. The reactivity, proactivity, scalability, cooperation, robustness, flexibility and autonomy that characterize agents can help with the complex task of managing resources in dynamic, changing environments. This paper presents a new Agent Based Load Balancing Algorithm, called ABLBA. A hierarchical architecture with coordination is designed to ensure scalability and efficiency, and a multi-agent approach is applied to improve adaptability. The proposed algorithm aims to minimize the average response time of jobs submitted to the Grid and to maximize throughput and resource utilization.

Related works
The authors of [3] proposed a multi-agent load balancing model based on analyzing the load of compute nodes and migrating virtual machines from overloaded to underloaded nodes. The proposed system involves multiple nodes that interact to execute MapReduce jobs. The multi-agent system consists of a group of agents: a node sensor agent, a simulation model sensor agent, an analysis agent, a migration agent and a distribution agent. The analysis and distribution agents are defined as reasoning agents.
In [4], a decentralized computing algorithm was proposed to assign and schedule jobs on a distributed grid. The proposed distributed resource allocation protocol (dRAP) exploits the properties of multi-agent systems and works as follows: an agent in the system is simply a node, and each agent maintains a vector containing the number of CPUs in its cluster and the residual time to complete the execution of its current process. At any time during the simulation, each agent is in exactly one of four states.
A main feature of this algorithm is that nodes ask their neighbors to form clusters, which reduces waiting time and communication costs. One optimization to consider would be to delay the disconnection of the cluster in state 4; this would introduce a form of learning or memory into the system, allowing the scheduler to remember the requirements of past processes. The drawback of this algorithm is its decentralized nature: there is neither centralized control nor precise synchronization among the nodes (agents).
The study in [5] presented the development of an agent-based model for managing network resources with defined operations, so that users can perform jobs efficiently and effectively, thereby significantly improving management in the gLite Grid middleware. The proposed solution provides a platform based on a collection of agents in a virtual organization. The key aspects of the proposed architecture are resource tracking, load balancing and agent hierarchy.
In [6], the authors proposed a new load balancing structure based on mobile agents and an ant colony optimization technique. In the proposed structure, a dispatcher agent distributes the received tasks to worker agents, making decisions that minimize the overall execution time (makespan). The framework consists of three layers: the user task producer layer, the scheduling and load balancing layer, and the worker layer. This study could be complemented by comparing its results with other methods and by minimizing task movements, which incur additional costs in the migration process.
The authors of [7] presented the design and implementation of a priority scheduling and fuzzy load balancing model in a computing grid. In this grid model, the user sends jobs to the grid agent, and the grid scheduler then uses a priority-based scheduling algorithm to schedule jobs from the grid agent to the available resources. Load balancing is performed using the proposed fuzzy logic technique, in which a set of fuzzy rules is produced from resource and job parameters. Because fuzzy control rules are expressed using linguistic variables, human knowledge and observations are easily integrated into the control mechanism.

Proposed agent based load balancing model
The grid was modelled as a set of clusters.
Each cluster was composed of nodes and belonged to a local LAN (Local Area Network) domain. Every cluster was connected to the global WAN (Wide Area Network) by a switch [8].
The proposed agent-based load balancing model was based on mapping the Grid architecture onto a tree structure. This tree was built by aggregation as follows. First, for each cluster, a two-level subtree was created. The leaves of this subtree correspond to the cluster nodes, and its root, called the cluster manager, represents a virtual node associated with the cluster. Second, the subtrees corresponding to all clusters were aggregated to generate a three-level tree whose root is a virtual node designated as the Grid manager. The resulting tree is denoted C/N, where C is the number of clusters that constitute the Grid and N the number of worker nodes [8].
This study aims to develop a hierarchical load balancing model based on a multi-agent system. Grid computing faces two key challenges: heterogeneity and scalability. The authors propose a three-layer architecture to address scalability: connecting or disconnecting resources (worker nodes or clusters) corresponds to simple tree operations (adding or removing leaves or subtrees). The proposed agent-based load balancing model aims to take advantage of agent characteristics to create an autonomous system. It also addresses drawbacks of similar systems, such as instability and limited scalability and adaptability, as well as other issues specific to grid computing.
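The C/N tree construction and the add/remove operations described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation; the class names (`WorkerNode`, `ClusterManager`, `GridManager`) and the `shape()` helper are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WorkerNode:
    """Leaf of the tree (Level 2): one worker node of a cluster."""
    name: str


@dataclass
class ClusterManager:
    """Root of a two-level subtree (Level 1): virtual node for one cluster."""
    name: str
    nodes: dict[str, WorkerNode] = field(default_factory=dict)

    def add_node(self, name: str) -> None:
        # Connecting a worker node = adding a leaf.
        self.nodes[name] = WorkerNode(name)

    def remove_node(self, name: str) -> None:
        # Disconnecting a worker node = removing a leaf.
        del self.nodes[name]


@dataclass
class GridManager:
    """Root of the three-level tree (Level 0)."""
    clusters: dict[str, ClusterManager] = field(default_factory=dict)

    def add_cluster(self, name: str, node_names: list[str]) -> None:
        # Connecting a cluster = attaching a whole two-level subtree.
        cluster = ClusterManager(name)
        for n in node_names:
            cluster.add_node(n)
        self.clusters[name] = cluster

    def remove_cluster(self, name: str) -> None:
        # Disconnecting a cluster = detaching its subtree.
        del self.clusters[name]

    def shape(self) -> tuple[int, int]:
        """Return (C, N): number of clusters and of worker nodes, as in C/N."""
        c = len(self.clusters)
        n = sum(len(cl.nodes) for cl in self.clusters.values())
        return c, n


grid = GridManager()
grid.add_cluster("c1", ["n1", "n2", "n3"])
grid.add_cluster("c2", ["n4", "n5"])
print(grid.shape())  # (2, 5)
grid.clusters["c2"].add_node("n6")
grid.remove_cluster("c1")
print(grid.shape())  # (1, 3)
```

Each structural change touches only one leaf or one subtree, which is why the model treats resource arrival and departure as cheap operations.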

Model characteristics
The proposed model is hierarchical; this characteristic facilitates the circulation of information through the tree and defines the flow of messages in the proposed strategy. Three types of load information movement can be identified:
• Ascending movement: load information moves upward to obtain the current load state, from Level 2 (Node Agents) to Level 1 (Cluster Agents), or from Level 1 (Cluster Agents) to Level 0 (Grid Agent). With this movement, the cluster manager obtains a global view of the cluster load, and the grid manager obtains a global view of the grid load.
• Horizontal movement: this movement carries the parameters needed to execute load balancing operations. It relates to intra-cluster task assignment at Level 2.
• Descending movement: this movement carries task assignment or job migration decisions: from Cluster Agents at Level 1 to the Migration Agents at the same level, from Migration Agents at Level 1 to Node Agents at Level 2, and from the Grid Agent at Level 0 to Cluster Agents at Level 1.
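As an illustration of the ascending movement, the sketch below aggregates per-node load reports into a per-cluster view (Level 2 to Level 1) and then into the Grid Agent's view (Level 1 to Level 0). It assumes, for simplicity, that each node reports a single scalar load (for example its queue length) and that aggregation is a plain average; the function names are hypothetical.

```python
from __future__ import annotations

# Hypothetical scalar load reports (e.g. CPU queue length) from Node Agents, per cluster.
node_loads: dict[str, dict[str, float]] = {
    "c1": {"n1": 4.0, "n2": 0.0, "n3": 2.0},
    "c2": {"n4": 9.0, "n5": 7.0},
}


def cluster_view(loads_by_node: dict[str, float]) -> float:
    """Level 2 -> Level 1: a Cluster Agent's view of its cluster (average node load)."""
    return sum(loads_by_node.values()) / len(loads_by_node)


def grid_view(loads: dict[str, dict[str, float]]) -> dict[str, float]:
    """Level 1 -> Level 0: the Grid Agent's view of every cluster."""
    return {cluster: cluster_view(nodes) for cluster, nodes in loads.items()}


view = grid_view(node_loads)
print(view)                     # {'c1': 2.0, 'c2': 8.0}
print(max(view, key=view.get))  # c2, the most loaded cluster
```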

The proposed model:
• supports the scalability and heterogeneity of grids: inserting or removing entities (processing elements, nodes or clusters) reduces to very simple operations in the proposed model (inserting or removing leaves or subtrees);
• is totally independent of the physical structure of a grid: the conversion of a grid into a tree is unique, so each grid corresponds to one and only one tree;
• is based on the exchange of information between nodes and clusters through their respective agents.

Proposed algorithms
According to the proposed model, two levels of load balancing are considered: the intra-cluster agent-based load balancing algorithm and the inter-cluster agent-based load balancing algorithm.
Certain specific events change the load configuration in Grid computing; they can be classified as follows:
• arrival of a new job
• completion of the execution of a job
• arrival of a new node
• removal of an existing node
• machine failure at a node
• a node becoming overloaded
When any of these events happens, the local load value changes.
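The event-driven reporting can be sketched as below: a Node Agent keeps its local job queue and reports its new load to the Cluster Agent only when one of the listed events occurs, rather than polling periodically. The classes, the use of queue length as the reported value, and the explicit handling of only the two job events are simplifying assumptions.

```python
from __future__ import annotations

from enum import Enum, auto


class LoadEvent(Enum):
    """The events listed above that change the load configuration."""
    JOB_ARRIVED = auto()
    JOB_COMPLETED = auto()
    NODE_ADDED = auto()
    NODE_REMOVED = auto()
    MACHINE_FAILURE = auto()
    NODE_OVERLOADED = auto()


class ClusterAgentStub:
    """Hypothetical Cluster Agent that records the last reported load of each node."""

    def __init__(self) -> None:
        self.loads: dict[str, int] = {}

    def update(self, node: str, queue_length: int) -> None:
        self.loads[node] = queue_length


class NodeAgent:
    """Keeps the local job queue and reports to its Cluster Agent only on events."""

    def __init__(self, name: str, cluster: ClusterAgentStub) -> None:
        self.name = name
        self.cluster = cluster
        self.queue: list[str] = []

    def on_event(self, event: LoadEvent, job: str | None = None) -> None:
        if event is LoadEvent.JOB_ARRIVED and job is not None:
            self.queue.append(job)
        elif event is LoadEvent.JOB_COMPLETED and self.queue:
            self.queue.pop(0)
        # Other events leave the queue unchanged in this sketch but still
        # trigger a report, since any listed event changes the local load.
        self.cluster.update(self.name, len(self.queue))


ca = ClusterAgentStub()
n1 = NodeAgent("n1", ca)
n1.on_event(LoadEvent.JOB_ARRIVED, "j1")
n1.on_event(LoadEvent.JOB_ARRIVED, "j2")
n1.on_event(LoadEvent.JOB_COMPLETED)
print(ca.loads)  # {'n1': 1}
```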

Intra-cluster agent based load balancing algorithm
Depending on its current load, each Cluster Agent decides whether to start a job migration operation. If it does, the Cluster Agent first tries to balance its load among its own nodes.

Load estimation
A simple measure of node load at a given time is the CPU queue length, which indicates the number of processes awaiting execution. The proposed algorithm considers CPU-U (CPU utilization), Qlength (queue length) and Mem (memory utilization) together as load information parameters to measure the load of a node.

Location policy
In the next step, the nodes must be classified according to their load into three states: overloaded, underloaded and balanced. First, the Cluster Agent calculates the average load of each parameter $p \in \{\text{CPU-U}, \text{Qlength}\}$ over all its nodes and derives two threshold values:
$TH_H(p) = H \cdot Load_{avg}(p)$
$TH_L(p) = L \cdot Load_{avg}(p)$
where $TH_H$ is the high threshold, $TH_L$ is the low threshold, and H and L are constants. The nodes are then divided into overloaded, underloaded and balanced nodes using these threshold values as follows:
• Overloaded: the node is added to the overloaded list if its queue length is high (above $TH_H(\text{Qlength})$), or its CPU utilization is high (above $TH_H(\text{CPU-U})$), or its memory usage is greater than 85%.
• Underloaded: the node is added to the underloaded list if its queue length is low (below $TH_L(\text{Qlength})$) or its CPU utilization is low (below $TH_L(\text{CPU-U})$).
• Balanced: the node is in neither the overloaded list nor the underloaded list. Such nodes are more loaded than the low threshold and less loaded than the high threshold.
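Under the threshold definitions above, the location policy can be sketched as follows. The values of H and L are hypothetical (the constants are not specified here), and checking the overloaded condition first is an assumption for nodes that would match both lists (for example, high memory usage but a short queue).

```python
from __future__ import annotations

from dataclasses import dataclass

H, L = 1.2, 0.8   # hypothetical values of the constants H and L
MEM_LIMIT = 85.0  # memory usage (%) above which a node is overloaded


@dataclass
class NodeLoad:
    """Load information reported by a Node Agent."""
    name: str
    cpu_u: float  # CPU utilization (%)
    qlength: int  # processes waiting in the CPU queue
    mem: float    # memory utilization (%)


def thresholds(nodes: list[NodeLoad]) -> dict[str, tuple[float, float]]:
    """Return {parameter: (TH_L, TH_H)} computed from the cluster-wide averages."""
    result = {}
    for param in ("cpu_u", "qlength"):
        avg = sum(getattr(n, param) for n in nodes) / len(nodes)
        result[param] = (L * avg, H * avg)
    return result


def classify(nodes: list[NodeLoad]) -> dict[str, list[str]]:
    """Split the nodes of one cluster into overloaded, underloaded and balanced lists."""
    th = thresholds(nodes)
    lists: dict[str, list[str]] = {"overloaded": [], "underloaded": [], "balanced": []}
    for n in nodes:
        if n.qlength > th["qlength"][1] or n.cpu_u > th["cpu_u"][1] or n.mem > MEM_LIMIT:
            lists["overloaded"].append(n.name)
        elif n.qlength < th["qlength"][0] or n.cpu_u < th["cpu_u"][0]:
            lists["underloaded"].append(n.name)
        else:
            lists["balanced"].append(n.name)
    return lists


cluster = [
    NodeLoad("n1", cpu_u=90, qlength=8, mem=60),
    NodeLoad("n2", cpu_u=20, qlength=1, mem=30),
    NodeLoad("n3", cpu_u=50, qlength=4, mem=40),
    NodeLoad("n4", cpu_u=40, qlength=4, mem=90),
]
print(classify(cluster))
# {'overloaded': ['n1', 'n4'], 'underloaded': ['n2'], 'balanced': ['n3']}
```

Here node n4 is overloaded only because of its memory usage, even though its queue length and CPU utilization are within the thresholds.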

Job Migration Decision
After classifying the nodes, the Cluster Agent decides to transfer jobs from overloaded to underloaded nodes and sends this decision to the Migration Agent, which then proceeds as follows:
… Agents of sender and receiver node.
3: Wait for an acknowledgement from the Node Agent of the receiver node.
4: Send an acknowledgement to its related Cluster Agent.
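The acknowledgement-based handover in steps 3 and 4 can be sketched as below. Because the first steps are incomplete here, the way the job is taken from the sender (removing the most recently queued job) and the one-to-one pairing of overloaded and underloaded nodes are assumptions, and the acknowledgement is modelled as a synchronous return value rather than an agent message.

```python
from __future__ import annotations

from collections import deque


class NodeAgent:
    """Hypothetical Node Agent holding the local job queue."""

    def __init__(self, name: str, jobs: list[str]) -> None:
        self.name = name
        self.queue: deque[str] = deque(jobs)

    def release_job(self) -> str:
        # Sender side (assumed): give up the most recently queued job.
        return self.queue.pop()

    def accept_job(self, job: str) -> bool:
        # Receiver side: enqueue the job and acknowledge receipt.
        self.queue.append(job)
        return True


class MigrationAgent:
    """Moves one job per (sender, receiver) pair decided by the Cluster Agent."""

    def __init__(self) -> None:
        # Acknowledgements forwarded to the Cluster Agent: (job, sender, receiver).
        self.acks_to_cluster: list[tuple[str, str, str]] = []

    def migrate(self, sender: NodeAgent, receiver: NodeAgent) -> bool:
        job = sender.release_job()
        ack = receiver.accept_job(job)  # step 3: wait for the receiver's acknowledgement
        if ack:
            # Step 4: acknowledge the completed migration to the Cluster Agent.
            self.acks_to_cluster.append((job, sender.name, receiver.name))
        return ack


nodes = {"n1": NodeAgent("n1", ["j1", "j2", "j3"]), "n2": NodeAgent("n2", [])}
overloaded, underloaded = ["n1"], ["n2"]
ma = MigrationAgent()
for src, dst in zip(overloaded, underloaded):  # assumed one-to-one pairing
    ma.migrate(nodes[src], nodes[dst])
print(list(nodes["n2"].queue), ma.acks_to_cluster)
# ['j3'] [('j3', 'n1', 'n2')]
```

In this sketch the sender keeps no copy of the job once `release_job` returns, which mirrors the reliability limitation discussed in the conclusion.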

Inter-cluster agent based load balancing algorithm
This algorithm performs global load balancing among all clusters of the Grid. Inter-cluster load balancing at this level is performed when a Cluster Agent fails to balance its load among its associated nodes. In this case, the Cluster Agent transfers jobs to underloaded clusters based on the decision taken by the Grid Agent. To this end, several algorithms are proposed; the last of them is implemented in the Grid Agent and determines how a receiver cluster is selected for a job migrated from an overloaded cluster. The Grid Agent calculates the communication cost of sending jobs from the saturated cluster to each candidate underloaded receiver cluster, based on the information collected in the last exchange interval, and selects the cluster that gives the minimum overall cost.
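The receiver selection performed by the Grid Agent can be sketched as a minimum-cost search over the underloaded clusters. The cost model used here (latency plus job size divided by bandwidth) and the `LinkInfo` fields are hypothetical stand-ins for whatever information is collected in the last exchange interval.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LinkInfo:
    """Assumed link state between the sender cluster and one candidate receiver."""
    latency_s: float       # one-way latency in seconds
    bandwidth_mb_s: float  # available bandwidth in MB/s


def communication_cost(job_size_mb: float, link: LinkInfo) -> float:
    # Hypothetical cost model: latency plus transfer time.
    return link.latency_s + job_size_mb / link.bandwidth_mb_s


def select_receiver(job_size_mb: float, underloaded: dict[str, LinkInfo]) -> str | None:
    """Return the underloaded cluster with minimum cost, or None if there is none."""
    if not underloaded:
        return None
    return min(underloaded, key=lambda c: communication_cost(job_size_mb, underloaded[c]))


candidates = {
    "c2": LinkInfo(latency_s=0.05, bandwidth_mb_s=10.0),
    "c3": LinkInfo(latency_s=0.20, bandwidth_mb_s=100.0),
}
print(select_receiver(50.0, candidates))  # c3
print(select_receiver(0.1, candidates))   # c2
```

With this cost model, a large job favors the high-bandwidth cluster while a small job favors the low-latency one, so the choice depends on the job as well as on the network state.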

Agents interactions
The proposed agent-based load balancing algorithm is intended to take advantage of agent characteristics to create a self-adaptive and self-sustaining load balancing system. It consists of five types of agents. In unbalanced situations, if the Cluster Agent finds a load imbalance among the nodes under its control, it uses the event-based information gathering policy to receive load information from each Node Agent. On the basis of this information and the estimated balance thresholds, it analyses the current load of the cluster.
Depending on the result of this analysis, it decides whether to start local balancing in case of an unbalanced state, or simply inform the Grid Agent of its current load. The local node load is calculated by the Node Agent residing at each computing node, which sends the updated local load value to the Cluster Agent; the Cluster Agent then updates its load information. The Node Agent creates the task queue at the local node, updates it when necessary, and sends it to the Cluster Agent whenever one of the defined events occurs. The Migration Agent is responsible for migrating jobs to the selected underloaded node.
There is one Migration Agent in each cluster; once the receiving node obtains the migrated job, the Migration Agent waits for an acknowledgement of receipt from it. The Migration Agent thus ensures that the job is successfully received and resumed or started at the destination node. The last agent is the Grid Agent, which is responsible for distributing work among clusters. It starts all Cluster Agents and decides whether to start global load balancing in case of a saturated state.
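Putting the two levels together, one decision round of a Cluster Agent can be sketched as follows. It assumes that an imbalance means at least one overloaded node, and that local balancing fails when no underloaded node is left to receive jobs; both are simplifying assumptions, and the action names are hypothetical.

```python
from __future__ import annotations


def cluster_agent_decision(overloaded: list[str], underloaded: list[str]) -> str:
    """Choose the Cluster Agent's action from its classified node lists."""
    if not overloaded:
        # Balanced cluster: simply inform the Grid Agent of the current load.
        return "inform_grid_agent"
    if underloaded:
        # Intra-cluster balancing has priority: migrate jobs between local nodes.
        return "intra_cluster_balancing"
    # No local receiver: ask the Grid Agent for inter-cluster balancing.
    return "request_inter_cluster_balancing"


print(cluster_agent_decision([], ["n2"]))      # inform_grid_agent
print(cluster_agent_decision(["n1"], ["n2"]))  # intra_cluster_balancing
print(cluster_agent_decision(["n1"], []))      # request_inter_cluster_balancing
```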

Experimental environment
An experimental environment using Alea 2 as the grid simulator and JADE (Java Agent DEvelopment Framework) for agent implementation was set up to evaluate the effectiveness of the proposed algorithms. In the proposed infrastructure, management agents communicate in the Grid environment using the JADE agent platform. In addition to Alea 2, a class library was developed that simulates the activities of an agent platform. This library, called ABLB (Agent Based Load Balancing), includes the classes Grid Agent, Cluster Agent, Migration Agent and Node Agent.

Workload
The complex data set was modelled on the national Grid of the Czech Republic, MetaCentrum, which allowed very realistic simulations to be carried out. It also provides information on machine failures and specific job requirements, and this information influences the quality of the solutions generated by scheduling algorithms. The job description includes the job ID, user, queue, number of processors used, etc.
The cluster description also includes detailed information such as RAM size, CPU speed, CPU architecture, operating system and the list of supported properties (allowed queue(s), cluster location, network interface, etc.). In addition, information is provided on when machines were under maintenance (failure/restart). Finally, the list of queues with their time limits and priorities is provided. More details on the trace file used can be found at [9].

Performance evaluation
The key performance factor in evaluating the proposed algorithm is resource utilization (%), which was the main focus. The number of clusters was assumed to be 14, each composed of a different number of resources, and the number of jobs was 3000. Figure 3 shows cluster utilization with and without the proposed algorithm. It can be noticed that the agent-based load balancing algorithm is more effective in maximizing resource utilization. When no appropriate resource is available, the proposed algorithm allows a job to be spread over the most available resources. Traditional algorithms, by contrast, try to select the best resource matching the job requirements; if none is found, the job remains in the global queue, which indicates underutilization of those resources.

Conclusion
The proposed algorithms were implemented in the Alea 2 simulator, which is written in Java, to test and estimate the performance of the proposed agent-based load balancing model. Experimental results showed that the proposed model achieves better load balance and more efficient use of resources. Coordination and cooperation among agents offer several ways to improve resource utilization and reduce response time.
The proposed model therefore supports the heterogeneity, scalability and dynamic nature of grids. In addition, a multi-agent architecture for grid load balancing was proposed, together with a job migration technique that reduces the load difference between overloaded and underloaded nodes. Finally, node load is estimated by combining CPU usage, memory usage and queue length.
However, the implemented model has a reliability problem: there is no guarantee that a migrated job will resume at the receiving node, since the sender node does not keep a copy of the job until it has arrived at its new receiver node. Other solutions must be found to make job migration more reliable. Moreover, the time required to complete a migration process is not explicitly calculated.
Future work will therefore consider comparing the proposed algorithm with other agent-based load balancing algorithms, the cost of negotiation between agents, the use of mobile agents for load balancing, and the improvement and use of the proposed model in real grid environments.