A Deep Reinforcement Learning Model-Based Optimization Method for Graphic Design

Deep reinforcement learning is well suited to optimizing the graphic design and spatial framework of buildings in today's worldwide big-data environment, in which people have increasingly stringent requirements for building layout and design and conventional layout methods are increasingly inadequate. This research proposes a novel approach to topology optimization using geometry-based deep learning. A deep neural network characterizes the density distribution in the design domain. By using a geometry-based deep learning representation of the density distribution function, the approach avoids the checkerboard phenomenon and ensures a smooth boundary. With the deep reinforcement learning approach, the number of design variables can be drastically reduced. By adjusting the architecture of the neural network, we can tune not only the minimum length scale but also the structural complexity. The proposed model achieved an accuracy of 95% and a computation time of 61s. The effectiveness of the proposed technique is demonstrated by several 2-dimensional and 3-dimensional numerical results, ranging from minimum compliance to stress-constrained problems.


Introduction
In both academia and business, research on machine learning (ML) and artificial intelligence (AI) has grown significantly in the past ten years. As computer technology improved and the need to evaluate increasing amounts of data grew, these methods, which were previously undervalued, found renewed recognition. Reinforcement Learning (RL) aims to maximize a numerical reward signal by training the system to map situations to actions. The learner must try each action to determine which is most rewarding rather than being told which to choose. Reinforcement learning addresses the question of how agents should learn a policy that maximizes the cumulative reward through interaction with the environment (Tapeh & Naser, 2022). Figure 1 represents the Deep Reinforcement Learning implementation using the interior design model. One article outlines the solution of multi-objective reinforcement learning (MORL) tasks with unknown weights and many conflicting objectives (Yamaguchi, Nagahama, Ichikawa, & Takadama, 2019). Research on demonstrations continues to grow because it enables robots to quickly acquire new abilities.
In inverse reinforcement learning (IRL), demonstrations can help in a number of ways by having the robot attempt to infer the objectives or reward of the human demonstrator (Das, Bechtle, Davchev, Jayaraman, Rai, & Meier, 2021). The goal is the creation of completely autonomous agents that interact with their surroundings to learn the best behaviours and perfect them over time through trial and error. Making AI systems that are responsive and can learn successfully has long been a challenge, from software-only agents that can interact with spoken language and multimedia to robots that can perceive and respond to their environment (Zhou, Lee, Diao, Shi, Balyen, & Peto, et al., 2019). RL is a mathematical framework with guiding principles for experience-driven autonomous learning. While earlier iterations of RL had some success, they were fundamentally confined to rather low-dimensional problems and lacked scalability (Cioffi, Travaglioni, Piscitelli, Petrillo, & De Felice, et al., 2020). Owing to globalization, AI will have a profound influence on human existence in the future, and it will be a key factor in designers' decision-making processes.
Artificial intelligence is fundamentally a tool, and it should exercise its four main responsibilities of anticipation, contemplation, negotiation, and reaction throughout the process of design innovation (Bichu, Hansa, Bichu, Premjani, Flores-Mir, & Vaid, et al., 2021). Each designer has preferences, and ResNet artificial intelligence is suggested as a way to increase decision accuracy while also increasing the effectiveness of design selections based on individual designer preferences. To prevent the negative consequences of designers' decision-making preferences, pattern recognition and decision-making problems are combined (Wang, Tang, Huang, Chen, Zhang, & Huang, 2020). The term "spatial layout design" describes the process of partitioning a given space into several smaller spaces, or of logically placing certain objects in the area, within the framework of some objective and subjective design standards and layout conventions (Bouhamed, Ghazzai, Besbes, & Massoud, 2020). The PRISMA-ScR standards were followed in the scoping assessment of the research.
The Deep Deterministic Policy Gradient (DDPG) was developed to enable the UAV to navigate around obstacles in a continuous space.
Using DDPG in a continuous action space, the UAV is guided around obstacles to reach its designated destination.
The limited dimensions of the mobility and action space for UAVs could lower their effectiveness in dealing with everyday environments.
A policy-based RL model was developed in the investigation to depict the behaviour of controlling the thermostat and material level. An MDP was used to simulate the individuals' behaviour.
The behaviour of building occupants could be predicted reasonably well using the RL framework and transfer learning.
Contribution of the study
This research contributes by demonstrating an implementation of topology optimization whose effectiveness is increased by Deep Reinforcement Learning, and by showing the field's relevance to making decisions through trial and error. The particular contributions of this paper are as follows:
• An interior design approach based on a specific learning method is evaluated.
• An efficient Deep Reinforcement Learning component is proposed to support topology optimization, the mathematical method that optimizes the material layout within a given design space, and to assess the effectiveness of the process.

Application of deep learning in graphic design
The

Materials and method
Graphic design has been around since the beginning of time. Books, periodicals, packaging, newspapers, banners, emblems, and many more things all benefit from graphic design in some way. Graphic design, topology optimization, our proposed deep reinforcement learning approach, and the performance assessment of this graphic design are the primary topics covered in this section.

Graphic design
According to a widely held view, visual design is the art and skill of giving words and graphics an orderly, practical, and appealing structure. The term covers both the act (verb) and the product (noun) of visual art. Traditional graphic design is a kind of "all design" employed in the creation of different media. The logical and practical aesthetics that conventional graphic design developed over the years are the foundation for contemporary visual graphic design, which is today employed across multiple fields such as industrial layout, information architecture, message styling, and more. Table 2 displays the types of graphic designs.

Topology optimization
Topology optimization is rarely used as a tool in the design of buildings. Producing results that meet a designer's standards usually requires a laborious procedure. Yet that difficulty should not prevent builders from trying out these tools in building design. The density-based approach converts the material distribution into a finite-element spatial discretization, in which the design is built from discrete elements of varying densities. In the well-established SIMP method, a mesh is used to represent the density spatially, yielding an optimized layout with staggered boundaries. It therefore takes considerable post-processing work to obtain a smooth CAD model, and that can reduce the accuracy of the geometry near the boundary.
Because the mesh is used to describe the structural topology, the number of design parameters is usually very large for 3D design, and many mature optimization strategies are not appropriate for large-scale problems. In this section, we describe a novel approach to density representation that resolves these issues by using a feed-forward neural network. A high-fidelity feed-forward neural network can represent a complex shape while ensuring a smooth surface throughout. Thus, a deep feed-forward network is a natural choice for representing the density field in the design domain. Figure 3 contrasts three feed-forward neural networks, each having three hidden layers and a different number of neurons in each of those layers, and Figure 4 displays the outcomes of the training.
The density field may be expressed mathematically as:

$$\phi(x, y, h) = \mathcal{M}\big(\mathcal{N}(x, y, h, \theta)\big) \quad (3)$$

where $\mathcal{N}$ represents the feed-forward network and $\theta$ stands for its free parameters. Several discrete layers make up a deep-layered network's topology. Networks with hidden layers may be represented layer by layer, where the superscript $(1)$ marks the output of the corresponding hidden layer.
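The idea of representing the density field with a feed-forward network can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the layer sizes, the tanh hidden activation, the sigmoid output that squashes the density into (0, 1), and the helper names `init_network` and `density` are all hypothetical choices.

```python
import math
import random


def sigmoid(v):
    """Logistic function, used here to keep the density inside (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def init_network(layer_sizes, seed=0):
    """Create random weights and zero biases for a fully connected network."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        params.append((weights, biases))
    return params


def density(point, params):
    """Evaluate the network at one coordinate and return a density value."""
    activation = list(point)
    for i, (weights, biases) in enumerate(params):
        pre = [sum(w * a for w, a in zip(row, activation)) + b
               for row, b in zip(weights, biases)]
        is_output_layer = i == len(params) - 1
        activation = [sigmoid(p) if is_output_layer else math.tanh(p)
                      for p in pre]
    return activation[0]


# Assumed architecture: 2 inputs, three hidden layers of 8 neurons, 1 output.
params = init_network([2, 8, 8, 8, 1])
rho = density((0.25, 0.75), params)

# The design variables are the network parameters, not per-element densities.
n_design_variables = sum(len(w) * len(w[0]) + len(b) for w, b in params)
```

Because the density can be queried at any coordinate, the number of design variables depends on the network size rather than on the mesh resolution, which is the property the text relies on.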

Minimum compliance
Topology optimization (TO) with a compliance-minimizing formulation is developed using deep reinforcement learning (DRL). In the design domain, a DNN represents the density field. The TO therefore repeatedly updates the network parameters to improve the density field until the material layout provides optimal stiffness performance. During optimization, the density field in the design domain is changed by adjusting the connection weights of the feed-forward network. This allows us to formulate the optimization problem in terms of θ, the feed-forward network parameters, and the structural compliance objective function. The relative density Φ is defined over the design domain, and the volume constraint prescribes the fraction of the design-domain volume that the material may occupy. The finite element framework uses the unknown displacement field, the force, and the stiffness matrix to represent these quantities.
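To make the compliance objective and the volume constraint concrete, the following sketch evaluates both for a deliberately tiny structure: a chain of springs in series, fixed at one end and loaded at the other. The spring-chain model, the SIMP-style stiffness penalization, and the names `compliance` and `volume_fraction` are illustrative assumptions; a real implementation would assemble and solve a finite element system.

```python
def compliance(densities, load=1.0, k0=1.0, penal=3.0, rho_min=1e-3):
    """Compliance C = F^T U of a chain of springs in series.

    Each element stiffness follows a SIMP-style rule k = k0 * rho**penal
    (an assumption made for this sketch). For springs in series every
    element carries the full load, so K U = F has a closed-form solution.
    """
    tip_displacement = 0.0
    for rho in densities:
        stiffness = k0 * max(rho, rho_min) ** penal
        tip_displacement += load / stiffness  # elongation of this element
    # Only the tip carries a load, so F^T U reduces to load * tip displacement.
    return load * tip_displacement


def volume_fraction(densities):
    """Average density, compared against the allowed volume fraction."""
    return sum(densities) / len(densities)


full = compliance([1.0, 1.0])   # fully solid chain
half = compliance([0.5, 0.5])   # half-density chain is much more compliant
feasible = volume_fraction([0.5, 0.5]) <= 0.5
```

Lower compliance means a stiffer structure, so the optimizer trades stiffness against the amount of material the volume constraint allows.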

Stress-constrained minimum compliance
When optimizing the minimum compliance problem with a stress constraint, the von Mises stress is commonly used to measure the local stress and to constrain the search space. However, it is numerically costly to constrain the local stress directly. To approximate the local stress constraint, a p-norm method is used here. Many improved strategies for precise local stress control have been put forward in recent years. To keep things simple, we use a well-established technique to limit the local von Mises stress. In this approach, the constraint is formulated using the p-norm measure PN. Thus, the problem presented in Section 3.2 may be restated with this additional constraint.
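The p-norm aggregation mentioned above replaces many local stress constraints with a single smooth measure. The sketch below shows the standard p-norm of a list of non-negative stress values; the function name `p_norm`, the default exponent, and the sample stresses are assumptions for illustration only.

```python
def p_norm(stresses, p=8.0):
    """Smooth upper bound on the maximum of non-negative stress values.

    The values are scaled by the peak before exponentiation so that large
    exponents do not overflow.
    """
    peak = max(stresses)
    if peak == 0.0:
        return 0.0
    scaled_sum = sum((s / peak) ** p for s in stresses)
    return peak * scaled_sum ** (1.0 / p)


von_mises = [1.0, 2.0, 3.0]            # hypothetical element stresses
pn_low = p_norm(von_mises, p=8.0)      # looser bound on the peak stress
pn_high = p_norm(von_mises, p=64.0)    # tighter bound on the peak stress
```

The measure never falls below the true maximum and approaches it as p grows, at the cost of a less smooth function for the optimizer.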

Sensitivity testing for layouts
Gradient-based optimization requires the sensitivity of the objective to the model parameters, i.e., the weights of the feed-forward network. The chain rule is used to calculate the sensitivity of the objective function: the sensitivity of the objective with respect to the density field ∅ is computed with the adjoint method, and ∅ is in turn an expression of the network weights.
The algorithmic differentiation provided by the free program CasADi makes it simple to obtain the sensitivity of the density field with respect to the network weights w. In a similar vein, the chain rule may be used for the sensitivity analysis of the p-norm stress, where the sensitivity of the p-norm stress with respect to ∅ is again derived with the adjoint method.
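The chain-rule structure of the sensitivity can be checked numerically on a one-parameter toy problem. In this sketch the density is a sigmoid of a single weight, and the objective is the compliance of one SIMP-penalized spring; both are hypothetical stand-ins, and the derivatives are written by hand rather than obtained from CasADi.

```python
import math


def density(w, x):
    """Toy density: a sigmoid of one weight times one coordinate."""
    return 1.0 / (1.0 + math.exp(-w * x))


def objective(phi):
    """Toy compliance of a single spring with stiffness phi**3."""
    return 1.0 / phi ** 3


def sensitivity(w, x):
    """dC/dw by the chain rule: (dC/dphi) * (dphi/dw)."""
    phi = density(w, x)
    dC_dphi = -3.0 / phi ** 4
    dphi_dw = phi * (1.0 - phi) * x
    return dC_dphi * dphi_dw


def finite_difference(w, x, h=1e-6):
    """Central difference used only to verify the analytic sensitivity."""
    return (objective(density(w + h, x)) - objective(density(w - h, x))) / (2.0 * h)


analytic = sensitivity(0.3, 0.5)
numeric = finite_difference(0.3, 0.5)
```

The analytic value is negative here, which matches intuition: increasing the weight raises the density and therefore lowers the compliance.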

Deep reinforcement learning
The MDP, the central formalism in RL, has been presented, and some of the difficulties in the field have been touched on. The following discussion categorizes RL methods into their respective groups. Both value-function-based and policy-search-based techniques may be used to solve RL problems. The actor-critic method combines value functions and policy search into a single approach. We describe these methods, along with some other tools, for solving RL problems.

Value functions
Both the optimal policy $\pi^{*}$ and the optimal state-value function $V^{*}(s)$ may be expressed in terms of one another.
Given $V^{*}(s)$, the optimal policy could be retrieved by choosing, among all actions available at state $s_t$, the action that maximizes the expected value of the next state. However, the transition dynamics $T$ are not available in the RL setup. We therefore construct another function, the state-action value or quality function $Q^{\pi}(s, a)$, which is similar to $V^{\pi}$ except that the initial action $a$ is provided and $\pi$ is only followed from the succeeding state onwards:

$$Q^{\pi}(s, a) = \mathbb{E}[R \mid s, a, \pi]$$

Given $Q^{\pi}(s, a)$, the best policy can be found by choosing $a$ greedily at every state: $\arg\max_{a} Q^{\pi}(s, a)$. Under this policy, we can also define $V^{\pi}(s)$ by maximizing $Q^{\pi}(s, a)$: $V^{\pi}(s) = \max_{a} Q^{\pi}(s, a)$.

Dynamic programming
To learn $Q^{\pi}$, we make use of the Markov property and define the function as a Bellman equation, which has the recursive form:

$$Q^{\pi}(s_t, a_t) = \mathbb{E}_{s_{t+1}}\big[r_{t+1} + \gamma\, Q^{\pi}(s_{t+1}, \pi(s_{t+1}))\big]$$

In other words, we may use the present values of our estimate of $Q^{\pi}$ to improve it, which means that $Q^{\pi}$ can be improved through bootstrapping. This is the cornerstone of the SARSA algorithm and Q-learning:

$$Q^{\pi}(s_t, a_t) \leftarrow Q^{\pi}(s_t, a_t) + \alpha\,\delta$$

where $\alpha$ is the learning rate and $\delta = Y - Q^{\pi}(s_t, a_t)$ is the temporal difference (TD) error; $Y$ is the target in this case, much as in a typical regression problem. SARSA, an on-policy learning algorithm, improves the estimate of $Q^{\pi}$ by using transitions generated by the behavioral policy (the policy derived from $Q^{\pi}$), which results in setting $Y = r_t + \gamma\, Q^{\pi}(s_{t+1}, a_{t+1})$. Q-learning is off-policy, since $Q^{\pi}$ is updated by transitions that were not necessarily generated by the derived policy. Instead, Q-learning uses $Y = r_t + \gamma \max_{a} Q^{\pi}(s_{t+1}, a)$, which directly approximates $Q^{*}$.
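The difference between the SARSA and Q-learning targets can be sketched with a small tabular example. The table values, the reward, and the helper names below are hypothetical; the update rule itself is the standard TD update described in the text.

```python
def sarsa_target(q, reward, next_state, next_action, gamma):
    """On-policy target: uses the action actually taken in the next state."""
    return reward + gamma * q[next_state][next_action]


def q_learning_target(q, reward, next_state, gamma):
    """Off-policy target: uses the greedy action in the next state."""
    return reward + gamma * max(q[next_state])


def td_update(q, state, action, target, alpha):
    """Move Q(state, action) toward the target and return the TD error."""
    delta = target - q[state][action]
    q[state][action] += alpha * delta
    return delta


# Hypothetical table: q[state][action] with two states and two actions.
q = [[0.0, 0.0],
     [1.0, 2.0]]

# One transition: state 0, action 0, reward 1.0, next state 1, next action 0.
y_sarsa = sarsa_target(q, reward=1.0, next_state=1, next_action=0, gamma=0.5)
y_qlearn = q_learning_target(q, reward=1.0, next_state=1, gamma=0.5)

delta = td_update(q, state=0, action=0, target=y_qlearn, alpha=0.5)
```

The two targets differ only when the behavioral policy does not pick the greedy action in the next state, as in this transition.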
To find $Q^{*}$ from an arbitrary $Q^{\pi}$, we employ generalized policy iteration, which comprises policy evaluation and policy improvement. Policy evaluation improves the estimate of the value function by minimizing TD errors along the trajectories experienced while following the policy. As the estimate improves, the policy can be improved by choosing actions greedily based on the updated value function. Generalized policy iteration allows these steps to be interleaved, rather than performing each to convergence in turn (as in policy iteration), which speeds up the process.

Sampling
Instead of bootstrapping value functions with dynamic programming techniques, Monte Carlo approaches estimate the expected return from a state by averaging the return from numerous policy rollouts. Because of this, pure Monte Carlo methods can also be applied in non-Markovian settings. However, they are limited to episodic MDPs, since a rollout must end before its return can be determined. To get the most out of both approaches, the TD(λ) algorithm combines TD learning with Monte Carlo policy evaluation. Much as the discount factor does, λ interpolates between Monte Carlo evaluation and bootstrapping.
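Monte Carlo policy evaluation as described above amounts to averaging discounted returns over completed rollouts. The sketch below shows that computation; the reward sequences and the discount factor are hypothetical, and the function names are introduced only for illustration.

```python
def discounted_return(rewards, gamma):
    """Return of one finished rollout, accumulated from the last step back."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def monte_carlo_value(rollouts, gamma):
    """Average return over rollouts that all start from the same state."""
    returns = [discounted_return(rewards, gamma) for rewards in rollouts]
    return sum(returns) / len(returns)


# Two hypothetical episodes starting from the same state.
rollouts = [[1.0, 1.0, 1.0],
            [0.0, 0.0, 4.0]]
value_estimate = monte_carlo_value(rollouts, gamma=0.5)
```

The return of an episode is only known once the episode has finished, which is why the approach is restricted to episodic tasks.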
Learning the advantage function $A^{\pi}(s, a)$ is a key component of another effective value-based approach. Unlike $Q^{\pi}$, which produces absolute state-action values, $A^{\pi}$ represents relative state-action values. Learning relative values is similar to removing the baseline or average level of a signal; intuitively, it is easier to learn that one action has better consequences than another than it is to learn the exact return from taking that action. $A^{\pi}$ represents the relative advantage of actions through the simple relationship $A^{\pi} = Q^{\pi} - V^{\pi}$. It is also closely connected to the baseline method of variance reduction used in gradient-based policy search methods. Several modern DRL algorithms have used the concept of advantage updates.
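The relationship between the advantage, the state-action value, and the state value can be illustrated for a single state. In this sketch the state value is taken as the policy-weighted average of the action values; the numbers and the names `state_value` and `advantages` are hypothetical.

```python
def state_value(q_values, policy_probs):
    """V(s) as the expectation of Q(s, a) under the policy's action probabilities."""
    return sum(p * q for p, q in zip(policy_probs, q_values))


def advantages(q_values, policy_probs):
    """A(s, a) = Q(s, a) - V(s) for every action in one state."""
    v = state_value(q_values, policy_probs)
    return [q - v for q in q_values]


q_values = [1.0, 3.0]       # hypothetical action values in one state
policy_probs = [0.5, 0.5]   # hypothetical stochastic policy
adv = advantages(q_values, policy_probs)

# Under the policy, the expected advantage is zero by construction.
expected_advantage = sum(p * a for p, a in zip(policy_probs, adv))
```

Positive entries mark actions that are better than the policy's average behavior in that state, which is the relative signal the text describes.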

Policy search
The search for the best policy can be carried out without maintaining a value function model. Typically, a parameterized policy is chosen whose parameters are updated to maximize the expected return $\mathbb{E}[R \mid \theta]$ using either gradient-based or gradient-free optimization. Both gradient-free and gradient-based techniques have been used effectively to train neural networks that encode policies. While gradient-free optimization can effectively cover low-dimensional parameter spaces, most DRL techniques still favor gradient-based training since it is more sample-efficient when policies have a large number of parameters.

Policy gradients
Gradients can provide a strong learning signal for how to improve a parameterized policy. However, to calculate the expected return, we need to average over the plausible trajectories induced by the current policy parameterization. This averaging requires either deterministic approximations (via linearization, for example) or stochastic approximations (via sampling). Deterministic approximations can only be applied in a model-based setting, where a model of the underlying transition dynamics is available. In the more common model-free RL setting, a Monte Carlo estimate of the expected return is used. This Monte Carlo estimate poses a problem for gradient-based learning because gradients cannot pass through samples of a stochastic function. As a result, we use an estimator of the gradient known as the score function or likelihood-ratio estimator (the REINFORCE rule in RL). The latter name is telling, since using the estimator resembles maximizing the log-likelihood, a common practice in supervised learning. Intuitively, gradient ascent with the estimator increases the log-likelihood of the sampled action, scaled by the return. More formally, the REINFORCE rule can be used to compute the gradient of an expectation over a function of a random variable with respect to the parameters.
Because this calculation relies on the empirical return of trajectories, the resulting gradients have high variance. The variance can be reduced by introducing unbiased estimates that are less noisy.
The standard approach is to subtract a baseline, which means weighting updates by an advantage rather than the pure return. The simplest baseline is the average return taken over several episodes, although there are numerous other options.
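The score-function estimator with a baseline can be sketched for a softmax policy over two actions. This is a minimal illustration, assuming a bandit-like setting with one decision per episode; the preferences, sampled episodes, and function names are hypothetical.

```python
import math


def softmax(prefs):
    """Action probabilities from preferences, shifted for numerical stability."""
    peak = max(prefs)
    exps = [math.exp(p - peak) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_prob(prefs, action):
    """Gradient of log pi(action) for a softmax policy: one-hot minus probabilities."""
    probs = softmax(prefs)
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]


def reinforce_gradient(prefs, episodes, baseline):
    """Average of (return - baseline) * grad log pi(action) over sampled episodes."""
    grad = [0.0] * len(prefs)
    for action, ret in episodes:
        score = grad_log_prob(prefs, action)
        for i, s in enumerate(score):
            grad[i] += (ret - baseline) * s / len(episodes)
    return grad


prefs = [0.0, 0.0]                    # uniform initial policy
episodes = [(0, 1.0), (1, 3.0)]       # hypothetical (action, return) samples
baseline = sum(r for _, r in episodes) / len(episodes)  # average return
grad = reinforce_gradient(prefs, episodes, baseline)
```

With the average return as baseline, the update pushes preference toward the action whose return was above average and away from the other one.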

Actor-critic methods
Combining value functions with an explicit representation of the policy yields actor-critic methods.
The "critic" (value function) provides the "actor" (policy) with feedback that helps it improve. In doing so, these methods trade off the variance reduction of policy gradients against the bias introduced by value function methods.
Actor-critic methods use the value function as a baseline for policy gradients, just as other baseline methods do; the key distinction is that actor-critic methods employ a learned value function. For this reason, actor-critic methods are treated later as a special case of policy gradient methods.

Results and discussion
This section compares the existing methods MDP (Ran & Dong, 2022), VR (Wu, 2022), and AI (Di & Yu, 2021) with our proposed strategy in terms of computation time, accuracy, precision, and recall. Python 3.7 is used to implement the models. TensorFlow 2.0.0 is used to implement the value neural network. For simulations, we employed a GNU/Linux server equipped with a 64-bit Intel Xeon Gold CPU running at 2.10 GHz.

Computation time
A computer operation's "computation time," often known as its "running time," is the amount of time needed to finish it. A computation may be seen as a sequence of rule applications, so the number of rule applications affects how long the computation takes. With a logic-gate-based quantum computer, the number of unitary transformations is directly proportional to the time required to complete a single "quantum parallel" calculation. Figure 5 and Table 3 show the computation time of the proposed method. Computation time here is the time the DRL framework needs to analyze and produce optimal design configurations when optimizing a graphic design. For real-time applicability and easy incorporation into a graphic design process, efficient computation is essential for timely and flexible design optimization. The existing VR and MDP methods take 91% and 73% of the time, respectively, and AI takes 81%. The proposed method requires only 61% of the computing time, which is a significant reduction. Figure 6 shows the accuracy of the proposed technique. Accuracy can be thought of as how closely estimates of a quantity match its true value.

Accuracy
Accuracy measures how well the model produces designs that meet predetermined standards, which reflects the efficiency of the optimization procedure. The high accuracy obtained demonstrates the model's capability to apply DRL methods to produce attractive and functionally successful graphic designs. The conventional VR and MDP methods yield 65% and 75% accuracy, respectively. Accuracy increases to 85% when AI is used. The proposed method provides the highest accuracy, 95%, demonstrating its effectiveness in improving graphic design processes. Table 4 displays the accuracy of the proposed strategy.

Precision
Precision, or positive predictive value, is the proportion of relevant instances among the retrieved instances. In the measurement sense, precision is the extent to which the same results are achieved from the same measurements carried out under the same conditions.
Reproducibility is the variation that arises when the same technique is applied over extended periods by different instruments and operators.
Repeatability is the variation that arises when every attempt is made to keep the process constant: the same equipment and operator are used, and the repetitions take place over a short period of time. The precision of the proposed system is shown in Figure 7.
Precision is essential for ensuring that the algorithm navigates the design space efficiently and generates visually appealing graphics. It reflects the model's ability to optimize design parameters to satisfy predetermined standards and to make fine adjustments, which increases efficiency in graphic design activities. The proposed method achieved a precision of 98%. The compared methods reached 88%, 75%, and 66%, the last of these for VR. These outcomes indicate that the DRL method succeeds in obtaining higher precision. Table 5 shows the comparison for the proposed approach.

Recall
Recall is the ability of the model to identify every relevant sample in a set of data. Statistically, it is defined as the ratio of the true positives (TPs) to the sum of the TPs and false negatives (FNs), that is, Recall = TP / (TP + FN). Comparative data for the recall metric are shown in Figure 8. Recall is an important measure because it indicates whether the model retains important data and applies it to the design process, improving the efficacy and efficiency of the optimization process. VR attains a recall of 77%, and MDP obtains a recall of 66%. AI produces an 87% recall rate. The proposed method exceeds the other methods with a 98% recall rate, demonstrating its effectiveness in this research setting. Table 6 presents the comparison of recall.
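The accuracy, precision, and recall measures used in this section follow directly from confusion-matrix counts. The sketch below shows the standard definitions; the counts are hypothetical and are not the paper's data.

```python
def accuracy(tp, fp, fn, tn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of actual positives that the model recovers."""
    return tp / (tp + fn)


# Hypothetical counts for illustration only.
tp, fp, fn, tn = 8, 2, 2, 8
acc = accuracy(tp, fp, fn, tn)
prec = precision(tp, fp)
rec = recall(tp, fn)
```

Precision penalizes false positives and recall penalizes false negatives, so reporting both alongside accuracy gives a fuller picture than accuracy alone.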

Conclusion
To aid in the process of navigating graphic design files, we proposed a DRL framework. The most advanced DRL techniques are often used in artificial settings where the distribution of images does not correspond to that of natural scenes. This work is an important step towards more lifelike environments. Because of the rapid proliferation of generative design tools, it is now possible to augment traditional shape-finding procedures with computational solutions. Our findings highlight the potential for using topology optimization techniques in the built environment. Some key takeaways are as follows:
(a) Compared with the conventional voxel-based optimization technique, the number of design parameters is significantly reduced when a neural network is used to model the density field.
(b) As the topology is represented implicitly, the resulting layout does not have a staggered boundary.
In the long run, this paper's approach, a hybrid of deep learning and topology optimization, offers a fresh opportunity to combine the two fields. More advanced and robust deep learning models have been presented in recent years. Further deep learning models, such as CNNs and GANs, will be used to represent the density field in future research.

Figure 1: Deep reinforcement learning implementation using the interior design model.

Figure 2: Graphic design of building in DRL

Figure 3: Feed-forward neural network design structure

Figure 4:

The sensitivity derivation involves the assembled stiffness matrix and the conjugate gradient vector obtained from the conjugate gradient equation. Using the chain rule, we can write down how sensitive the objective is to changes in the design variable w.

Figure 5: The computation time of the proposed and existing system

Figure 7: The precision of the proposed and existing method

Figure 8: Recall of proposed and existing method

Table 1: Survey of related works

Using a Building Information Model (BIM) system and Python development tools to enable cross-platform collaborative deep learning and further design effort, the article assessed the architectural design methodology of the BIM system and the interior design research carried out on the BIM building data platform, illustrated with real-world examples (Luong & Pham, 2021). The study noted that the demand for interior space design has risen quickly along with the rate at which people are purchasing homes, and that computer science and technology have great potential in the domain of autonomous interior space design. The corresponding study suggested an automated way of designing spatial areas using convolutional neural networks (CNN) (Wu & Feng, 2022).

... of sampling while determining the Q-return function, ensuring that the built-in techniques are more likely to acquire high-value lessons while being more resilient (Li, Zhu, Zhou, Feng, & Feng, et al., 2022). The article investigated the CNN technique as a quick and effective approach. The automated arrangement of the internal spaces is completed iteratively, beginning with the predicted living room. The paper examined several empirical interior design case studies, showing that this approach gave results similar to professional designers' interior design floor plans (Predić, Manić, Saračević, Karabašević, & Stanujkić, 2022).

The research classified four different Machine Learning (ML) models created for river flow forecasting in the semi-arid region of Iraq. The impact of data division on the development of ML models was investigated. Three data division scenarios (70%-30%, 80%-20%, and 90%-10%) were examined. To evaluate how well the models perform, several statistical indicators were computed (Tao, Al-Sulttani, Salih Ameen, Ali, Al-Ansari, Salih, & Mostafa, 2020). Using the 90%-10% data division, the article demonstrated the benefits of the hybrid support vector regression model with a genetic algorithm over current machine learning forecasting models for monthly river flow predictions. It was also found to increase the accuracy of high-flow event predictions (Zhong, Zhang, Zhang, & Zhang, 2022). In the case study, the internal parameters of the support vector regression (SVR) model are tuned by the optimizer, which results in a robust learning process. Compared to earlier hybrid models, the article improved the ability to predict stochastic river flow behavior (Xu, Zhang, Liu, Nie, Su, Nie, & Zhang, 2019).

The research compared the design of Adaptive Cruise Control (ACC) using Model Predictive Control (MPC) and Deep Reinforcement Learning (DRL) in car-following scenarios (Lin, McPhee, & Azad, 2019). The research found the DRL approach to be comparable to MPC with a large enough prediction horizon when modeling errors disappear and the testing inputs lie within the range of the training data (Zhu, Wang, Pu, Hu, Wang, & Ke, 2019). The study found that DRL control performance declines when testing inputs are outside the training data range, which is a sign that machine learning generalization is insufficient (Chen, Tong, Zheng, Samuelson, & Norford, 2020).

Focusing on constraint optimization and multi-objective optimization, the investigation provides an innovative perspective on design progress in the data age. After verifying the quality of the non-adaptive solution set, optimizing the convergence, uniformity, and extensiveness, analyzing the experimental process, and drawing a multi-objective conclusion, it determined that additional optimization of the interior and spatial structure is necessary for artificial intelligence decision-making in the case of the Library of Highly Cold Lands (Ran & Dong, 2022). The research took a layout boundary or layout space as input to automatically generate a layout plan. According to the findings, the scene redirection solution was tested successfully. The efficacy of the redirection algorithm was shown by comparison with the outcomes of uniform scaling (Wu, 2022). The case study simulated two reinforcement learning agents in a cooperative learning setting to discover the ideal 3D layout for the Markov decision process (MDP) formulation. The article reports tests on a large dataset of actual interior layouts, which includes industrial designs created by qualified designers. The numerical findings suggested that the model produces layouts of superior quality compared to the most recent model (Di & Yu, 2021).

Table 2: Types of graphic designs

Graphics has been known by many different names over the last two centuries, including artistic works, advertising material, digital marketing, graphics, and visuals. This demonstrates how the range of methods used to convey information has broadened beyond traditional visual arts. The 2D graphic arts include book arts, calligraphy, lithography, cinematography, printing, and typography. Applications, experience-based design, interaction methods, user-centered design, and websites are just some of the newer areas that graphic arts have expanded to include. The number of design-related discussions is growing at an astounding rate. There is training and schooling in graphic design all around the globe, at all levels. Figure 2 depicts the graphic model of the building structure in DRL.

Value-function-based approaches attempt to estimate the value (expected return) of being in a given state. The expected return when starting in state s and subsequently following the policy is denoted by the state-value function.

Table 3: Comparison of computation time

Table 4: Comparison of accuracy

Table 5: Comparison of precision

Table 6: Comparison of recall

Interpretability and explainability issues with DL models (Zhou, Lee, Diao, Shi, Balyen, & Peto, et al., 2019) can prevent them from being used in domains where explaining the decision-making process is essential. Their application is also limited because they frequently require substantial volumes of labelled data for efficient training. ML methods (Cioffi, Travaglioni, Piscitelli, Petrillo, & De Felice, et al., 2020) rely on specific knowledge and can fail to identify complex patterns in data. Complex and non-linear interactions can be difficult for these models to manage, which can result in inadequate performance on tasks where deep learning techniques work well. RL (Wang, Tang, Huang, Chen, Zhang, & Huang, 2020) can be computationally expensive and slow to train. Its limitations include exploration-exploitation trade-offs and sparse reward scenarios that can cause RL models to fail. The performance of DDPG (Bouhamed, Ghazzai, Besbes, & Massoud, 2020) can be hindered by sensitivity to hyperparameters and by training stability issues, and it can struggle with high-dimensional action spaces. When applying DDPG to intricate optimization tasks, it must be tuned carefully and its limits need to be considered in each case. Deep Reinforcement Learning (DRL) enables the model to learn specific correlations between design elements and provides numerous benefits in graphic design optimization. Its capacity for iterative adaptation and optimization improves the effectiveness of graphic design by providing relevant information and automating complex design selections for increased innovation and efficiency.