Statistical Analysis of Urban Traffic Flow Using Deep Learning



Introduction
In the field of transportation, deep learning has advanced notably in recent years, and the graph convolutional network (GCN) has attracted much attention as an effective method for modeling graph data. However, the traditional GCN model suffers from low accuracy and substantial prediction errors when applied to traffic prediction. Consequently, enhancing the prediction accuracy of the GCN model has become a prominent research topic. To improve the traditional GCN model and the accuracy of urban traffic flow statistics, this paper introduces the long short-term memory (LSTM) model into the GCN model. By integrating urban road conditions and traffic flow data, this approach achieves more precise and reliable traffic flow prediction. Furthermore, an empirical evaluation on a real-world traffic dataset from Cangzhou, Hebei Province shows that the proposed GCN-LSTM model outperforms the other models in terms of prediction accuracy and precision.

Traffic flow
The term "traffic flow" refers to the density and degree of flow of moving objects within a transportation network during a specific time period. It serves as an important indicator of traffic congestion levels and road utilization. Traffic flow exhibits spatio-temporal correlation, periodicity, and uncertainty. Using historical traffic flow data as training samples, in conjunction with on-site observation data, environmental factors, and road network structure information, it is possible to carry out data processing and feature engineering with appropriate algorithms and to evaluate and optimize predictive models [5]. These models are trained on historical data using statistical methods, machine learning, artificial intelligence, and other technologies in order to predict future traffic flow indicators. They assist transportation management departments in formulating effective traffic strategies that improve traffic mobility and reduce congestion.

GCN model
The GCN model has been widely used in traffic-related tasks, including flow prediction, road condition analysis, and road network modeling. By modeling the topology and node characteristics of the traffic network, it offers more accurate traffic prediction and analysis results for the road network. Node representations are updated iteratively: in each iteration, the direct neighbors of a node are considered, and their feature information is aggregated and used for the update. Neighbor aggregation combined with multi-layer stacking enables nodes to use their positions within the graph to acquire richer information. Consequently, it becomes possible to capture contextual relationships among nodes within the graph structure while effectively addressing data analysis tasks involving graph structures [7]. The update formula for each layer of the model is as follows:

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right),$$

where $H^{(l+1)}$ is the node representation of the $(l+1)$-th layer, $H^{(l)}$ is the node representation after the processing of layer $l$, $W^{(l)}$ represents the weight matrix of the $l$-th layer, and $\sigma$ is an activation function. $\tilde{A}$ represents the sum of the adjacency matrix and the identity matrix, i.e., $\tilde{A} = A + I$, and $\tilde{D}$ stands for the degree matrix of $\tilde{A}$.
This way of transmitting information allows the GCN model to analyze traffic data patterns and characteristics from both a global and a local perspective, taking into account the topological structure of the traffic network as well as node relationships [8]. However, it falls short in capturing long-term dependencies in time series data. The LSTM model, by contrast, is well suited to time series data and can effectively capture long-term dependencies, retaining past information for accurate future predictions. Therefore, the LSTM model is incorporated into the GCN model to capture and retain historical traffic flow information, so that previous states and trends are considered in prediction. This comprehensive modeling of spatio-temporal relationships allows for better adaptation to the dynamic changes and complexity of traffic flow data, resulting in accurate predictions of traffic flow and road conditions. Ultimately, this improves traffic efficiency and reduces congestion.
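The layer update described in this section can be illustrated with a minimal NumPy sketch. It assumes ReLU as the activation function and uses a hypothetical three-node road graph with made-up feature and weight values; none of these choices come from the paper.

```python
import numpy as np

def normalized_adjacency(A):
    """Return D~^{-1/2} (A + I) D~^{-1/2} for a square adjacency matrix A."""
    A_tilde = A + np.eye(A.shape[0])       # add self-loops: A~ = A + I
    deg = A_tilde.sum(axis=1)              # degrees of A~ (at least 1 per node)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_hat, H, W):
    """One propagation step: ReLU(A_hat @ H @ W). ReLU is an assumption."""
    return np.maximum(A_hat @ H @ W, 0.0)

# Hypothetical toy graph: three road segments connected in a line (0 - 1 - 2).
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
H0 = np.array([[1.0], [2.0], [3.0]])   # one feature per node (illustrative)
W0 = np.array([[1.0]])                 # 1x1 weight matrix, chosen for readability

A_hat = normalized_adjacency(A)
H1 = gcn_layer(A_hat, H0, W0)          # each row mixes a node with its neighbors
```

Stacking several such layers lets each node draw on information from neighbors that are more than one hop away.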

LSTM model
The LSTM is a variant of the recurrent neural network (RNN) that addresses the gradient explosion and vanishing problems of traditional RNNs by introducing input gates, forget gates, and output gates [9]. LSTM adjusts the degree of information storage and forgetting in both long-term and short-term memory by setting a threshold, while introducing a cell state to save long-term memory information [10]. This method effectively overcomes the gradient problem encountered in RNNs and improves the stability of model training.
The forget gate filters the cell information of the previous time step and determines which information needs to be discarded. The formula is:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f),$$

where $W_f$ and $b_f$ are the weight matrix and bias vector of the forget gate. This gate integrates the information $h_{t-1}$ transmitted from the previous stage with the input data $x_t$ at the current moment, fusing them in the calculation to control how much information enters and remains in the memory unit, and thus affects the memory and information flow of the network when processing sequence data.

Improved GCN model
The GCN model performs well in handling nonlinear data and capturing complex dependencies between nodes, while the LSTM model is good at capturing long-term dependencies in time series data. The two models are integrated to form a new approach known as the GCN-LSTM model. This model alternates iteratively between the GCN and LSTM components, continuously updating node representations and time-dependent information. By calculating the gradient of the loss function with respect to the parameters and updating the model parameters according to that gradient, the model progressively adapts to the training data and improves its performance [12]. The steps are as follows.
(1) The GCN model is used for conducting representation learning on graph data, and the representation of each node is updated by the adjacency matrix and node features.
(2) The spatial feature representation extracted by the GCN model is fed into the LSTM model, which uses its memory unit and gating mechanism to capture the time dependence in sequence data, thereby further extracting crucial time-related information.
(3) The GCN-LSTM model undergoes training with temporal data across multiple time steps. At each step, the hidden state is updated and adjusted to make predictions regarding future traffic flow or other relevant information.
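The three steps above can be sketched as a forward pass in NumPy. This is only an illustration under stated assumptions: the layer sizes, the random weights, the ReLU activation, the row-wise stacking of the four LSTM gate matrices, and the linear readout `w_out` are all hypothetical, and training (gradient computation and parameter updates) is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gcn(A_hat, X, W):
    """Step (1): spatial aggregation, ReLU(A_hat @ X @ W)."""
    return np.maximum(A_hat @ X @ W, 0.0)

def lstm_step(x, h, c, W, b):
    """Step (2): one LSTM update per node; W stacks the four gate matrices
    row-wise in the (assumed) order forget, input, output, candidate."""
    n = h.shape[-1]
    z = np.concatenate([h, x], axis=-1) @ W.T + b      # shape (nodes, 4 * n)
    f = sigmoid(z[:, :n])
    i = sigmoid(z[:, n:2 * n])
    o = sigmoid(z[:, 2 * n:3 * n])
    c_new = f * c + i * np.tanh(z[:, 3 * n:])
    return o * np.tanh(c_new), c_new

def gcn_lstm_forward(A_hat, X_seq, W_g, W_l, b_l, w_out):
    """Step (3): process every time step, then read out one value per node."""
    n_nodes, n_hid = X_seq.shape[1], b_l.shape[0] // 4
    h = np.zeros((n_nodes, n_hid))
    c = np.zeros((n_nodes, n_hid))
    for X_t in X_seq:                                  # X_t: (nodes, features)
        h, c = lstm_step(gcn(A_hat, X_t, W_g), h, c, W_l, b_l)
    return h @ w_out                                   # shape (nodes,)

# Hypothetical setup: 3 road segments in a line, 2 features, 6 time steps.
A = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
A_tilde = A + np.eye(3)
deg = A_tilde.sum(axis=1)
A_hat = A_tilde / np.sqrt(np.outer(deg, deg))          # D~^{-1/2} A~ D~^{-1/2}

rng = np.random.default_rng(0)
n_feat, n_gcn, n_hid = 2, 4, 5
X_seq = rng.random((6, 3, n_feat))
W_g = rng.normal(scale=0.5, size=(n_feat, n_gcn))
W_l = rng.normal(scale=0.1, size=(4 * n_hid, n_hid + n_gcn))
b_l = np.zeros(4 * n_hid)
w_out = rng.normal(size=n_hid)

pred = gcn_lstm_forward(A_hat, X_seq, W_g, W_l, b_l, w_out)
```

Applying the graph convolution before the recurrent update means that each node's LSTM input already contains information from its neighboring road segments.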

Experimental analysis

Data acquisition
To evaluate the GCN-LSTM traffic flow prediction model constructed in this paper, experimental verification was conducted. Data on vehicle flow on Qinxue Road, Wenlan Street, and Fengfan Road in the central urban district of Cangzhou City were collected from the website of the Ministry of Communications of the People's Republic of China for the period from October 1 to October 31, 2021. The statistical time period ranged from 00:00 to 24:00, with a sampling interval of 10 min. In total, 1,500 records were obtained as the data set. Images of the actual roads were collected from several angles and at several time periods; the resulting images were subjected to frame-splitting and data cleaning and were manually labeled with automobiles, pedestrians, and non-motorized vehicles. The GRU and GCN models were selected as control objects, and the mean absolute percentage error (MAPE) was used to evaluate the prediction performance of the tested models.
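As an illustration of how a flow series sampled every 10 min might be turned into training samples, the sketch below builds sliding windows of past readings and the reading that follows them. The paper does not describe its windowing; the function `make_windows` and the window lengths are hypothetical.

```python
INTERVALS_PER_DAY = 24 * 60 // 10   # 144 readings per day at a 10 min interval

def make_windows(series, n_in, n_out=1):
    """Split a sequence into (input window, target window) pairs.

    n_in and n_out are illustrative choices, not values from the paper.
    """
    samples = []
    for start in range(len(series) - n_in - n_out + 1):
        x = series[start:start + n_in]
        y = series[start + n_in:start + n_in + n_out]
        samples.append((x, y))
    return samples

# Toy series standing in for one road's flow readings.
series = list(range(10))
windows = make_windows(series, n_in=6)   # six past readings predict the next one
```

With an input length of six, each sample covers one hour of history.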

Evaluation indicators
Due to the complexity of the traffic system, the MAPE is used as the evaluation indicator for assessing the performance of the GCN-LSTM prediction model. MAPE represents the average degree of deviation between predicted and actual vehicle flow, with a smaller value indicating closer agreement between the predictions and the observations [13]. The formula is:

$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{\hat{y}_i - y_i}{y_i} \right|,$$

where $n$ is the number of observations, $\hat{y}_i$ is the predicted value of the model, and $y_i$ is the true data value.
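A minimal sketch of the MAPE calculation is given below. The function name `mape` and the sample flow values are illustrative only, and the result is expressed in percent.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, y_hat in zip(actual, predicted):
        if y == 0:
            raise ValueError("MAPE is undefined when an actual value is zero")
        total += abs((y_hat - y) / y)
    return 100.0 * total / len(actual)

# Illustrative values (not taken from the paper's dataset).
actual = [2000.0, 8000.0, 5000.0]
predicted = [2200.0, 7600.0, 5000.0]
score = mape(actual, predicted)   # relative errors 10%, 5%, 0% average to about 5%
```

Because each error is divided by the actual value, the measure is undefined for zero-flow intervals, which is why the sketch rejects them explicitly.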

Experimental results
A comparison of the MAPE of the GRU, GCN, and GCN-LSTM models, illustrated in Figure 1, shows that the improved GCN-LSTM model achieved the best prediction performance on Cangzhou's traffic flow dataset, with a markedly lower MAPE value. As shown in Figure 2, the vehicle flow in the central urban area showed obvious periodic changes. With the exception of holidays, this cycle occurred consistently throughout all seven days of the week, and each cycle displayed similar characteristics regarding vehicle flow. Notably, weekdays witnessed a substantial volume of vehicles, while non-weekdays experienced relatively lower traffic levels.
The data in Table 2 reveal that during the period from October 18 to 22, traffic flow was low in the early morning hours and remained relatively stable at around 2,000 PCU/D. It then surged after 06:00 and reached its peak between 06:00 and 08:00. Subsequently, traffic flow decreased from 10:00 to 16:00 and then increased again, reaching its maximum at 18:00. On October 23 and 24, the peak traffic volume was delayed by two hours compared to regular working days. Consequently, it can be concluded that there were two distinct peaks of traffic flow in the central area. The periods of 06:00-08:00 and 16:00-18:00 exhibited the highest traffic flow on weekdays, while the peak traffic flow on non-weekdays occurred between 08:00-10:00 and 18:00-20:00. Generally speaking, weekdays witnessed a relatively substantial volume of vehicle flow.

Because vehicle flow is time-dependent, vehicle flow on weekdays and non-weekdays can be distinguished by extracting time features [14]. Therefore, before the GCN-LSTM model makes a prediction, it is necessary to determine the input and output of the model [15]. The input data are mainly composed of the following features: historical traffic flow, month, specific date, travel time period, working day indicator, and legal holidays (0 means yes, 1 means no). The known traffic data for historical dates were selected as the input, together with the date to be predicted. For example, the traffic flow characteristics of Qinxue Road from January 1, 2023 to July 31, 2023 were selected as input to predict the vehicle flow between 6:00 and 20:00 on this road section on August 1, 2023. Subsequently, the predicted vehicle flow value was compared with the actual value (Figure 3). The predicted vehicle flow of Qinxue Road in Cangzhou City during the morning peak hours of 06:00-08:00 on August 1, as depicted in Figure 3, deviated significantly from the actual value. It is worth noting that this time period typically experiences heavy traffic, with an average of approximately 9,600 PCU/D based on historical data. However, it can be seen from Figure 3 that the actual traffic flow on August 1 was 7,539 PCU/D, lower than the historical level, and the morning peak was also postponed. Consequently, it can be inferred that congestion occurred on Qinxue Road between 06:00 and 08:00 on August 1.
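The input features listed in this section could be encoded along the following lines. This is a sketch under assumptions: the two-hour binning of the travel time period, the coding of the working day indicator (1 for a working day), the function name `encode_features`, and the one-entry holiday set are hypothetical; only the legal holiday coding (0 means yes, 1 means no) follows the text.

```python
from datetime import date, datetime

def encode_features(timestamp, flow, holidays):
    """Build one feature record from a timestamped flow reading."""
    day = timestamp.date()
    is_holiday = day in holidays
    is_working_day = timestamp.weekday() < 5 and not is_holiday
    return {
        "flow": flow,                           # historical traffic flow
        "month": timestamp.month,
        "day": timestamp.day,                   # specific date within the month
        "period": timestamp.hour // 2,          # assumed two-hour bins, 0..11
        "working_day": 1 if is_working_day else 0,   # assumed coding
        "legal_holiday": 0 if is_holiday else 1,     # 0 means yes, 1 means no
    }

# Illustrative, incomplete holiday calendar.
holidays = {date(2021, 10, 1)}

weekday_peak = encode_features(datetime(2021, 10, 18, 7, 10), 9600.0, holidays)
weekend = encode_features(datetime(2021, 10, 23, 9, 0), 7000.0, holidays)
holiday = encode_features(datetime(2021, 10, 1, 12, 0), 5000.0, holidays)
```

Keeping the working day indicator separate from the holiday flag lets a model distinguish an ordinary weekend from a holiday that falls on a weekday.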

Discussion
Urban transportation is easily affected by various factors such as human intervention and traffic control. To cope with congestion caused by peak travel and traffic restrictions, it is essential to monitor traffic flow in real time and to predict it for future time periods. Traditional GCN models have several limitations in the field of transportation. For instance, these models extract only the spatial features of a transportation network and may not effectively account for temporal changes when processing spatio-temporal data. Traditional GCN models also tend to provide deterministic predictions, which limits their robustness in practical applications because real traffic networks are subject to uncertainty. The experimental results showed that, by leveraging the temporal modeling capability of LSTM, the GCN-LSTM model effectively modeled the time dimension of the data and reduced the mean absolute percentage error between predicted and actual traffic flow. This approach better captures the temporal variations in traffic volume, comprehensively considers both spatial and temporal features, enhances understanding of dynamic patterns in traffic data, and ultimately improves the accuracy of traffic flow prediction.

In actual urban traffic flow prediction, comparison with historical data allows future changes in city road conditions to be assessed more accurately. Therefore, the GCN-LSTM model was used to predict the traffic flow in real time, and the result was compared with the historical data. Changes in road conditions were judged by analyzing the difference between the predicted results and the actual observed values. If the difference between the predicted value and the actual value is large, it may mean that there is congestion or another abnormal condition. Based on these prediction results and the difference analysis, traffic management departments can take timely measures to optimize road traffic safety and efficiency and maximize the utilization of road resources. According to the predicted traffic conditions, people can choose an appropriate means of transportation and plan more reasonable travel routes at any time, so as to avoid congestion and improve travel efficiency.
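The difference analysis described above can be sketched as a simple relative-deviation check. The function `flag_anomalies` and the 15% threshold are hypothetical choices, since the paper does not state how large a difference counts as abnormal. In the example, 9,600 PCU/D (the historical average quoted in the text) stands in for an expected value and 7,539 PCU/D is the observed flow; the second pair of values is invented.

```python
def flag_anomalies(expected, observed, threshold=0.15):
    """Return one boolean per interval: True if the relative deviation
    between observed and expected flow exceeds the threshold."""
    if len(expected) != len(observed):
        raise ValueError("expected and observed must have the same length")
    flags = []
    for e, o in zip(expected, observed):
        # Treat a zero expectation as infinitely deviant unless observed is also zero.
        if e == 0:
            deviation = 0.0 if o == 0 else float("inf")
        else:
            deviation = abs(o - e) / abs(e)
        flags.append(deviation > threshold)
    return flags

expected = [9600.0, 5200.0]   # first value from the text, second illustrative
observed = [7539.0, 5100.0]
flags = flag_anomalies(expected, observed)   # first interval is flagged, second is not
```

A flagged interval only indicates that something unusual may be happening; deciding whether it is congestion, a road closure, or a sensor fault needs further information.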

Conclusion
In this paper, the LSTM model was introduced to improve the GCN model. An experimental application was carried out using real traffic data from Cangzhou, Hebei Province. By training and verifying the GCN-LSTM model, the traffic flow in different time periods was successfully predicted, and a statistical analysis was carried out. The results showed that the improved GCN-LSTM model significantly improved prediction accuracy and precision, enabling more accurate forecasting of traffic flow fluctuations and providing valuable support for travel route and mode planning. Furthermore, it can assist traffic management departments in conducting traffic dispatch and monitoring more effectively, thereby reducing congestion and improving road efficiency. Additionally, it provides valuable reference data for urban transportation planning and management. In the future, we will further optimize the GCN-LSTM model and verify its applicability in a wider range of scenarios, aiming to continuously enhance the efficiency and safety of urban traffic.

Figure 1: Performance comparison results of various models.

Figure 2: The change trend of traffic flow in Cangzhou City in October.

After analyzing the monthly change characteristics of vehicle flow in the central urban area, it can be found that the vehicle flow in the central urban area changed periodically from Monday to Sunday. The vehicle flow data of the central urban area of Cangzhou City from October 18 to 24, 2021, which represents a complete week excluding holidays, were selected to analyze the distribution of vehicle flow in the central urban area within a week (from 00:00 to 24:00) (Table 2).

Table 2: The seven-day change distribution of the vehicle flow in the central urban area of Cangzhou

Figure 3: Predicted and actual traffic flow values of Qinxue Road in Cangzhou on August 1.

Table 1: A summary of related works

The input gate controls what information from the current input and the hidden state at the previous time step should be added to the cell state [11], and the corresponding formulas are:

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i),$$
$$\bar{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c),$$
$$C_t = f_t * C_{t-1} + i_t * \bar{C}_t.$$

After the forgotten information has been filtered out, the input gate combines the input data of the current time step with the hidden state of the previous time step to obtain the candidate cell state $\bar{C}_t$. The output of the input gate is then multiplied by the candidate cell state, and the result is added to the previous memory state to update the memory state at the current time step. The output gate controls the flow of information between the input at the current time step and the memory at the previous time step, and between the input at the current time step and the output at the current time step. The corresponding formulas are:

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o),$$
$$h_t = o_t * \tanh(C_t),$$

where $\sigma$ and $\tanh$ are both activation functions, $x_t$ represents the input data at time $t$, $h_{t-1}$ is the hidden state at time step $t-1$, $C_t$ is the cell state at time $t$, and $C_{t-1}$ is the cell state value at time step $t-1$. The forget gate, input gate, output gate, and neuron state all have parameter matrices with values between 0 and 1, which are denoted by $W_f$, $W_i$, $W_o$, and $W_c$, respectively. $b_f$, $b_i$, $b_o$, and $b_c$ are the bias vectors corresponding to these matrices. $f_t$, $i_t$, and $o_t$ represent the threshold values corresponding to the three gates [11].
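The LSTM gate computations described in this paper can be sketched as a single time step in NumPy. The parameter names mirror the symbols used in the text, while the dimensions, the random initialization, and the dictionary layout `params` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step: forget, input, and output gates plus cell update."""
    z = np.concatenate([h_prev, x_t])                     # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])      # forget gate
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])      # input gate
    c_bar = np.tanh(params["W_c"] @ z + params["b_c"])    # candidate cell state
    c_t = f_t * c_prev + i_t * c_bar                      # new cell state
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])      # output gate
    h_t = o_t * np.tanh(c_t)                              # new hidden state
    return h_t, c_t

# Hypothetical sizes and randomly initialized parameters.
hidden, n_in = 4, 3
rng = np.random.default_rng(0)
params = {}
for gate in "fico":
    params[f"W_{gate}"] = rng.normal(scale=0.1, size=(hidden, hidden + n_in))
    params[f"b_{gate}"] = np.zeros(hidden)

# Run the cell over a short random input sequence.
h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(5, n_in)):
    h, c = lstm_step(x_t, h, c, params)
```

Because the gate activations lie between 0 and 1 and the cell state passes through tanh, every component of the hidden state stays strictly between -1 and 1.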