Decision Tree Algorithm Based University Graduate Employment Trend Prediction

The employment situation of college graduates is becoming increasingly severe, so finding effective methods to predict students' employment trends is of great significance. In this study, the C4.5 algorithm was used to predict the employment trend of students. Taking the 2016 graduates of Henan University of Animal Husbandry and Economy as an example, four attributes affecting the employment unit were extracted, their information gain ratios were calculated, a decision tree was constructed, and classification rules were obtained. After data collection, conversion and cleaning, 420 employment records were obtained, of which 320 were used as training samples. The classification rules were tested on the remaining 100 samples, and the accuracy was 81%. Finally, the employment trend of the 2018 graduates was predicted with the C4.5 algorithm, providing theoretical guidance for arranging employment work in schools. Predicting the employment trend of students with a decision tree algorithm is feasible and is of great significance both for schools' employment guidance and for students' employment choices.


Introduction
The employment problem has attracted increasingly widespread social concern [1]. In recent years, the number of university graduates has increased every year, and the employment situation has become increasingly severe. It is very important for schools to analyze and study information about students' employment, which can help them train students according to market demand [2]. However, as the number of students grows, graduates' employment data accumulate continuously [3], which makes employment analysis very difficult. With the progress of science and technology, many new technologies have been applied to employment analysis. Wang [4] combined a residual-modified GM(1,1) model with an improved neural network to predict the employment information index of graduate students, and thus their employment trend. He found that the mean square error decreased from 10⁻¹ to 10⁻⁵ as training progressed, and that the algorithm performed best when the gradient value and learning rate were 7.5912×10⁻⁵ and 0.8421, respectively. Liu et al. [5] proposed a decision tree based on weighted information gain, with the weights obtained by a genetic algorithm. The decision tree was constructed and tested on undergraduates, and the method showed favourable prediction accuracy. Kwak et al. [6] found that education and gender were the most important factors affecting the employment of young and middle-aged people, while gender, health status and education were the most important factors for the elderly. Tan et al. [7] made a short-term employment forecast for Shandong province using independent component analysis (ICA) and found that labor force quality, industrial structure and income were the most important factors affecting employment. Decision tree algorithms perform well in data prediction. Daga et al. [8] predicted high-risk renal transplantation using a decision tree and a random forest and reached an accuracy of 85%, providing accurate decision support for doctors. Mohreji et al. [9] used a decision tree algorithm to predict air transportation delays; in a study of three New York airports, the predictions had a very high confidence level at least 70% of the time. At present, most research on employment trends focuses on the factors influencing employment, while little research addresses the accurate prediction of employment trends, and traditional data processing methods have difficulty extracting useful information from historical employment data. Therefore, in this study, the C4.5 decision tree algorithm was used to extract four decision attributes from the employment-related information of the 2016 graduates of Henan University of Animal Husbandry and Economy, and classification rules were extracted for employment trend prediction in order to study the reliability of this method in employment prediction.

Employment trend forecast and data mining
Under the influence of population growth, the popularization of higher education and enrollment expansion, the number of college students has increased explosively, the number of graduates has risen sharply every year, and the employment situation has become increasingly severe [10], arousing widespread social concern. The employment situation of college students is closely related to the future development of schools, students' personal development and social stability [11], and growth in the graduate employment rate can contribute to economic growth [12]. To manage students' employment effectively, colleges and universities have adopted information technology to collect and manage students' employment information, with the aim of obtaining valuable information, analyzing the factors affecting employment, and improving the employment situation. However, as the number of students grows, the data in these information management systems also grow rapidly. Traditional analysis methods cannot handle such large volumes of information, nor can they realize the full potential value of the data: although plenty of data are available, the implicit associations and rules among them cannot be discovered [13], and future employment cannot be predicted from them. An intelligent and reliable method is therefore urgently needed. Data mining technology can process massive information quickly and efficiently and extract valuable information from it, and it is widely applied in fields such as business, industry and the military. Mining graduates' employment information with data mining technology to identify the factors affecting employment can help a school's employment guidance center guide students and promote employment. It can also be used to predict graduates' employment trends from this information, providing a basis for decisions on adjusting school teaching and employment work.

Decision tree algorithm
The decision tree algorithm is a typical data mining technique. It obtains valuable information by summarizing and classifying data according to their attributes [14]. Applying a decision tree algorithm to employment information, that is, constructing a decision tree and extracting classification rules to identify the factors affecting employment, is an effective way to predict future employment trends [15]. In this study, the C4.5 decision tree algorithm was used to process and analyze employment information.
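The C4.5 algorithm is an improvement of the ID3 algorithm [16] that selects the attribute at each tree node by information gain ratio. Let the data set $K$ contain $k$ samples, and let its class attribute take $m$ values, corresponding to $m$ categories $C_i$ ($i = 1, 2, \dots, m$). If $k_i$ is the number of samples in category $C_i$, the amount of information needed to classify a given object, called the entropy of $K$ before division, is:
$$I(K) = -\sum_{i=1}^{m} p_i \log_2 p_i, \qquad p_i = \frac{k_i}{k}$$
If attribute $A$ takes $v$ distinct values that divide $K$ into subsets $K_1, K_2, \dots, K_v$, the expected entropy after division and the information gain of attribute $A$ are:
$$E(A) = \sum_{j=1}^{v} \frac{|K_j|}{k}\, I(K_j), \qquad \mathrm{Gain}(A) = I(K) - E(A)$$
The larger the information gain, the purer the subsets produced by the division.
The information gain ratio of attribute $A$ is:
$$\mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}(A)}, \qquad \mathrm{SplitInfo}(A) = -\sum_{j=1}^{v} \frac{|K_j|}{k} \log_2 \frac{|K_j|}{k}$$
where $\mathrm{SplitInfo}(A)$ represents the breadth and uniformity with which attribute $A$ splits the data set $K$.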
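The entropy, information gain and gain ratio formulas can be sketched in Python as follows. This is a minimal illustration, not the authors' code: records are assumed to be dictionaries, and the field names (`english`, `employment_unit`) are hypothetical placeholders for the attributes described in this paper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """I(K): entropy of a list of class labels (e.g. employment units A/B/C)."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def gain_ratio(rows, attr, label="employment_unit"):
    """GainRatio(A) = Gain(A) / SplitInfo(A) for attribute `attr` over dict records.

    The default label field name is a hypothetical placeholder.
    """
    total = len(rows)
    # Partition the class labels into subsets K_1..K_v by the value of the attribute
    subsets = {}
    for r in rows:
        subsets.setdefault(r[attr], []).append(r[label])
    expected = sum(len(s) / total * entropy(s) for s in subsets.values())  # E(A)
    gain = entropy([r[label] for r in rows]) - expected                    # Gain(A)
    split_info = -sum(len(s) / total * log2(len(s) / total) for s in subsets.values())
    # A single-valued attribute has SplitInfo = 0; treat its ratio as 0 here
    return gain / split_info if split_info > 0 else 0.0
```

For example, if English competence perfectly separated a toy set of two "A" and two "C" records, `gain_ratio(rows, "english")` would be 1.0, whereas an attribute with only one value returns 0.0.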

Data collection and preprocessing
The 2016 graduates of Henan University of Animal Husbandry and Economy were taken as research subjects.
The basic information, achievement information and employment information of the students were obtained from the student status management system, student learning management system and student employment management system, and 500 records were selected as samples.
The obtained data set contained many duplicate or blank entries, and the data formats were not unified; hence preprocessing was needed.
(1) Data integration: The data exported from the three systems were integrated into a table of general information, and the attributes are shown in Table 1.

Table 1: General information.
Name | Major | Gender | Academic performance | Politics status | English competence | Student cadre | Computer skills | Participation in student society | Employment unit
(2) Data correlation analysis: The data exported from the three systems contained much irrelevant information, such as name, gender, politics status, student cadre and participation in student society, which needed to be eliminated.
(3) Data conversion: Noise was eliminated from the data. To facilitate statistics and analysis, the remaining five attributes were generalized: major was divided into popular, general and unpopular; academic performance into excellent, general and poor; English competence into CET4 and above and below CET4; computer skills into level 3 and above and below level 3; and employment unit into state-owned enterprise, private enterprise and others, represented by A, B and C, respectively.
(4) Data cleaning: Duplicate and blank records were deleted, leaving 420 records, of which 320 were used as training samples and the remaining 100 for testing.
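Steps (3) and (4) can be sketched as below. This is a hedged illustration only: the raw field names (`student_id`, `score`, `computer_level`, and so on), the 85/70 score cut-offs for academic performance, the certificate names, and the random 320/100 split are all assumptions, since the paper does not describe the export format or how the test samples were chosen. Cleaning is applied before generalization here so that blank fields never reach the mapping; that ordering is a choice of this sketch.

```python
import random

# Hypothetical raw field names; the paper does not describe the export format.
REQUIRED = ("student_id", "major", "score", "english", "computer_level", "employment_unit")


def clean(records):
    """Step (4): drop records with blank fields and repeated student IDs, keeping order."""
    seen, kept = set(), []
    for rec in records:
        if any(rec.get(f) in (None, "") for f in REQUIRED):
            continue  # blank data
        if rec["student_id"] in seen:
            continue  # duplicate data
        seen.add(rec["student_id"])
        kept.append(rec)
    return kept


def generalize(rec):
    """Step (3): map a raw record onto the categories used for the decision tree.

    The score cut-offs (85/70) and the certificate names are assumptions.
    """
    score = rec["score"]
    return {
        "major": rec["major"],  # assumed already labelled popular/general/unpopular
        "academic": "excellent" if score >= 85 else "general" if score >= 70 else "poor",
        "english": "CET4 and above" if rec["english"] in ("CET4", "CET6") else "below CET4",
        "computer": "level 3 and above" if rec["computer_level"] >= 3 else "below level 3",
        "employment_unit": rec["employment_unit"],  # A, B or C
    }


def split(rows, n_train=320, seed=0):
    """Shuffle and split, e.g. 420 cleaned records into 320 training and 100 test records."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    return rows[:n_train], rows[n_train:]
```

Deduplicating on a student identifier rather than on whole generalized records matters: after generalization, many different students legitimately share the same four attribute values.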

Establishing decision tree
The training samples were analyzed by taking employment unit (A, B and C) as the class label attribute and major, academic performance, English competence and computer skills as the decision attributes. The number of students in each category of each attribute is shown in Table 2. The information gain ratio of each decision attribute was then calculated.
Taking major as an example, major was divided into popular, general and unpopular, and the entropy of each of the three subsets (for example, the subset of popular majors) was calculated. The same procedure was applied to the other decision attributes. The calculation results showed that English competence had the largest information gain ratio, so it was chosen as the root node of the decision tree. The information gain ratio within each subtree was then calculated by the same procedure, finally yielding the decision tree in Figure 1.
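The recursive construction just described (pick the attribute with the largest gain ratio as the node, then repeat within each subtree) can be sketched as follows. This is a simplified, hypothetical sketch: it repeats the gain-ratio helper so the block runs on its own, handles only categorical attributes, and omits parts of full C4.5 such as pruning, missing-value handling and the heuristic of considering only attributes with above-average gain. Ties are broken by attribute order.

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, attr, label):
    n = len(rows)
    parts = {}
    for r in rows:
        parts.setdefault(r[attr], []).append(r[label])
    gain = entropy([r[label] for r in rows]) - sum(len(p) / n * entropy(p) for p in parts.values())
    split = -sum(len(p) / n * log2(len(p) / n) for p in parts.values())
    return gain / split if split > 0 else 0.0


def build_tree(rows, attrs, label="employment_unit"):
    """Return a class label (leaf) or a tuple (attribute, {value: subtree})."""
    labels = [r[label] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the node is pure or no decision attributes remain
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: gain_ratio(rows, a, label))
    rest = [a for a in attrs if a != best]
    branches = {}
    for value in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == value]
        branches[value] = build_tree(subset, rest, label)
    return (best, branches)


def classify(tree, record, default=None):
    """Walk the tree; return `default` for attribute values unseen in training."""
    while isinstance(tree, tuple):
        attr, branches = tree
        if record[attr] not in branches:
            return default
        tree = branches[record[attr]]
    return tree
```

With toy records (hypothetical labels) in which English competence fully determines the unit and major carries no information, the sketch places English competence at the root, mirroring the choice reported in the text.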

Generating classification rules
According to the decision tree in Figure 1, the following classification rules were obtained.
(1) If English competence = CET4 and above AND computer skills = level 3 and above AND academic performance = excellent AND major = general, then employment unit = state-owned enterprise.
(2) If English competence = CET4 and above AND computer skills = below level 3 AND academic performance = excellent AND major = general, then employment unit = state-owned enterprise.
(3) If English competence = CET4 and above AND computer skills = below level 3 AND academic performance = general AND major = general, then employment unit = private enterprise.
(4) If English competence = CET4 and above AND computer skills = below level 3 AND academic performance = poor AND major = unpopular, then employment unit = private enterprise.
(5) If English competence = below CET4 AND computer skills = level 3 and above AND academic performance = excellent AND major = popular, then employment unit = state-owned enterprise.
(6) If English competence = below CET4 AND computer skills = level 3 and above AND academic performance = general AND major = general, then employment unit = private enterprise.
(7) If English competence = below CET4 AND computer skills = level 3 and above AND academic performance = excellent AND major = general, then employment unit = private enterprise.
(8) If English competence = below CET4 AND computer skills = below level 3 AND academic performance = poor AND major = unpopular, then employment unit = others.
(9) If English competence = below CET4 AND computer skills = below level 3 AND academic performance = poor AND major = unpopular, then employment unit = others.
It was concluded from the above classification rules that English competence and computer skills had the greatest impact on students' employment units. Students with good English competence and good computer skills generally worked in state-owned or private enterprises, while students with poor English competence and weak computer skills, except for some with good academic performance or majors in popular subjects, did not work in state-owned or private enterprises. This shows that schools need to strengthen English and computer training and pay more attention to these two aspects when arranging teaching work, and that students themselves should strive to improve their English and computer skills to strengthen their competitive advantage in employment.
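Because every rule above tests all four decision attributes, the rule set can be sketched as a lookup table. This is an illustrative encoding, not the authors' implementation; the record field names (`english`, `computer`, `academic`, `major`) are hypothetical. Rule (9) reads identically to rule (8) as printed, so it appears once, and attribute combinations not covered by the listed rules fall back to a default.

```python
E_HI, E_LO = "CET4 and above", "below CET4"
C_HI, C_LO = "level 3 and above", "below level 3"

# (english, computer, academic, major) -> employment unit, rules (1)-(8) in order.
# Rule (9) is printed identically to rule (8), so it is listed once.
RULES = [
    ((E_HI, C_HI, "excellent", "general"), "state-owned enterprise"),
    ((E_HI, C_LO, "excellent", "general"), "state-owned enterprise"),
    ((E_HI, C_LO, "general", "general"), "private enterprise"),
    ((E_HI, C_LO, "poor", "unpopular"), "private enterprise"),
    ((E_LO, C_HI, "excellent", "popular"), "state-owned enterprise"),
    ((E_LO, C_HI, "general", "general"), "private enterprise"),
    ((E_LO, C_HI, "excellent", "general"), "private enterprise"),
    ((E_LO, C_LO, "poor", "unpopular"), "others"),
]
RULE_TABLE = dict(RULES)


def apply_rules(record, default=None):
    """Return the employment unit predicted by the matching rule, or `default`."""
    key = (record["english"], record["computer"], record["academic"], record["major"])
    # Combinations absent from the listed rules return the default
    return RULE_TABLE.get(key, default)
```

For instance, a student below CET4, below computer level 3, with poor academic performance in an unpopular major maps to "others" under rule (8).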

Testing of classification performance
The effectiveness of the classification rules was tested on the 100 test samples, and the results were compared with the students' actual employment units.
The testing results are shown in Table 3.

Table 3: Testing results (columns: Sample, Classification result, Actual result).
The classification results of 81 samples matched the actual employment units, and 19 samples were misclassified, giving an accuracy of 81%. This indicates that the obtained classification rules were relatively accurate and could be used to determine students' employment units.
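The accuracy figure is simply the share of test samples whose predicted unit equals the actual one. A minimal sketch follows; the label lists in the usage note are made up to reproduce the 81-of-100 count and are not the paper's test data. A prediction of `None` (for example, from an uncovered rule combination) simply counts as a mismatch.

```python
from collections import Counter


def accuracy(predicted, actual):
    """Share of test samples whose predicted employment unit equals the actual one."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and equally long")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def confusion_counts(predicted, actual):
    """Count (actual, predicted) pairs, e.g. ('A', 'B') = actual A classified as B."""
    return Counter(zip(actual, predicted))
```

With 81 correct and 19 wrong predictions out of 100, `accuracy` returns 0.81; `confusion_counts` additionally shows which units the 19 errors were confused with.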

Prediction of employment trend
After the accuracy of the classification rules was verified, the employment trend of graduates was predicted using the method proposed in this study, taking the 2018 graduates as an example. Information about the students' major, academic performance, English competence and computer skills was exported from the student status management system and the student learning management system, and the graduates' employment trend was then predicted. The results are shown in Figure 2. Figure 2 shows that the largest share of students, 45%, were predicted to be employed in private enterprises, while the smallest share, 21.56%, were predicted to be employed in state-owned enterprises. The decision tree and classification rules in this study could predict graduates' employment trends well, help schools efficiently identify students' future employment directions, provide a strong basis for employment guidance, and offer schools valuable information.
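Turning per-student predictions into the percentages reported for Figure 2 amounts to tallying the predicted units. The sketch below assumes a list of predicted labels, one per 2018 graduate, produced by whichever classifier is used; the example input is invented, and how the study treated students not covered by any rule is not stated, so here they are simply skipped.

```python
from collections import Counter


def unit_shares(predictions):
    """Percentage of graduates predicted for each employment unit (None entries skipped)."""
    counts = Counter(p for p in predictions if p is not None)
    total = sum(counts.values())
    # Rounded to two decimals, matching the precision of the figures quoted in the text
    return {unit: round(100 * n / total, 2) for unit, n in counts.items()}
```

For example, `unit_shares(["A", "B", "B", "C"])` gives 25.0% for A, 50.0% for B and 25.0% for C.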

Discussion
Employment has always been a problem that cannot be ignored in modern society, especially for university graduates. As the number of graduates increases, competition for employment is becoming increasingly fierce [17]. Employment is the most serious and difficult problem graduates face after leaving school and entering society, and it is also very important for schools. At present, all universities have employment guidance centers that collect and analyze students' employment situations in order to find patterns and forecast employment. Employment prediction is of great significance for graduates' employment and for school teaching [18]. However, as the number of university students increases and data accumulate, analyzing and processing employment information is becoming more and more difficult, and it is hard to obtain valuable information from massive data.
The development of data mining technology has brought new changes. The decision tree algorithm is an efficient classification method that is also applicable to predicting employment trends. In this study, the relatively mature C4.5 algorithm was selected. After the relevant data of the graduates were obtained, four decision attributes (major, academic performance, English competence and computer skills) were extracted to analyze employment units. The decision tree was built step by step by calculating the information gain ratios of the attributes, and classification rules were then derived from the tree.
The information gain ratios of major, academic performance, English competence and computer skills were 0.0131, 0.0134, 0.1656 and 0.1502, respectively, showing that English competence and computer skills were the most important factors affecting graduates' employment. In the job market, English competence and computer skills are signs of graduates' ability, and many employers have specific requirements for them. Schools already attach great importance to cultivating students' abilities in these two aspects, and the extensive arrangement of English and computer courses has improved students' abilities to a certain extent. Under these rigid requirements, students have to strengthen their study in these two areas. However, passive learning is not enough; the importance of English and computer skills must be fully recognized, as the classification rules clearly illustrate. Extracting classification rules can help schools and students clearly understand which abilities are the most important. On the one hand, this helps schools arrange teaching and employment guidance; on the other hand, it encourages students to learn actively.
Testing showed that the classification rules obtained in this study had an accuracy of 81%, indicating that this method is feasible for predicting graduates' employment trends. The prediction results for the 2018 graduates showed that many students are likely to be employed by private enterprises and fewer by state-owned enterprises. This suggests that schools should strengthen the supply of talent to state-owned enterprises and carry out targeted talent training.
This paper is a preliminary discussion of the role of decision trees in predicting college students' employment trends, and several problems need further research: (1) employment units should be divided in more detail; (2) more factors that can affect college students' employment should be considered, such as family conditions and personal strengths; for example, Ref. [19] points out that gender can also affect students' employment choices; (3) the application of other data mining algorithms to employment trend prediction should be explored; for example, a Bayesian algorithm was used for employment prediction in Ref. [20].

Conclusion
The decision tree algorithm can help process and analyze students' employment situations and identify the main factors affecting their employment. This study constructed a decision tree and extracted classification rules using four decision attributes: major, academic performance, English competence and computer skills. English competence and computer skills were found to have the greatest impact on students' employment. Testing showed that the classification rules had an accuracy of 81%, indicating that the method is feasible for predicting graduates' employment trends. This study has several shortcomings. For example, more decision attributes that affect students' employment units could be mined, employment units could be divided further to obtain more detailed employment trends, and a larger sample is needed to determine the accuracy of the method.

Acknowledgement
This study was supported by the Research Project of Humanities and Social Sciences of Education Office of Hubei under grant number 16Z015.