A Novel Borda Count Based Feature Ranking and Feature Fusion Strategy to Attain Effective Climatic Features for Rice Yield Prediction

This work predicts the effect of climatic variability on rice crop production using the climatic features of three coastal regions of Odisha, a state of India. Its novelty is a Borda Count based fusion strategy applied to the ranked features obtained from several ranking methodologies. The proposed prediction model works in three phases. In the first phase, three feature ranking approaches, namely Random Forest, Support Vector Regression-Recursive Feature Elimination (SVR-RFE) and F-Test, are applied individually to the two datasets of the three coastal areas, and the features are ranked according to each algorithm. In the second phase, Borda Count is applied as a fusion method to the ranked features from the first phase to obtain the top five features. In the third phase, an Extreme Learning Machine (ELM) with a multiquadric activation function predicts the rice crop yield using the features obtained from the fusion based ranking strategy, and the number of features that gives a prediction accuracy above 99% is determined. Finally, a statistical paired T-test is used to evaluate and validate the significance of the proposed fusion based ranking prediction model. The model not only predicts the rice yield per hectare but also identifies the most significant features during the Rabi and Kharif seasons. The experiments show that relative humidity plays a vital role, along with minimum and maximum temperature, in rice crop yield during the Rabi and Kharif seasons.


Introduction
Agriculture is the major source of livelihood for people in Odisha as well as in India as a whole, yet it is often said that 'Agriculture is the gamble of the monsoon'. Due to climatic changes, the yield of the major crops is reduced in the Kharif season. While Kharif rainfall over the country might increase by 10-15%, winter rainfall is expected to decrease by 5-25%, and seasonal variability would be further compounded [1].
It has been highlighted that, owing to high temperature, water shortage and the distribution of rainy days, the maximum loss is expected in Rabi crops, whose productivity is projected to decrease by 10% to 40% by 2100 [2]. Rice yield is expected to decline by 6% for every 10°C rise in temperature [3]. Scientists and policy makers have accepted the susceptibility of crop agriculture to climate change and have questioned the capability of farmers to adapt, because of the direct and strong dependence of crop agriculture on climate [4]. Different forecasting methodologies have been developed and evaluated by researchers all over the world in the field of agriculture. On an all-India basis, a simulation study shows that the yield of the rice crop is affected by weather change by 2.5 to 12% [5]. Rice is the main food in eastern India, specifically in the states of Odisha, West Bengal, Jharkhand and Bihar. The green revolution in India was mainly a wheat revolution, and the main contributing states were Punjab, Haryana and UP. The Government of India is therefore expecting the second green revolution from eastern India. The amount of data in Indian agriculture is very large. Earlier, before the advent of computers, models were built from datasets only by manual methods. With the advancement of computer technology, the collection, classification and storage of huge volumes of data have increased, which has brought enormous improvement in pattern recognition. The main focus of this paper is to develop a user friendly network for farmers that supports the study of rice production on the basis of important climatic parameters.
The current age is the age of data. Because a large dataset is used to improve the accuracy of the results, feature selection becomes a prerequisite for modeling the dataset [6,7]. To increase the correctness of the experiment, the attributes of the training examples, that is, of the dataset, have to be increased [8,9,10]. Since knowledge discovery techniques extract knowledge from vast amounts of data, it is worthwhile to pursue future research on solving real world problems. Ranking is a method of ordering all the features according to their importance. Selecting a small number of features produces a simple model that takes less computation time and can be understood easily. A simpler model also requires fewer resources, which makes it affordable. The question is how the features or variables can be ranked [9,10,11,12,13,14]. Machine learning offers many algorithms for finding the significant variables, and this gives rise to the concept of feature selection or variable selection: the selection of a subset of the variables, a technique that does not change the original representation of the variables.
When the various feature ranking techniques are applied to the dataset, small subsets are generated on each iteration. For each feature, the rank order produced by each run is combined with those of the earlier runs to form an ensemble [15,16]. The Monte Carlo algorithm states that a conclusion can be reached by combining consecutive random rough calculations of the same result [17]. This method inspired the ensemble method.
As agriculture is the backbone of the Indian economy and rice is the main staple food, prediction of rice yield and timely advice to farmers on variations in climatic conditions are required. This motivates us to prepare a computational model for the farmers and ultimately for society. The main aim of this work is to prepare a computational model that finds the features that most affect rice production. Three different feature ranking methods for regression are used: Random Forest [18,19,20,21,22,23], SVR-RFE [24,25,26] and F-Test [27,28]. These methods are mainly used for ranking genes in gene expression datasets; here the same methods are used to rank the features of rice crop prediction datasets. The three ranking algorithms give three different ranks to each feature of the dataset. A feature fusion method is then proposed to compute the final rank of each feature, and these newly ranked features are evaluated by an Extreme Learning Machine (ELM) [29,30,31,32] based regressor to measure the importance of each feature. The accuracy of the ELM regressor is calculated while removing features from the dataset one by one. Finally, the proposed fusion based ranking strategy is compared with the non-fusion based ranking strategies to obtain the number of significant features contributing to the maximum accuracy of the regressor. These features determine the importance of climatic parameters in rice crop production for both the Rabi and Kharif seasons in the selected districts, namely Balasore, Cuttack and Puri. The important finding of the study is that temperature and humidity affect crop production most in the coastal districts of Odisha.

Study area
Figure 1 shows the rice crop production dataset of three districts: Balasore, Puri and Cuttack [33]. Rice is produced mainly in two seasons, Rabi and Kharif. The features considered for this production are rainfall, minimum and maximum temperature, and relative humidity in the morning and afternoon hours. Various methods exist for missing value imputation [36], which avoid inconsistency in the dataset. In this paper the mean value is used to solve the missing value problem.
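A minimal sketch of mean value imputation for a single feature column is given below. It assumes missing readings are marked as `None`; the function name `impute_mean` and the sample rainfall values are hypothetical and are not taken from the dataset.

```python
from statistics import mean

def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in column]

# Hypothetical rainfall readings with one missing entry.
rainfall = [12.0, None, 18.0, 15.0]
filled = impute_mean(rainfall)
```

The fill value is computed per column, so each climatic feature keeps its own scale.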

Goal
Given the data described in the preceding section, data mining or machine learning strategies should be able to produce a natural decision about crop production based on the significant climatic parameters that affect the yield of rice during both the Rabi and Kharif seasons. This paper focuses on the capabilities of ranking and fusion strategies in two respects: feature ranking and fusion of the ranked features. Specifically, the goals of this study can be outlined as follows:
(a) Collection of climatic data and rice yield data for both the Rabi season and the Kharif season for three coastal areas of Odisha, a state of India.
(b) Feature importance evaluation and selection:
(i) Ranking of features by applying various ranking strategies.
(ii) Fusion of those ranked features.
(c) Selection of important climatic features derived from the ranked and fused features.
(d) Model tuning, or searching for appropriate algorithm parameters for better performance.
(e) Model evaluation and validation through performance comparisons and statistical validation.

Paper layout
The rest of the paper is organized as follows. Related work in this field is discussed in Section 2. The schematic representation of the proposed regressor is detailed in Section 3. The methodologies, namely Random Forest, SVR-RFE, F-Test and the ELM regressor, and the fusion strategies are discussed in Section 4. The experimentation and model evaluation are discussed in Section 5, and Section 6 discusses the principal findings of this study. Finally, Section 7 concludes the paper with the future scope of this work.

Literature survey
To contextualize the goals discussed in Section 1.2 for rice yield modeling, several papers based on machine learning or data mining techniques useful for modeling were selected for review, covering (a) ranking of features based on Random Forest, F-Test and SVR-RFE; (b) fusion strategies for feature selection; and (c) model evaluation and validation for proper classification. This section explores work on prediction in the agricultural field based on Random Forest, F-Test, SVR-RFE and related methods. SML Venkata et al. [35] used a dataset consisting of rainfall, precipitation and temperature. They applied random forest, a collection of decision trees, to two-thirds of the records, applied the resulting decision trees to the remaining records, and finally applied the model built from the training sets to the test data to predict the crop yield from the input attributes. They used R Studio and evaluated their results using other performance measures. Evathia E et al. [18] modified the structure and selection mechanism of the random forest algorithm to improve prediction performance. The authors verified all the evaluation measures and carried out the voting procedure on the basis of feature selection, clustering and so on. The main objective of their work was to combine the construction and voting methods of the random forest algorithm, and they found a positive effect on performance using 24 datasets. Hari Dahal et al. [36] took six soil variables together with crop yield data to find the level of crop productivity. They found that some of the soil variables were highly correlated. To estimate the strength of the relationship, they developed multiple regression models and applied the F-Test to identify the most significant variables, and found that total nitrogen, organic matter and phosphorus affect the yield of paddy. J. P. Powell et al. [37] analysed the effect of various weather events on winter wheat using farm-level data from 334 farms over 12 years. They used the F-Test to find the significance of weather events in the model. They concluded that the effect of weather events on yield is time specific and that high temperature and precipitation events significantly decrease yields. Ke Yan et al. [24] studied both the linear and nonlinear SVM-RFE algorithms. They analyzed the correlation bias and proposed a new algorithm, SVM-RFE+CBR, which they implemented on a synthetic dataset and evaluated for accuracy. Meng-Dar Shieh et al. [25] proposed a method to eliminate the problem of choosing the feature subset. Shruti Mishra et al. [26] recommended an extended variant of SVM-RFE and SVM-T-RFE. They obtained maximum classification accuracy with a smaller subset of genes on high dimensional data. They also compared it with the two methods SVM-T-RFE and SVM-RFE and concluded that the proposed step by step method is 40% better than SVM-RFE and 25% better than SVM-T-RFE. The ranking strategies adopted by these authors motivated us to carry our research forward on agricultural and climatic datasets.

Schematic representation of proposed method
Feature ranking methods are mainly used to rank features. In this study, a novel effort based on feature ranking methods is introduced to find the significant climatic features that most affect the yield of rice in the three coastal districts of Odisha for both the Rabi and Kharif seasons. This empirical study focuses on the selection of significant features through feature ranking and feature fusion based strategies. It works in three phases. In the first phase, known as feature ranking, Random Forest, SVR-RFE and F-Test based regression methods are used to rank all the features of the datasets. In the second phase, new ranks are computed by combining the ranked features from these ranking techniques. Finally, an ELM based regressor is used to empirically evaluate and validate the yield modeling. Figure 2 illustrates the implementation flow of the proposed ELM based regressor model for obtaining the important features that contribute to rice yield in the coastal areas of the state of Odisha.

Data set description
The dataset $D$ is composed of data from districts of Odisha, India (Figure 1). Let $d_i \in D$, $\forall i = 1, \cdots, 31$, denote the records, that is, 31 years of data, where $|d_i| = 25$ is the number of attributes of each record. The parameters that affect rice production are $p = \{\text{maxtemperature}, \text{mintemperature}, \text{rainfall}, \text{humidity}\}$. Since there are two rice production seasons, Rabi and Kharif, grown during 'January-May' and 'June-December' respectively, each $p_i$ is collected over a range of six months, resulting in 24 attributes; the 25th attribute is the crop production per hectare for the particular year.
The rice production graphs for the three coastal areas of Odisha for the years 1983-2014 are shown in Figure 3(a) and Figure 3(b) for the Rabi and Kharif seasons respectively. A detailed description of the datasets, with standard deviation (Std. Dev.), for the three areas is given in Table 1.
The range and average values of the parameters, namely rainfall in mm/hectare, maximum and minimum temperature in °C, and mean relative humidity at both 8.30 am and 5.30 pm, for all three datasets corresponding to the three coastal districts are shown in Table 2 for the Rabi and Kharif seasons.

Study procedures
This section presents a usable scheme for predicting the effect of climatic parameters on rice yield in the coastal areas of Odisha, a state of India, during both the Rabi and the Kharif seasons. The steps are as follows:
• Collecting the raw data, including climatological characteristics and rice production per hectare.
• Calculating the range and average of the parameters of those datasets to gain proper knowledge of the features.
• Defining the attributes affecting the rice yield.
• Redefining the datasets and constructing the database of all tuples according to the selected attributes.
• Dividing the raw data into training and testing datasets.
• Designing the feature ranking models to rank all the features of the individual datasets for further processing.
• Designing a feature level fusion model using Borda Count to generate a new set of ranked features from the ranked features of all three feature ranking strategies for further analysis.
• Designing an ELM based regressor to predict the yield from the datasets with the newly ranked features, in order to measure the importance of each feature.
• Calculating the accuracy of the ELM regressor, using the R2 score, while removing features from the datasets one by one.
• Finally, selecting the top 5 ranked features with respect to maximum accuracy; these determine the importance of climatic parameters in rice crop production for both Rabi and Kharif in the three districts.

Methodologies adopted for experimentation
This section discusses the methodologies used: Random Forest, F-Test and SVR-RFE for feature reduction, and ELM for prediction.

Random forest
Random forest is one of the most important and popular supervised learning algorithms. It can be used for both classification and regression tasks. Multiple trees are grown. To classify a new object from its attributes, each tree gives a classification; that is, the tree 'votes' for that class.
For classification, the class with the most votes over all the trees in the forest is chosen; for regression, the outputs of the different trees are averaged. Random forest is an ensemble method built on decision trees. Breiman proposed random forest, adding an extra layer of randomness to bagging [19]. Random forest has a vast number of applications owing to its good stability and generalization [19,20,21,22,23].

F-Test for regression
The F-Test for linear regression is a method for assessing the significance of any variable among the independent variables in a multiple linear regression. It describes how the null hypothesis can be tested in a multiple regression model with an intercept [27,28].
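As an illustration of how an F-Test can rank features for regression, the sketch below scores each feature separately with the F-statistic of a simple linear regression with intercept, $F = \frac{r^2}{1 - r^2}(n - 2)$, where $r$ is the Pearson correlation between the feature and the target. Scoring one feature at a time is an assumption made here; the exact variant used in the experiments is not specified. The function names and the toy data are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def f_statistic(x, y):
    """F-statistic of the simple regression y ~ a + b*x (1 and n-2 d.o.f.)."""
    r2 = pearson_r(x, y) ** 2
    return r2 / (1.0 - r2) * (len(x) - 2)

def rank_by_f(columns, y):
    """Return feature names sorted from highest to lowest F-statistic."""
    scores = {name: f_statistic(col, y) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy data: 'humidity' tracks the yield closely, 'rainfall' does not.
yield_ = [2.0, 2.5, 3.1, 3.4, 4.2, 4.4]
features = {
    "rainfall": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],
    "humidity": [60.0, 64.0, 71.0, 73.0, 80.0, 83.0],
}
ranking = rank_by_f(features, yield_)
```

A larger F-statistic means stronger evidence against the null hypothesis that the feature's regression coefficient is zero, so the feature is ranked higher.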

Support vector regressor-recursive feature elimination (SVR-RFE)
SVR-RFE is a variable selection or feature selection method. It is an optimization method for finding the best performing feature set. It repeatedly builds models on a subset of the features, then on the remaining features, and finally ranks the features according to the order of elimination [24][25][26]. The algorithm is first trained using an SVM with a linear kernel, and the features are then removed recursively according to the smallest ranking criterion. To generate a rank, the weight vector needs to be calculated as given in Equation (4).
Where $i$ is the index of the features, ranging from 1 to $n$, and $\beta_i$ is the Lagrangian multiplier estimated from the training set.
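The recursive elimination loop can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: ordinary least squares (via NumPy) stands in for the linear-kernel SVR fit, features are standardized so that weights are comparable, and the squared weight is used as the ranking criterion. The function name `rfe_rank` and the synthetic data are hypothetical.

```python
import numpy as np

def rfe_rank(X, y):
    """Rank features by recursive elimination on squared linear weights.

    Returns feature indices ordered from most to least important.
    Assumption: ordinary least squares stands in for the linear SVR fit.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standardize so the weight magnitudes are comparable across features.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    y = y - y.mean()
    remaining = list(range(X.shape[1]))
    eliminated = []  # filled from least important to most important
    while remaining:
        w, *_ = np.linalg.lstsq(X[:, remaining], y, rcond=None)
        worst = int(np.argmin(w ** 2))  # smallest ranking criterion
        eliminated.append(remaining.pop(worst))
    return eliminated[::-1]

# Hypothetical data: the target depends strongly on column 2, weakly on column 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 4))
y = 3.0 * X[:, 2] + 0.5 * X[:, 0] + 0.01 * rng.normal(size=40)
order = rfe_rank(X, y)
```

The feature eliminated last receives rank 1, which is why the elimination list is reversed before it is returned.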

Extreme Learning Machine (ELM)
Artificial Neural Network (ANN) is one of the best known classification and regression techniques and works by the back-propagation method, in which the weights are adjusted by trial and error. ANN has several disadvantages, such as local minima, over-fitting and long training time [38][39][40]. To overcome the problem of memory requirements, Huang et al. [29] proposed a new method for classification and regression problems based on the least squares algorithm, known as ELM. ELM also has a unique minimum solution, with both the smallest training error and the smallest weight norm, and does not need a stopping criterion.
ELM is a neural learning algorithm introduced to improve the efficiency of the Single Layer Feed Forward Neural Network (SLFN). This section briefly explains the working principle of ELM [30,31,32]. A training set of $N$ samples is given, where $(X_i, Y_j) \in R^n \times R^m$. Here, $j = 1, 2, \cdots, N$ and the number of hidden nodes is $M$. The output of the SLFN is formulated in (5).
Here, $output_k$ is the output vector with respect to the input sample $X_k$, and $f(X_k; a_j, b_j)$ is the activation function. $a_j$ and $b_j$ are the randomly generated learning parameters of the $j$th hidden node, and (5) can be written compactly as (2), where $H$ is the output matrix. Equation (2) is a linear system, so the output weights can be determined analytically by finding its least squares solution, which is defined in (3), where trainoutput is the output of the training data. The benefit of ELM is that the output weights are calculated analytically by a mathematical transformation, which avoids a lengthy training process and requires no iterative adjustment of the training parameters.
Algorithm 1: SVR-RFE [21,22,23]
Input: Initial feature subset, F = {1, 2, · · · , n}
Output: Rank list according to smallest weight criterion, R.
1 Set R = {}
2 Repeat steps 3-8 until F is empty
3 Train the SVM using F.
4 Compute the Weight Vector using (1)
5 Compute the Ranking Criteria, Rank = W^2
6 Rank the features in sorted manner,
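The ELM training procedure described in this section can be sketched in a few lines: the hidden node parameters are drawn at random and left untouched, the hidden layer output matrix $H$ is computed once, and the output weights are obtained as the least squares solution via the Moore-Penrose pseudoinverse. The sketch assumes the multiquadric node $f(X; a, b) = \sqrt{\lVert X - a \rVert^2 + b^2}$; the class name `ELMRegressor`, the sampling ranges, the number of hidden nodes and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

class ELMRegressor:
    """Minimal single-hidden-layer ELM with multiquadric hidden nodes."""

    def __init__(self, n_hidden=20, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Multiquadric node: sqrt(||x - a_j||^2 + b_j^2) for each hidden node j.
        d2 = ((X[:, None, :] - self.a[None, :, :]) ** 2).sum(axis=2)
        return np.sqrt(d2 + self.b ** 2)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Random centres a_j and widths b_j are drawn once and never trained.
        self.a = self.rng.uniform(-1.0, 1.0, size=(self.n_hidden, X.shape[1]))
        self.b = self.rng.uniform(0.1, 1.0, size=self.n_hidden)
        H = self._hidden(X)                 # N x M hidden layer output matrix
        self.beta = np.linalg.pinv(H) @ y   # least squares output weights
        return self

    def predict(self, X):
        return self._hidden(np.asarray(X, dtype=float)) @ self.beta

# Hypothetical smooth target on inputs scaled to [-1, 1].
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(60, 3))
y = X[:, 0] + 0.5 * X[:, 1] ** 2
model = ELMRegressor(n_hidden=30).fit(X, y)
train_error = float(np.mean((model.predict(X) - y) ** 2))
```

Because only the output weights are solved for, training is a single linear algebra step with no iterative weight updates.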

Fusion strategies
The Borda Count [41,42] is one of the superior voting systems. The voters rank the candidates in order of preference, and points are assigned from the rankings. The candidate ranked last scores one point, the next-to-last scores two, and so on. The candidate who secures the most points is declared the winner. There are other standard voting systems, such as the alternative vote and the single transferable vote, but the advantage of the Borda Count is that all the MPs have the support of a majority of their voters, and the parties nominate good candidates. This method is a kind of group consensus function that maps the individual input rankings to a combined ranking, which leads to the most appropriate and relevant decision. In machine learning, the Borda Count for a class is defined as the sum, over all classifiers, of the number of classes ranked below that class by each classifier. The magnitude of the Borda Count reflects the level of agreement that the input pattern belongs to the considered class. The main advantage of this method is that it is simple to implement and does not require any training.
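The counting rule above can be illustrated with a short sketch applied to feature rankings instead of classes. Each feature receives, from every ranking, as many points as there are features ranked below it. The function name `borda_fuse`, the alphabetical tie-break and the three example rankings are hypothetical.

```python
def borda_fuse(rankings):
    """Fuse several rankings (best first) into one by Borda Count.

    Each feature earns, from every ranking, as many points as there are
    features ranked below it. Ties are broken by feature name (an assumption).
    """
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            scores[feature] = scores.get(feature, 0) + (n - 1 - position)
    fused = sorted(scores, key=lambda f: (-scores[f], f))
    return fused, scores

# Hypothetical rankings of four climatic features from three rankers.
random_forest = ["rh_am", "tmin", "tmax", "rain"]
svr_rfe = ["tmin", "rh_am", "rain", "tmax"]
f_test = ["rh_am", "tmax", "tmin", "rain"]
fused, scores = borda_fuse([random_forest, svr_rfe, f_test])
```

In this example `rh_am` collects 3 + 2 + 3 = 8 points and is ranked first in the fused list, followed by `tmin` with 6 points.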

Validation strategies adopted
$R^2$ is a statistical measure of how well the regression line fits the data [43]. This statistic provides some knowledge about the goodness of fit of a model [35,36]. It is the proportion of the variation in the response variable that a linear model explains, and its values lie between 0 and 100%, or 0 and 1: 0% or 0 indicates that the model explains none of the variability of the response data around its mean, and 100% or 1 indicates that the model explains all of that variability. The statistic thus measures how well the regression predictions approximate the real data points. An $R^2$ of 100% or 1 indicates that the regression predictions perfectly fit the data.
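For reference, a minimal sketch of the score computed as $R^2 = 1 - SS_{res}/SS_{tot}$ is given below; the function name and the sample values are hypothetical. When the score is computed this way on predictions that fit worse than the mean of the observations, it becomes negative, which is one way a regressor can receive a negative R2 score.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

observed = [2.0, 3.0, 4.0, 5.0]
perfect = r2_score(observed, observed)                # perfect fit
mean_only = r2_score(observed, [3.5, 3.5, 3.5, 3.5])  # predicts the mean
poor = r2_score(observed, [5.0, 4.0, 3.0, 2.0])       # worse than the mean
```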

Experimental setup
In this work, all the implementations were carried out in the Python programming environment on the Linux operating system, with a minimum hardware configuration of 4GB RAM and 100GB hard disk. First, the different activation functions are tested for suitability to our problem domain. Then, the different feature ranking strategies are tested with ELM. Finally, the proposed fusion of feature rankings is tested. The parameters used for the experimentation are given in Table 3.

Parameters used
Table 3 gives the details of the parameters used for the implementation.

Feature ranking methods
Three different feature ranking methods for regression, Random Forest, SVR-RFE and F-Test, have been used in the experiments. In the literature these methods are mainly used for ranking genes in gene expression datasets; in this study the same methods are used to rank the features of rice crop prediction datasets. The methodology works in three steps: (a) first, the three ranking algorithms output three different ranks for each feature of the dataset; (b) secondly, a feature fusion method based on Borda Count is used to compute the final rank of each feature; and (c) finally, the newly ranked features are evaluated by an ELM based regressor to measure the importance of each feature. The accuracy of the ELM regressor is calculated while removing features from the datasets one by one. Finally, with respect to maximum accuracy, the top five ranked features are selected; these determine the importance of climatic parameters in rice crop production for both the Rabi season and the Kharif season in all the districts analysed. Figure 4 and Figure 5 show the features arranged in descending order of their $R^2$ scores, which measure the importance of the features after applying the Random Forest feature ranking method to the Rabi and Kharif seasons respectively for the Balasore, Cuttack and Puri districts. From Figure 4 and Figure 5 it can be seen that, for Balasore district, feature 5 shows the highest importance score of 8, feature 5 has the lowest importance score, and the rest lie within the range of 2-6. For Cuttack district, features 1, 24, 8, 9, 23, 14 and 7 have approximate importance scores from 0 to 7, and the remaining features have very low importance scores. Similarly, for Puri district, feature 8 and one other feature have very high importance, with scores from 0 to 16, and features 4, 10 and 9 have moderate scores. The rest can be ignored because of their very low importance scores.
Figure 6 and Figure 7 show the features with respect to their $R^2$ scores, which measure the importance of the features after applying the SVR-RFE feature ranking method to the Rabi and Kharif seasons respectively for the Balasore, Cuttack and Puri districts. From Figure 6, for the Rabi season, it can be observed that feature 23 has the 1st rank, features 15, 9, 21 and 14 show better ranks, a few more show moderate ranks, and feature 4 has the lowest rank, making it a non-significant feature. In Cuttack district, feature 7 has the highest rank and feature 17 the lowest. Similarly, in Puri district, feature 19 has a very high rank, features 17, 11, 15 and 23 have better ranks, and feature 4 has less importance. Similarly, in Figure 7, for Balasore district, feature 27 has the highest rank, features 25 and 9 are next best, and feature 0 (the first feature) has a low rank and little impact. For Cuttack district, feature 16 is of great importance and feature 34 is of little or no importance and can therefore be ignored. In Puri district, feature 29 shows the highest rank and features 23, 9, 8 and 20 also show better scores, but feature 33 has the lowest rank. The importance of features for both the Rabi and Kharif seasons using F-Test for regression has been plotted in Figure 8 and Figure 9.

Fusion of feature ranking methods
Here, a multiple ranking fusion scheme is proposed. In this scheme, individual rankings are obtained using the different ranking methods, and the ranked features are then combined to obtain the final ranking of the features. The fusion method used here is the Borda Count, the most popular and effective method for this purpose.
Mathematically, the feature fusion strategy can be stated as follows. Let the dataset be defined as $DS = \{x_1, x_2, x_3, \cdots, x_n\}$, where $x_1, x_2, x_3, \cdots, x_n$ represent the $n$ features of the dataset, and let $r_1$, $r_2$ and $r_3$ be the three ranking methods used. The proposed fusion of ranking strategy is shown in Figure 10. The importance of features for the Rabi and Kharif seasons using the fusion of ranking strategy for regression is plotted in Figure 11 and Figure 12 respectively, and the top five ranked features obtained are listed in Table 4.
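One way to write the fused score explicitly, assuming the Borda Count rule in which a feature earns one point for every feature ranked below it, is given here. Writing $r_k(x_j) \in \{1, \dots, n\}$ for the rank that method $r_k$ assigns to feature $x_j$ (1 being the best), the symbols $B$ and $r_k(x_j)$ are notation introduced for illustration only.

```latex
B(x_j) = \sum_{k=1}^{3} \bigl( n - r_k(x_j) \bigr), \qquad j = 1, \dots, n
```

The features are then sorted in decreasing order of $B(x_j)$, and the five features with the largest scores form the fused top five list.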

Extreme learning machine regressor
In this work, all the variants of the ELM regressor are first evaluated with different activation functions: tanh, sine, tribas, inv-tribas, sigmoid, hardlim, softlim, gaussian, multiquadric and inv-multiquadric. Among these, the tribas, inv-tribas, hardlim, softlim and gaussian functions give a negative R2 score, whereas the scores of the tanh, sine, sigmoid, multiquadric and inv-multiquadric functions are found to be ≥ 98%, as detailed in Figure 13 and Figure 14 and in Table 5 and Table 6, which show the R2 scores of the different activation functions for ELM in predicting Rabi and Kharif rice crops respectively. Of the ten activation functions, multiquadric has the highest R2 score across all the districts for the Rabi and Kharif seasons. Hence, the multiquadric function has been used for the experimentation.

ELM-Regressor for varying number of features
Once the newly ranked features have been obtained from the proposed feature fusion strategy and the activation function (multiquadric) to be used by ELM has been chosen, the accuracy of the ELM regressor is calculated while removing features from the datasets one by one, as shown in Figure 15 and Figure 16.
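The feature reduction loop described above can be sketched as follows. The sketch is generic: `score_fn` is a hypothetical callable standing in for "train the ELM regressor on these features and return its R2 score", and the toy scores and feature names are invented for illustration.

```python
def smallest_feature_set(ranked_features, score_fn, threshold=0.99):
    """Drop the lowest-ranked feature one at a time and record the score.

    Returns (scores_by_k, best_k) where best_k is the smallest number of
    top-ranked features whose score still reaches the threshold, or None.
    """
    scores_by_k = {}
    for k in range(len(ranked_features), 0, -1):
        scores_by_k[k] = score_fn(ranked_features[:k])
    passing = [k for k, s in scores_by_k.items() if s >= threshold]
    return scores_by_k, (min(passing) if passing else None)

# Hypothetical scorer standing in for "train ELM on these features, return R2".
toy_r2 = {1: 0.91, 2: 0.97, 3: 0.992, 4: 0.995, 5: 0.996}
ranked = ["rh_pm", "rh_am", "tmin", "tmax", "rain"]
scores, best_k = smallest_feature_set(ranked, lambda feats: toy_r2[len(feats)])
```

With these toy scores the smallest subset that still reaches the 0.99 threshold is the top three features.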

Result analysis
After obtaining the top five ranked features and the varying number of features that give above 99% prediction accuracy, a comparison has been made with the multiquadric based ELM to find the impact of the fusion based strategy relative to the non-fusion based ranking strategies, in terms of the maximum number of features that contribute to achieving 99% prediction accuracy, as shown in Table 9 and Table 10 for Rabi and Kharif season crops. For the Rabi season crop, Table 9 shows that, when the proposed fusion based ranking strategy is compared with the non-fusion based strategies, the maximum number of features contributing to predictive accuracy above 99% is 7, 10 and 6 for ELM with Random Forest and 5, 9 and 4 for ELM with SVR-RFE, and ELM with F-Test similarly needs 9, 11 and 8 features to give predictive accuracy of 99% and above. The proposed strategy, in contrast, predicts with above 99% accuracy using far fewer features, namely 3, 5 and 2 for the Balasore, Cuttack and Puri districts respectively. From Table 4, which lists the top five ranked features extracted by the fusion strategy, it can be concluded that the crop yield for Balasore district in the Rabi season can be accurately predicted if we consider only three features out of RH at 5.
Table 10 for the Kharif season, for all the district datasets, shows that Kharif season crops need more parameters or features than Rabi season crops, which is evident from Table 8 and Table 10. The top 15, 5 and 5 ranked features are needed to accurately predict the rice yield during this season for the Balasore, Cuttack and Puri districts respectively. From Table 4, it can be established that for Balasore district 15 features are affecting
Figure 9: Feature ranking using F-Test for Regression for Kharif season in three districts: (a) Balasore, (b) Cuttack, (c) Puri.

Statistical validation
The paired T-test is used to assess the significance of the proposed fusion of feature ranking approach. The outcome produced by ELM-SVR-RFE was compared with that of the proposed approach over five independent runs using the top five ranked features. Only ELM-SVR-RFE has been considered for the paired test, as it gives better results than the other basic feature ranking based methods. The null hypothesis was that there is no difference between the outcomes of the two methods. The outcomes for the Rabi and the Kharif seasons are shown in Table 11 and Table 12 respectively. From these tables we can see that the null hypothesis is rejected: the average p-values are 0.0023, 0.0021 and 0.0044 for the Rabi season and 0.0335, 0.0221 and 0.0450 for the Kharif season, for the Balasore, Puri and Cuttack districts respectively. The values are close to zero, which strengthens the argument that the proposed fusion of feature ranking approach performs better than the methods based on feature ranking alone.
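A minimal sketch of the paired test statistic is shown below, using only the Python standard library. The five scores per method are hypothetical values invented for illustration and are not the results reported in Table 11 or Table 12. Instead of computing a p-value, the sketch compares the statistic with the two-sided 5% critical value of Student's t distribution for 4 degrees of freedom (about 2.776).

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(a, b):
    """Return (t statistic, degrees of freedom) for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Hypothetical R2 scores from five runs of each method (not the paper's data).
fusion = [0.995, 0.993, 0.996, 0.994, 0.997]
svr_rfe_only = [0.981, 0.984, 0.979, 0.985, 0.980]
t_stat, dof = paired_t(fusion, svr_rfe_only)
# Two-sided 5% critical value of Student's t with 4 d.o.f. is about 2.776.
reject_null = abs(t_stat) > 2.776
```

Pairing the runs removes the run-to-run variation shared by both methods, so the test looks only at the per-run differences.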

Discussion on principal findings
The principal aim of the present study is to discover the features that play an important role in, or most strongly affect, rice crop production in both the Rabi and Kharif seasons of Balasore, Cuttack and Puri. To obtain the desired result, a fusion strategy based on feature ranking methods has been proposed and explored. This methodology works in three computational phases and not only finds the most significant features contributing towards rice yield but also shows 99% and above prediction accuracy. According to the results obtained, the following observations are made in this study:
• First, the raw data, including climatological characteristics and rice production per hectare, are collected for three districts and two seasons, and the range and average of the parameters of those datasets are computed to gain greater insight into the features.
• The importance of the features has been evaluated by applying the Random Forest, SVR-RFE and F-Test ranking strategies, and features are selected for rice yield prediction on that basis. These feature ranking models rank all the features of the individual datasets for further processing.
• A feature level fusion model using Borda Count has been explored to generate a new set of ranked features by combining the ranked features from all three feature ranking strategies for further analysis. From this, the top five ranked features contributing most to rice yield have been listed in Table 4.
• Multiquadratic activation has been selected from ten activation functions, based on R2 score, for use by the ELM regressor. Rice yield prediction above 99% predictive accuracy is then obtained by decreasing the features one by one for the two seasons and three district datasets, and the results are shown in Table 7 and Table 8.
• Again, the performance of the proposed feature ranking based fusion strategy is compared with that of the individual feature ranking based methods for Rabi and Kharif season crop prediction, to obtain the minimum number of features contributing towards rice crop yield, as shown in Table 9 and Table 10. From those tables, it can be concluded that the features that most affect rice yield are RH at 8.30 AM and 5.30 PM for all three districts during both the Rabi and Kharif seasons, and that the minimum temperature also plays a vital role.
• The paired T-test was used to assess the significance of the proposed fusion of feature ranking approach. The outcomes produced by ELM-SVR-RFE were compared with those of the proposed approach over five independent runs, considering the top five ranked features. Only ELM-SVR-RFE has been considered for the paired test, as it gives better results than the other basic feature ranking based methods.
• It can be observed from Table 11 and Table 12 that the null hypothesis is rejected for all three districts (Balasore, Puri and Cuttack) in both the Rabi and Kharif seasons, as the p-values are close to zero. This strengthens the argument that the proposed fusion of feature ranking approach performs better than the methods based on feature ranking alone.
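The Borda Count fusion step listed among the observations above can be illustrated with a short sketch. Several details are assumptions made for the example rather than taken from the study: the point scheme (n-1 points for the top rank down to 0 for the last), the tie-breaking rule, the function name `borda_fuse`, and the feature labels and rankings, which are hypothetical stand-ins for the Random Forest, SVR-RFE and F-Test outputs.

```python
def borda_fuse(rankings):
    """Fuse several best-first rankings of the same features with Borda Count.

    Assumed scheme: with n features, a feature at position i (0 = best)
    earns n - 1 - i points in each ranking; points are summed over rankings.
    """
    if not rankings:
        raise ValueError("at least one ranking is required")
    features = list(rankings[0])
    n = len(features)
    for ranking in rankings:
        if len(ranking) != n or set(ranking) != set(features):
            raise ValueError("all rankings must order the same set of features")

    scores = {feature: 0 for feature in features}
    for ranking in rankings:
        for position, feature in enumerate(ranking):
            scores[feature] += n - 1 - position

    # Assumption: ties are broken by the order of the first ranking.
    first_order = {feature: i for i, feature in enumerate(features)}
    return sorted(features, key=lambda f: (-scores[f], first_order[f]))


# HYPOTHETICAL rankings (best first); labels are illustrative only.
random_forest_rank = ["rh_0830", "tmin", "rh_1730", "tmax"]
svr_rfe_rank = ["rh_1730", "rh_0830", "tmin", "tmax"]
f_test_rank = ["rh_0830", "rh_1730", "tmax", "tmin"]

fused = borda_fuse([random_forest_rank, svr_rfe_rank, f_test_rank])
print(fused[:5])  # top five fused features (here only four exist)
```

Because only rank positions are summed, the three methods contribute on an equal footing even though their raw importance scores are on different scales.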

Conclusion and future scope
In this study an attempt has been made to determine the climatic effect on rice yield in the coastal areas of Odisha. The fusion based strategy is the novelty of this work. The prediction model not only predicts the rice yield per hectare but is also able to identify the significant or most influential features during the Rabi and Kharif seasons. The methodology works in three phases. In the first phase, three feature ranking approaches, namely Random Forest, SVR-RFE and F-Test, have been applied to the two datasets of three coastal areas, and features are ranked as per their algorithm.
In the second phase, Borda Count as a fusion method has been implemented on the ranked features from the first phase to obtain the top five features. In the third phase, multiquadratic based ELM has been used to predict the rice crop yield using the ranked features obtained from the fusion based ranking strategy of the second phase. After applying ELM with the fusion strategy, it is seen that at least 3 features for Balasore, 5 features for Cuttack and 2 features for Puri are enough to reach an accuracy of 99%, whereas each individual ranking method with ELM requires more features. Finally, the statistical paired T-test has been used to evaluate and validate the significance of the proposed fusion based ranking prediction model. From the observations made during experimentation, it has been found that relative humidity, and in some cases temperature, plays a vital role in rice crop production in both the Rabi and Kharif seasons. In future, the unrelated or inconsequential factors can be dealt with by working on optimized strategies.
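The feature counts quoted above come from dropping the lowest ranked features one at a time and checking whether the score stays above the threshold. A minimal sketch of that search follows. It rests on assumptions: `evaluate` is a hypothetical callback standing in for training and scoring the ELM on a feature subset, the per-subset scores in the usage example are invented for illustration, and treating an R2 score of 0.99 as the "99% accuracy" criterion is an assumption rather than something the text states.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("y_true has zero variance")
    return 1 - ss_res / ss_tot


def smallest_sufficient_k(ranked_features, evaluate, threshold=0.99):
    """Return the smallest k such that the top-k features score >= threshold.

    ranked_features is ordered best first. Features are removed from the
    bottom of the ranking one at a time; None means no subset qualifies.
    """
    best_k = None
    for k in range(len(ranked_features), 0, -1):
        if evaluate(ranked_features[:k]) >= threshold:
            best_k = k
    return best_k


# HYPOTHETICAL scores per subset size; a real run would fit the regressor
# on the selected columns and return, for example, r2_score on held-out data.
hypothetical_scores = {1: 0.950, 2: 0.970, 3: 0.991, 4: 0.993, 5: 0.995}
ranked = ["f1", "f2", "f3", "f4", "f5"]  # placeholder feature labels

k = smallest_sufficient_k(ranked, lambda subset: hypothetical_scores[len(subset)])
print(k, ranked[:k])
```

Passing the scorer in as a callback keeps the search independent of the regressor, so the same loop could be applied to each individual ranking method for comparison.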