New Re-Ranking Approach in Merging Search Results

Figure 2: Mix model for search results.


1 Introduction
On the Internet, search engines such as Google, Bing, and Yahoo provide a convenient mechanism for users to search and exploit information on the Web. According to 2017 statistics on the "Surface Web", Google indexes about 50 billion web pages and Bing about 5 billion. The "Surface Web" is only about 1% of the "Deep Web", which is not indexed by popular search engines. Many websites do not allow search engines to crawl them and instead offer their own query systems, such as PubMed or the US Census Bureau.
However, users searching on engines such as Google, Yahoo, or Bing are often not satisfied, for two reasons. First, each search engine has a different corpus and different searching and ranking methods, so the returned results differ. Second, search engines currently perform monolingual searches (searching only in the language of the search keywords), so users cannot find web pages in other languages.
To help users exploit information effectively, some tools combine search results from various sources. We can improve search results on top of the available search engines by building a Meta Search Engine [1]. A Meta Search Engine exploits existing search engines and processes the results obtained from them to generate a new result list that better matches user requirements. It needs to handle a variety of issues, such as query processing, searching on the available search engines, processing the returned results, re-ranking the results found, and displaying results to users. In this study, we focus solely on re-ranking the results returned by the available search engines.
There are two approaches to the problem. The first is to mix the search results (which may contain duplicate documents) of different search engines operating on the same information space. This method is often applied to the "Surface Web". The second is to combine search results from independent sources (Federated Information Retrieval, FIR) [2], which is more in line with exploiting "Deep Web" information.
Research and development on combining search results from multiple sources has focused on three main issues: server description, server selection, and merging [1]. Server description estimates general information about the original search server, such as the number of documents and terms and the frequency of returned search results. Server selection uses the server description information to determine the most suitable servers to which the query is sent. Merging is the main task of combining search results from multiple sources: evaluating and rearranging documents and creating the final list of results returned to the user.
Merging techniques can be distinguished by the type of information used for evaluating and re-ranking search results from the sources [3]: search server information (total number of documents, number of results returned); statistical information (the rank order of the document, the rating provided by the original engine); basic information (title, abstract); or the content of the document itself. Research aims at improving evaluation criteria such as accuracy, recall, data usage savings, response speed and bandwidth usage.
The innovation in this paper is the use of machine learning techniques together with the basic information returned from the original search engines for re-ranking. We propose a sequential mixing solution to balance the speed and quality of the results.
The rest of this paper is organized as follows. In Section 2, we present an overview of re-ranking and focus on previous work on re-ranking techniques, together with our analysis of and remarks on previous methods. Details of our proposal to use genetic programming for re-ranking are presented in Section 3, and the experiment is presented in Section 4. We summarize the important points in Section 5.
2 Overview on re-ranking

2.1 Ranking and re-ranking
In information retrieval, ranking is usually done by calculating a score of fit between the document and the query, with the goal of creating a list of documents in decreasing order of score (the score reflects the degree of suitability to the user's requirements).
After executing the initial query and receiving the results from a search engine, the following data can be extracted: the query itself, the list of documents, the ranking score of each document (which may be hidden from the user), and some basic content for each document, such as title and abstract. On an interactive system, searches are performed repeatedly, and the system can store and analyse the executed queries, the documents found, the documents read, and declarations or actions by users. The system may exploit this information to re-rank the result list in a variety of ways, distinguished by the type of data used: information from the available search engines, ratings, or user information.

Merging search results from multiple sources proceeds as follows (Figure 2). The central server Sc receives the query from the user and sends it to the search servers S1 to Sm. Each server Si creates a list Li containing its N best results and returns it to the central server. Sc re-evaluates the documents, based on the information returned from the original search servers or on the document contents themselves, to create the final result list returned to the user.
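The merging process can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each search server is modelled as a hypothetical function that returns a list of result records with `url` and `title` fields, duplicates are identified by URL, and `title_overlap` is a toy re-scoring function, not the scoring used in this paper.

```python
def merge_results(query, servers, rescore, n=10):
    """Central server Sc: query each server Si, pool the lists Li, re-rank."""
    pooled = {}
    for search in servers:                    # servers S1 ... Sm (hypothetical callables)
        for result in search(query)[:n]:      # list Li, truncated to the N best results
            # Assumption: a document returned by several servers is kept once.
            pooled.setdefault(result["url"], result)
    return sorted(pooled.values(),
                  key=lambda r: rescore(query, r), reverse=True)


def title_overlap(query, result):
    """Toy re-scoring: number of query terms that occur in the title."""
    title_terms = set(result["title"].lower().split())
    return sum(1 for term in query.lower().split() if term in title_terms)


# Two fake servers standing in for real search engines.
def server_1(query):
    return [{"url": "u1", "title": "Genetic programming ranking"},
            {"url": "u2", "title": "Cooking"}]


def server_2(query):
    return [{"url": "u2", "title": "Cooking"},
            {"url": "u3", "title": "Ranking"}]


merged = merge_results("genetic ranking", [server_1, server_2], title_overlap)
merged_urls = [r["url"] for r in merged]
```

The re-scoring function is passed in as a parameter so that any of the techniques discussed in Section 2.2 could be substituted for the toy title match.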

2.2 Techniques of re-ranking

2.2.1 Combining available ratings
The simplest method to merge ranking results is RawScore, which directly uses the ranking scores in each of the original result lists [4]. The CombSUM method, proposed by Fox and Shaw, sums the scores of a document across the various search engines to determine the CombSUM score for the document:
$$\mathrm{CombSUM} = \sum_{i \in \mathrm{IR\ Servers}} \mathrm{score}_i$$
where IR Servers is the set of search engines and score_i is the score assigned to the document by the i-th search engine.
The score assigned by a search engine can be normalized to a NormalizedScore value to avoid differences in scoring scales between search engines:
$$\mathrm{NormalizedScore} = \frac{\mathrm{score} - \mathrm{MinScore}}{\mathrm{MaxScore} - \mathrm{MinScore}}$$
where MinScore and MaxScore are the smallest and largest scores among all documents scored by the search engine.
The weakness of this method is that search engines differ in ranking quality, scoring, presentation methods, ... To overcome this limitation, we can add a weight for each search engine. The WeightedCombSUM score for a document is calculated by the formula:
$$\mathrm{WeightedCombSUM} = \sum_{i \in \mathrm{IR\ Servers}} w_i \times \mathrm{NormalizedScore}_i$$
Here, w_i is the weight assigned to search engine i in the set of search engines IR Servers, and NormalizedScore_i is the normalized score assigned by server i to the document, as in the NormalizedScore formula.
Similarly, some studies [5] suggest a linear function combining the ratings of the search engines, of the form:
$$M(d, q) = \sum_{i} \alpha_i \, s_i(d, q)$$
Here M(d, q) is the final ranking score, s_i(d, q) is the (normalized) ranking score of search engine i, and \alpha_i is the weight assigned to search engine i. The limitation of these methods lies in the need to determine the weight values manually or from observation of training data.
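The score-combination formulas above can be illustrated with a short Python sketch. The data layout (one dictionary of document scores per search engine) and the function names are assumptions made for this example, as is the choice to map an engine whose scores are all equal to 0.

```python
def normalize(scores):
    """Min-max normalize one engine's {document: score} map to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: an engine with identical scores contributes 0 everywhere.
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(engine_scores, weights=None):
    """CombSUM over normalized scores; passing weights gives WeightedCombSUM."""
    totals = {}
    for i, scores in enumerate(engine_scores):
        w = 1.0 if weights is None else weights[i]
        for doc, s in normalize(scores).items():
            totals[doc] = totals.get(doc, 0.0) + w * s
    return totals


def ranked(totals):
    """Documents in decreasing order of combined score."""
    return sorted(totals, key=totals.get, reverse=True)


engines = [
    {"a": 10.0, "b": 5.0, "c": 0.0},   # scores from engine 1
    {"b": 0.9, "c": 0.1, "d": 0.5},    # scores from engine 2
]
plain = comb_sum(engines)
weighted = comb_sum(engines, weights=[1.0, 0.2])
```

In this example document b leads under plain CombSUM because both engines return it, while down-weighting the second engine moves document a to the top.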

2.2.2 Ranking order information
The second group of solutions uses the rank order information in the original result lists. Round Robin [6] is the simplest merging method and works as follows. Given the result lists L1, L2, ..., Lm, we first take the first result R1 from each list Li, then the second result R2 from each list Li, and so on. The final merged list has the form L1R1, ..., LmR1, L1R2, ..., LmR2, ... This is a suitable solution for ensuring search speed when the information sources are of equivalent quality.
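A minimal Python sketch of Round Robin merging follows. Skipping a document that has already been emitted by an earlier list is an assumption of this example; the description above does not say how duplicates are handled.

```python
def round_robin(lists):
    """Interleave result lists: L1R1, ..., LmR1, L1R2, ..., LmR2, ..."""
    merged, seen = [], set()
    depth = max((len(results) for results in lists), default=0)
    for rank in range(depth):
        for results in lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])   # assumption: duplicates are kept once
                merged.append(results[rank])
    return merged


interleaved = round_robin([["a", "b", "c"], ["d", "a"], ["e"]])
```

Lists of different lengths are handled by simply skipping a source once it is exhausted.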
The Borda merging method [7] uses expert judgment scores. Each expert ranks some of the c documents. For each expert, the top document receives c points, the second document c-1 points, and so on. If an expert leaves some documents unranked, the remaining points are divided equally among those unranked documents. Finally, the documents are ranked according to their total points. Rank-based merging methods are useful when the scores assigned by the search engines are not available. However, studies show that this kind of merging is not as effective as combining scores.
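The Borda scheme can be sketched as follows. The example assumes that c is the number of distinct documents appearing in any ranking, that each expert lists a document at most once, and that the "remaining points" are the points 1, ..., c-k left over after an expert has ranked k documents.

```python
def borda(rankings):
    """Total Borda points per document over several (possibly partial) rankings."""
    docs = sorted({doc for ranking in rankings for doc in ranking})
    c = len(docs)
    totals = {doc: 0.0 for doc in docs}
    for ranking in rankings:
        for position, doc in enumerate(ranking):
            totals[doc] += c - position          # c, c-1, c-2, ...
        unranked = [doc for doc in docs if doc not in ranking]
        if unranked:
            # Points 1 .. c-k were not handed out; share them equally.
            leftover = sum(range(1, c - len(ranking) + 1))
            for doc in unranked:
                totals[doc] += leftover / len(unranked)
    return totals


points = borda([["a", "b", "c"], ["a", "c"], ["b"]])
borda_order = sorted(points, key=points.get, reverse=True)
```

Every expert distributes the same total number of points, so an expert who ranks only a few documents does not gain or lose influence.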
The LMS method (using result Length to calculate Merging Score) introduces a formula that scores each original search server based on the number of documents it returns, then computes new scores for documents by multiplying the server score by the document's original score [8].

2.2.3 Ranking learning
In a local search system, documents can be indexed using a variety of models such as VSM, LSI, LMIR, ... The scores of a document against a query under these different models can be regarded as different attributes of the document. Current information retrieval systems tend to apply machine learning techniques to build ranking models or ranking formulas based on these attributes.
The learning process consists of two steps: training and testing. The training input D is a set {<q, d, r>}, where q is the query, d is the document represented by a list of attributes {f1, f2, ..., fm}, and r is the relevance of document d to query q. The training step constructs a ranking model F from the training data, which captures the relationship between the attributes of a document and its relevance to the query. In the test step, the ranking model is applied to the test data set T, a set {<qtest, dtest, rtest>}: the value rpredict, the relevance of document dtest to query qtest as calculated by the model F, is compared to the value rtest to assess the quality of the ranking model. The training data D and test data T are usually generated by collecting search results in practice, which are then manually evaluated by experts.
Ranking learning methods generally share the same approach of optimizing an objective function: finding the maximum of a gain function or the minimum of a loss function.
Ranking learning techniques are divided into three groups: point-wise, pair-wise and list-wise [9]. In the point-wise approach, each training object is a single document together with its rating value. The learning process finds a model that maps each object to a rating close to its actual value. The pair-wise approach uses pairs of documents with a relative order (before or after) as training objects. In the list-wise approach, the training object is the whole list of ranked documents corresponding to a query.
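The difference between the three groups lies mainly in how training objects are built from the same judged data. The Python sketch below shows this for a list of <q, d, r> triples; the function names and the toy data are illustrative assumptions, and no learning algorithm is included.

```python
from itertools import combinations


def group_by_query(samples):
    """Collect (document, relevance) pairs per query."""
    groups = {}
    for q, d, r in samples:
        groups.setdefault(q, []).append((d, r))
    return groups


def pointwise(samples):
    """One training object per document: ((query, document), relevance)."""
    return [((q, d), r) for q, d, r in samples]


def pairwise(samples):
    """One training object per ordered pair: (query, preferred, other)."""
    pairs = []
    for q, docs in group_by_query(samples).items():
        for (d1, r1), (d2, r2) in combinations(docs, 2):
            if r1 > r2:
                pairs.append((q, d1, d2))
            elif r2 > r1:
                pairs.append((q, d2, d1))
            # Documents with equal relevance give no preference pair.
    return pairs


def listwise(samples):
    """One training object per query: documents sorted by relevance."""
    return {q: [d for d, _ in sorted(docs, key=lambda x: -x[1])]
            for q, docs in group_by_query(samples).items()}


samples = [("q1", "d1", 2), ("q1", "d2", 0), ("q1", "d3", 1),
           ("q2", "d4", 1), ("q2", "d5", 1)]
```

For the toy data, the point-wise view yields five objects, the pair-wise view three preference pairs (all from q1), and the list-wise view one ordered list per query.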
A representative of the point-wise group is PRank, introduced in [10], which uses regression analysis.
In the pair-wise group, the RankSVM ranking algorithm aims to minimize ordering errors over the set of document pairs. This method is often used in studies as a basis for comparison. Freund et al. apply boosting and introduce the RankBoost algorithm [11]. The advantage of this approach is that it is easy to deploy and can run in parallel for testing. Another example is FRank, which is based on a probabilistic ranking model.
In the ListNet method of the list-wise group, the document list itself is the training object. The authors use a probabilistic method to calculate the loss function for the list, determined by the difference between the predicted ordering and the correct ordering. Neural network models and gradient descent are used to learn the ranking model.
While the methods presented earlier apply to merging results from multiple search engines, ranking learning methods apply to the case of a single search system in which documents are indexed in different ways. According to Yu-Ting and colleagues [12], ranking methods that use training data (referred to as supervised ranking) were found to be more effective than the others (which may be considered unsupervised ranking).

2.2.4 Using user information
By default, traditional web search engines perform keyword-based queries. However, two users with different interests can use the same keywords with different search goals. In order to better meet the individual user's search needs, the user's declared interests as well as the user's behaviour and habits during search have become a research topic, leading to personalized ranking and collaborative ranking [13].
Personalized ranking performs querying and ranking for users based on individual user interests and is carried out through two processes: (1) collecting the information that describes the user's interests and (2) reasoning over the collected data to predict content close to the user's desires.
Initial data collection solutions require users to disclose their interests through a registration form, and users may change this information later [14]. The problem with this solution is that users do not want to, or find it difficult to, provide feedback about their search results and their interests. Another, more popular direction is to "learn" user profiles from search history: the history is classified into groups of topics of interest to the user, with the aim of providing more information for ranking. Based on the collected data, the authors build a model that describes and exploits relationships between users, queries, and Web pages, and serves search results matching the needs of the user. In terms of characteristics, models may be limited to "two-way data", which exploits the user's interest in information topics, or may use "three-way data", which incorporates more information about the site.
In addition to solutions based on information from individual users, a number of solutions exploit user group information, created by analysing the content already searched by a set of users. User groups share characteristics (geographic location, occupation, interests) or common search habits, as in Collaborative Filtering (CF). Web pages that match one person's profile are considered appropriate for others in the same group.
Because of data sparsity, latent semantic indexing is widely used as the primary data modelling technique to optimize both storage and computation [15].

2.3 Remarks
Among the re-evaluation methods based on the ratings of the original search engines, RawScore is the simplest: it directly uses the original scores of the documents to build the final result list. CombSUM takes the total score of a document across the various search engines to determine its rank in the final list. This score can be normalized to avoid differences in the scoring scales of the search engines, or supplemented with quality parameters for the corresponding original servers, as in WeightedCombSUM.
The second group of solutions uses the rank order information in the original result lists. This is a suitable solution for ensuring search speed when the information sources are of equivalent quality.
The third group of solutions uses the basic information of the original results (such as titles and excerpts) to score documents. It compares the query with the title or excerpt of the document, then applies a scoring formula based on factors such as the original rank, the title score, the excerpt score, and the lengths of the title and excerpt. In the news search system "News MetaSearcher" [16], in addition to the above factors, the time at which the document was updated is also included in the rating formula.
The fourth group of solutions downloads the entire content of the documents in the original result lists, then uses the indexing and scoring mechanism of the central server to sort and reorder the documents. Reviewing the entire document ensures a stable final result list but takes a lot of time and bandwidth to load data from multiple servers.
The methods in the first two groups rely on the statistical information returned from the query (score, rank order) to perform their calculations, so they ensure a quick response for the final ranking. However, several factors limit the quality of their results. First, search engines differ greatly in data size and ranking algorithms, so a scoring formula based only on statistical information is not really reliable. Second, in practice search servers usually do not provide the scores they assign to documents.
The third group of solutions is usually chosen in practice because of its advantages in both speed and search quality compared to the first two groups. The final group has stable ranking quality but requires a lot of time for downloading the full content of the candidate documents, as well as computation time for indexing and re-scoring.
This motivates a solution that makes the most of the basic information in the returned lists while ensuring that the content of the documents in the final list is consistent with the query and that time and bandwidth costs remain acceptable.

3 Proposed solution

3.1 Idea
We propose a new solution for re-ranking search results using genetic programming.
Genetic Programming (GP) was first introduced by Angeline [17], based on genetic algorithms. In GP, each potential solution, a function, is called an individual in the population. GP operates through a loop: at each generation, the fittest individuals in the population are selected based on the value of the fitness function, and crossover, mutation, and reproduction operations are performed to create better individuals for later generations.
Because individuals are formed randomly and independently of any particular algorithmic principle, genetic programming in many cases helps to avoid getting trapped in local optima. Although there is no assurance that the results found by genetic programming are optimal, experiments in different areas indicate that these results are generally better than those of algorithms defined by experts and, in many cases, close to the optimal solution [17].
An important element in implementing genetic programming is the definition of the individual, on the basis of which the fitness function is determined, ensuring that the measurement accurately reflects the quality of the solution. In addition, the complexity of the fitness function, the number of individuals in the population, the crossover and mutation rates, and the number of generations should be chosen carefully to balance the ability to create a good solution against the computational volume and time needed to solve the problem.
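The generational loop described above can be sketched in Python as follows. This is an illustrative skeleton, not the implementation used in this paper: binary tournament selection and keeping the best individual unchanged (elitism) are assumptions of the example, and the toy problem at the end only serves to exercise the loop.

```python
import random


def evolve(fitness, new_individual, crossover, mutate,
           n_generations=50, n_population=30, p_cross=0.9, p_mut=0.1, rng=None):
    """Generic evolutionary loop: select, cross over, mutate, repeat."""
    if rng is None:
        rng = random.Random(0)
    population = [new_individual(rng) for _ in range(n_population)]
    best = max(population, key=fitness)
    history = [fitness(best)]

    def pick():
        # Assumption: binary tournament selection.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(n_generations):
        children = [best]                       # assumption: elitism
        while len(children) < n_population:
            child = pick()
            if rng.random() < p_cross:
                child = crossover(child, pick(), rng)
            if rng.random() < p_mut:
                child = mutate(child, rng)
            children.append(child)
        population = children
        best = max(population, key=fitness)
        history.append(fitness(best))
    return best, history


# Toy problem: tuples of four numbers, best when every value equals 0.5.
def toy_fitness(ind):
    return -sum((x - 0.5) ** 2 for x in ind)


def toy_new(rng):
    return tuple(rng.random() for _ in range(4))


def toy_crossover(a, b, rng):
    return tuple(x if rng.random() < 0.5 else y for x, y in zip(a, b))


def toy_mutate(ind, rng):
    i = rng.randrange(len(ind))
    return ind[:i] + (rng.random(),) + ind[i + 1:]


best, history = evolve(toy_fitness, toy_new, toy_crossover, toy_mutate)
```

Because the best individual is carried over unchanged, the best fitness recorded in `history` never decreases from one generation to the next.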
Previously, ranking methods were evaluated independently, on different data sets. This prevented comparison between methods and hindered research. In 2007, Microsoft introduced the LETOR (LEarning TO Rank) data set for research on ranking techniques in text retrieval. In version 3.0 [18], the OHSUMED collection is derived from MEDLINE, a database of medical publications, for ranking learning research. From the data of 106 queries, three files are created: the training set contains 63 queries, the validation set contains 21 queries, and the test set contains 22 queries. Each file contains records in the following format:

Table 1: Example attributes of the OHSUMED collection.

In the above formulas, qi is the i-th keyword in the query q, d is the document, c(qi, d) is the number of occurrences of qi in document d, C is the total number of documents in the corpus, and df(qi) is the number of documents containing the keyword qi. The BM25 and LMIR.JM scores are computed using the BM25 ranking model and the language model with Jelinek-Mercer smoothing [19].
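For readers who want to load the data, the following Python sketch parses one record. The record layout is not reproduced in the text above, so the layout assumed here (relevance label, `qid:<id>`, then `index:value` attribute pairs, then an optional `#` comment) is an assumption based on the publicly distributed LETOR text files and should be checked against the actual data. The sample line is invented for illustration.

```python
def parse_letor_line(line):
    """Split one record into (relevance, query id, attribute values, comment)."""
    body, _, comment = line.partition("#")
    tokens = body.split()
    relevance = int(tokens[0])
    query_id = tokens[1].split(":", 1)[1]
    # Assumption: attributes appear in index order as "index:value".
    features = [float(tok.split(":", 1)[1]) for tok in tokens[2:]]
    return relevance, query_id, features, comment.strip()


# Invented sample line, not taken from the data set.
record = parse_letor_line("2 qid:10 1:0.5 2:0.0 3:1.25 #docid = 42")
```

Each parsed record corresponds to one <q, d, r> triple, with the document d represented by its attribute values.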

3.2 Modelling application of genetic programming
The GP solution for ranking learning follows this model:
- Input 1: Training data set D with records in the format of the OHSUMED collection;
- Input 2: Parameters: Ng is the number of generations, Np is the number of individuals per generation, Nc is the crossover rate, Nm is the mutation rate;
- Output: The ranking function F(q, d), which returns a real number corresponding to the relevance of document d to query q.
The training process consists of five steps as follows: -  [20], but retain the evaluation of non-linear functions. The function is a binary tree, with inner vertices being operators and leaf vertices being constants or variables.
In the formulas, ai are parameters, fi are the attribute values of the document, and hi are functions.
In options 1, 2 and 3, crossover of two individuals f1(q, d) and f2(q, d) exchanges the parameters at a random list of indices shared by both functions. Mutation of an individual f(q, d) is performed by swapping two random parameters of the function f(q, d).
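The crossover and mutation operations on parameter lists can be sketched in Python as follows. Representing an individual as a plain list of parameters ai, and the linear form of `linear_rank`, are assumptions made for this illustration.

```python
import random


def crossover(params1, params2, rng):
    """Exchange the parameters of two individuals at a random set of indices."""
    n = len(params1)
    indices = rng.sample(range(n), rng.randint(1, n))
    child1, child2 = list(params1), list(params2)
    for i in indices:
        child1[i], child2[i] = child2[i], child1[i]
    return child1, child2


def mutate(params, rng):
    """Swap two randomly chosen parameters of one individual."""
    child = list(params)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child


def linear_rank(params, features):
    """Assumed linear ranking function: sum of a_i * f_i."""
    return sum(a * f for a, f in zip(params, features))


rng = random.Random(1)
parent1, parent2 = [1, 2, 3, 4], [5, 6, 7, 8]
child1, child2 = crossover(parent1, parent2, rng)
mutant = mutate(parent1, rng)
```

Both operations only rearrange existing parameter values, so new values enter the population only through the initial random individuals.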
Comparison of search and ranking solutions is usually based on the measures P@k, MAP, and NDCG@k [20], which can be used to define the fitness function. Here, we use a fitness function corresponding to the MAP value.
In the first two options, Ng, Np, Nc, Nm are 100, 100, 0.9 and 0.1 respectively. For option 3, Ng and Np are 200 and 400. In option 4, Ng, Np, Nc, Nm are 1000, 100, 0.9 and 0.2 respectively. These values were determined by experiment. The Ng value in options 3 and 4 is greater because of the complexity and diversity of the individuals, that is, the ranking functions.
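The three measures can be computed as in the Python sketch below, where each result list is given as the relevance labels of its documents in ranked order. Several variants of these measures exist; the gain 2^r - 1 with a log2 discount for NDCG, and dividing average precision by the number of relevant documents found in the list, are assumptions of this example rather than the exact definitions used in the experiments.

```python
import math


def precision_at_k(rels, k):
    """Fraction of the top k documents that are relevant (label > 0)."""
    return sum(1 for r in rels[:k] if r > 0) / k


def average_precision(rels):
    """Mean of the precision values at each relevant document's position."""
    hits, total = 0, 0.0
    for position, r in enumerate(rels, start=1):
        if r > 0:
            hits += 1
            total += hits / position
    return total / hits if hits else 0.0


def mean_average_precision(rel_lists):
    """MAP: average precision averaged over all queries."""
    return sum(average_precision(rels) for rels in rel_lists) / len(rel_lists)


def ndcg_at_k(rels, k):
    """DCG of the ranking divided by the DCG of the ideal ordering."""
    def dcg(labels):
        return sum((2 ** r - 1) / math.log2(position + 1)
                   for position, r in enumerate(labels[:k], start=1))
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal else 0.0


query_results = [[1, 0, 1, 0], [0, 1]]   # toy relevance labels for two queries
map_value = mean_average_precision(query_results)
```

A fitness function based on MAP would rank the documents of every training query with a candidate function F(q, d) and return `mean_average_precision` over the resulting label lists.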

4 Experiment
The TF-Ranking experimental software is built on the PyEvolve library, developed by Christian S. Perone, which supports the development of genetic algorithms in the Python language.
In the OHSUMED collection, the data is divided into five directories, each containing the files train.txt, vali.txt and test.txt for training, validation, and testing. For each directory, the training and experiment steps are as follows: - The training module reads data from train.txt to select the best individual pbest, then applies its scoring function to the documents in test.txt.
For each option, the mean value of each of the P@k, MAP, and NDCG@k scores over the five directories was taken as the score for that option. Training and experiment were repeated 5 times, and the average values were used for comparison and evaluation of results. Table 2, Table 3 and Table 4 compare MAP, P@k and NDCG@k (with k = 1, 2, 5, 10) of the proposed solution against the baseline methods published on the website of the LETOR data set (footnote 3). Bold cells contain the highest values in the corresponding column.

Experimental results show that the TF-AF and TF-RF alternatives perform well. Their MAP, NDCG@k and P@k values outperform the corresponding Regression, RankSVM, and RankBoost methods and are equivalent to or slightly better than the ListNet and FRank methods. The TF-GF method did not perform well: despite good results on the training set, its results on the test set were only average, a sign of overfitting.

One-time training for the 5 directories with the TF-AF, TF-TF, TF-FF, and TF-GF options takes 150 minutes, 70 minutes, 200 minutes and 10 hours respectively on a dual-CPU computer (Core 3.30 GHz, 4 GB RAM) running Windows 7. This result shows that using linear functions for ranking ensures efficiency, both in terms of experimental quality and training time.

3 http://research.microsoft.com/

5 Conclusion
The paper presents an overview of re-ranking. It evaluates methods for merging information retrieval results from multiple sources by recalculating scores based on the basic information returned from the original search engines, and proposes a re-ranking method that sequentially and progressively downloads the best documents to create the final result list.
The innovation of this proposal is the application of machine learning through genetic programming. We tested the proposed solution on the LETOR experimental data set, developing a new ranking system with the objective of evaluating the effectiveness of this learning methodology. Experimental results suggest that the proposed method is better than traditional methods in terms of both quality and time.
Our next step is to integrate this re-ranking tool into multi-language and cross-language search systems. These systems are intended to allow users to find documents in languages other than the language of the search keywords.