Sunday, September 23, 2007

User relevance feedback in a search engine

User relevance feedback
User relevance feedback is performed with the help of the user, who provides a list of relevant and irrelevant document names. Given these document names and the original query, we apply an incremental method based on Standard Rocchio. The query vector is shifted toward the documents known to be relevant and away from the documents known to be irrelevant. As the query is shifted, it is also expanded with new words.


>> Standard Rocchio

For a single word query, the query vector is first built over the full vocabulary using the query weighting and then normalized. Next, all relevant and irrelevant documents are converted into document vectors using the TFIDF weighting from the index. Each relevant document vector is normalized and the average of these vectors is taken. The same is done for the irrelevant document vectors. Standard Rocchio is then applied with alpha, beta and gamma set to 1, 1 and 0.1 respectively, and the result is normalized, which gives the shifted query vector. Each word in the shifted query vector now carries a newly assigned weight.
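The steps above can be sketched in a few lines of Python. This is only an illustration under assumptions: vectors are stored as sparse term-to-weight dictionaries, the function names (`normalize`, `centroid`, `rocchio`) are made up for this example, and negative weights are kept rather than clipped to zero, since the post does not say which is done.

```python
import math

def normalize(vec):
    """Scale a sparse term->weight vector to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    if norm == 0:
        return dict(vec)
    return {term: w / norm for term, w in vec.items()}

def centroid(doc_vectors):
    """Average of the normalized document vectors (empty dict if there are none)."""
    total = {}
    for vec in doc_vectors:
        for term, w in normalize(vec).items():
            total[term] = total.get(term, 0.0) + w
    n = len(doc_vectors)
    return {term: w / n for term, w in total.items()} if n else {}

def rocchio(query, relevant, irrelevant, alpha=1.0, beta=1.0, gamma=0.1):
    """Shifted query: alpha*q + beta*mean(relevant) - gamma*mean(irrelevant), normalized."""
    q = normalize(query)
    pos = centroid(relevant)
    neg = centroid(irrelevant)
    shifted = {}
    for term in set(q) | set(pos) | set(neg):
        shifted[term] = (alpha * q.get(term, 0.0)
                         + beta * pos.get(term, 0.0)
                         - gamma * neg.get(term, 0.0))
    return normalize(shifted)

# Toy weights standing in for the query and TFIDF weights (made-up numbers).
query = {"jaguar": 1.0}
relevant = [{"jaguar": 2.0, "cat": 1.0}, {"jaguar": 1.0, "jungle": 1.0}]
irrelevant = [{"jaguar": 1.0, "car": 3.0}]
new_query = rocchio(query, relevant, irrelevant)
```

In this toy run, words that appear only in relevant documents ("cat", "jungle") receive positive weights in `new_query`, while "car", which appears only in the irrelevant document, is pushed slightly negative.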




>> Query Weighting




>> TFIDF Weighting

Query expansion is done by taking the ten words with the highest weights in the shifted query vector. These top ten words are listed with their weights and appended to the initial query terms to refine the new search. We find it inappropriate to count the initial query terms among the top ten words, so the expansion words exclude the initial query terms.
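A minimal sketch of this selection step, assuming the shifted query vector is available as a term-to-weight dictionary. The function name `expand_query` and the tie-breaking rule (alphabetical order for equal weights) are assumptions made for this example.

```python
def expand_query(query_terms, shifted_weights, top_n=10):
    """Return (expanded_query, top_words); top_words are (word, weight) pairs."""
    initial = set(query_terms)
    # Initial query terms are not allowed to occupy a top-N slot.
    candidates = [(term, w) for term, w in shifted_weights.items()
                  if term not in initial]
    # Highest weight first; equal weights fall back to alphabetical order.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    top_words = candidates[:top_n]
    expanded = list(query_terms) + [term for term, _ in top_words]
    return expanded, top_words

# Made-up weights; top_n is lowered to 2 to keep the example short.
weights = {"jaguar": 0.9, "cat": 0.3, "jungle": 0.5, "car": -0.1}
expanded, top_words = expand_query(["jaguar"], weights, top_n=2)
# expanded is ["jaguar", "jungle", "cat"]
```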

The same technique is applied to bi word query expansion. If the search query contains only single words, we expand it with the top ten single words only. If it contains only bi words, we expand it with the top ten bi words only. Lastly, if it contains both single and bi words, we expand it with both the top ten single words and the top ten bi words.
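The three cases can be sketched as follows. This assumes a bi word is stored as two words joined by a space, and that the shifted weights for single words and bi words are kept in two separate dictionaries; both choices, and the names `top_terms` and `expand_mixed_query`, are assumptions for illustration only.

```python
def top_terms(weights, exclude, top_n=10):
    """Top-N terms by weight, skipping any term in `exclude`."""
    ranked = sorted((t for t in weights if t not in exclude),
                    key=lambda t: (-weights[t], t))
    return ranked[:top_n]

def expand_mixed_query(query_terms, single_weights, biword_weights, top_n=10):
    """Expand with single words, bi words, or both, depending on the query."""
    has_single = any(" " not in term for term in query_terms)
    has_biword = any(" " in term for term in query_terms)
    exclude = set(query_terms)
    expansion = []
    if has_single:
        expansion += top_terms(single_weights, exclude, top_n)
    if has_biword:
        expansion += top_terms(biword_weights, exclude, top_n)
    return list(query_terms) + expansion

# Made-up weights; top_n is lowered to 1 to keep the example short.
single = {"cat": 0.5, "jungle": 0.4}
bi = {"big cat": 0.6, "rain forest": 0.2}
only_single = expand_mixed_query(["jaguar"], single, bi, top_n=1)
only_bi = expand_mixed_query(["wild jaguar"], single, bi, top_n=1)
both = expand_mixed_query(["jaguar", "wild jaguar"], single, bi, top_n=1)
```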
