Talk:HITS algorithm
This article is rated Start-class on Wikipedia's content assessment scale.
"It is executed at query time, and not at indexing time, with the associated hit on performance that accompanies query-time processing"
The algorithm can also be carried out in a transient manner like Google. Is this a difference at all? (unsigned comment by User:59.95.4.160 2007-05-03T08:24:09)
Perhaps the article isn't clear. PageRank is a query-independent calculation over the entire crawl which can be performed in batch mode. The ranking of results for a particular query is a function of the page's PageRank (which is independent of the query) and various query-dependent measures such as TFIDF. HITS is performed after a set of pages has been selected using TFIDF or whatever, and works on the link structure within that set, calculating the "authority" and "hub" score relative to the query; something that is an authority for baseball is unlikely to be an authority for fettuccine. You could of course run HITS on the whole crawl, or PageRank on a subset, but that is not how they are designed to be used. --Macrakis 13:14, 3 May 2007 (UTC)
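A minimal sketch of the two-phase process Macrakis describes, with the query-dependent retrieval step followed by hub/authority iteration on the links within the result set only. The page names, toy link graph, substring-matching retrieval (a stand-in for TFIDF), and the `hits_for_query` helper are all invented for illustration:

```python
import math

# Toy crawl: page -> short text, and page -> pages it links to.
pages = {
    "batting-guide":    "baseball batting tips",
    "mlb-stats":        "baseball statistics tables",
    "bat-reviews":      "baseball bat reviews",
    "pasta-recipes":    "fettuccine pasta recipes",
    "fettuccine-howto": "how to make fettuccine",
}
links = {
    "batting-guide":    ["mlb-stats", "bat-reviews"],
    "mlb-stats":        ["bat-reviews"],
    "bat-reviews":      ["mlb-stats"],
    "pasta-recipes":    ["fettuccine-howto"],
    "fettuccine-howto": [],
}

def hits_for_query(query, iters=30):
    # Step 1 (query-dependent): retrieve a result set.
    # Substring matching stands in for TFIDF or another retrieval metric.
    subset = {p for p, text in pages.items() if query in text}
    # Step 2: keep only the link structure *within* that set.
    sub_links = {p: [q for q in links[p] if q in subset] for p in subset}
    hub = {p: 1.0 for p in subset}
    auth = {p: 1.0 for p in subset}
    for _ in range(iters):
        # Authority score: sum of hub scores of pages linking in.
        auth = {p: sum(hub[q] for q in subset if p in sub_links[q]) for p in subset}
        # Hub score: sum of authority scores of pages linked to.
        hub = {p: sum(auth[q] for q in sub_links[p]) for p in subset}
        for scores in (auth, hub):  # normalize to unit Euclidean norm
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth

hub, auth = hits_for_query("baseball")
```

As in the comment above: "mlb-stats" and "bat-reviews" come out as authorities only relative to the "baseball" result set; they never even appear in the computation for "fettuccine".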
The article begins with "In the HITS algorithm, the first step is to retrieve the set of results to the search query. The computation is performed only on this result set, not across all Web pages." I think it should be clarified how the "set of results" is obtained, e.g. via TFIDF, or another metric. Beamishboy (talk) 22:26, 6 February 2010 (UTC)
That is not specified by the HITS algorithm. You can apply HITS to any set of pages. But that passage should be reworded. --macrakis (talk) 23:58, 6 February 2010 (UTC)
The article could elaborate on the purpose of HITS. It is not limited strictly to search engines. It can also be used for finding web communities or populating categories in web directories. --137.73.122.137 (talk) 11:10, 16 April 2010 (UTC)
The simplest version of the HITS algorithm does not work well in all situations; "topic drift" is one common issue. See http://www.nlp.org.cn/docs/20020903/114/A%20Survey%20On%20Web%20Information%20Retrieval%20Technologies.pdf (p. 16) for limitations. --137.73.122.137 (talk) 11:10, 16 April 2010 (UTC)
The pseudocode of the convergent HITS algorithm is wrong. 129.215.58.33 (talk) 22:35, 15 November 2011 (UTC)
"Normalization
The final hub-authority scores of nodes are determined after infinite repetitions of the algorithm. As directly and iteratively applying the Hub Update Rule and Authority Update Rule leads to diverging values, it is necessary to normalize the matrix after every iteration. Thus the values obtained from this process will eventually converge.[4]"
This is wrong: normalizing by the sum of the squares does NOT give converging values. A simple 3-node graph shows this: {1}->{2}, {2}->{3}, {3}->{2}, {1}->{3} gives a repeating sequence of hub/auth scores. The proper normalization is by the square root of the sum of the squares, so that the sum of the squares of the new hub/auth scores is 1. Also, the reference link now returns a 404; the lectures have been removed, so the original claim cannot be verified. — Preceding unsigned comment added by 163.1.150.29 (talk) 03:01, 2 March 2012 (UTC)
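A quick numerical check of this comment's claim on its 3-node graph, normalizing each score vector by the square root of the sum of squares. With this normalization the scores settle to fixed values rather than repeating (the iteration count below is arbitrary; convergence here is almost immediate):

```python
import math

# The 3-node example from the comment: {1}->{2}, {2}->{3}, {3}->{2}, {1}->{3}
edges = [(1, 2), (2, 3), (3, 2), (1, 3)]
nodes = [1, 2, 3]

def normalize(scores):
    # Divide by the square root of the sum of squares, so the
    # new scores have sum of squares equal to 1.
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return {n: v / norm for n, v in scores.items()}

hub = {n: 1.0 for n in nodes}
auth = {n: 1.0 for n in nodes}
for _ in range(50):
    # Authority update: sum of hub scores over incoming links.
    auth = normalize({n: sum(hub[u] for u, v in edges if v == n) for n in nodes})
    # Hub update: sum of authority scores over outgoing links.
    hub = normalize({n: sum(auth[v] for u, v in edges if u == n) for n in nodes})

# auth[2] and auth[3] settle near 0.7071 and hub[1] near 0.8165;
# running further iterations leaves the values unchanged (no oscillation).
```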
I think the pseudocode is wrong. It defines a function called HubsAndAuthorities, but the function is never called. I think the right fix is simply to delete the function definition (line 5) and let the code run. — Preceding unsigned comment added by Rsherry8 (talk • contribs) 17:41, 6 September 2019 (UTC)