Jump to content

Talk:Rolling hash

Page contents not supported in other languages.
From Wikipedia, the free encyclopedia

Rabin-Karp vs cyclic polynomial

[edit]

I'm not sure I understand the big difference here. Isn't cyclic polynomial just Rabin-Karp with a=2 and xor instead of plus? Perhaps this can be made clearer. —Preceding unsigned comment added by Ketil (talkcontribs) 12:32, 18 May 2011 (UTC)[reply]

Yes, the cyclic polynomial rolling hash can be seen as a small variation of Rabin-Karp with a=2 and xor instead of plus. Also, the moving average rolling hash can be seen as the special case of Rabin-Karp with a=1. Also, a rectangle can be seen as a special case of a quadrilateral.
In practice, the big difference is that code specifically written for these special cases usually run much faster than a general Rabin-Karp hash.
In practice, these special cases are extremely commonly used.
And so, even though the characteristics of these special cases may be obvious to you from the general description, WP:OBVIOUS and WP:TECHNICAL recommend that we say a few words about common special cases. --DavidCary (talk) 01:37, 3 June 2015 (UTC)[reply]
I don't understand why you agree with this "Isn't cyclic polynomial just Rabin-Karp with a=2 and xor instead of plus?". The cyclic polynomial uses a hash h, effectively a small lookup table of random values, which in no way corresponds to a=2. 79.75.102.4 (talk) 14:22, 10 September 2020 (UTC)[reply]

Another difference is that the "cyclic polynomial" only works when the number of bits in the lookup table is larger than k (the length of substrings we are interested in.) Thus if we are doing string matching with strings of length 1000 we need to store 1000 bit hash values. Thomasda (talk) 18:34, 13 April 2021 (UTC)[reply]

Content-based slicing using moving average

[edit]

Why is A(n) computed? It doesn't seem to be used. Only the moving sum S(n) is used. Ligneus (talk) 22:34, 29 June 2015 (UTC)[reply]

Implementations

[edit]

Should this page link to implementations at all? The C++ implementation linked in this page has an implementation of Rabin Karp which is not the actual Rabin Fingerprint, but a polynomial that I don't know how to name exactly. I tried to emphasize the difference by adding a section. The point is, I believe it is wrong. I am the author of https://github.com/chmduquesne/rollinghash and I considered linking it, but for the same reason (I could have made a mistake) I did not link it in this page. — Preceding unsigned comment added by Chmduquesne (talkcontribs) 23:33, 3 February 2018 (UTC)[reply]

Why in the cyclic polynomial are circular shifts used?

[edit]

Why bother? Just use the h() hash directly and xor together. What does adding in shifts gain you? 79.75.102.4 (talk) 14:24, 10 September 2020 (UTC)[reply]

Questioner here again: on reflection, A xor A = 0 which makes it much less useful, so a shift adds reversible extra info which prevents a character cancelling itself out. 85.211.193.117 (talk) 11:29, 12 September 2020 (UTC)[reply]

Content-Defined Chunking

[edit]

@Xyb: This article has a short section about Content-Defined Chunking. Should the section be split into a separate article? Jarble (talk) 23:44, 20 January 2024 (UTC)[reply]