Jump to content

User:RMcPhillip/tables

From Wikipedia, the free encyclopedia

Delta2 - The “Good Suffix” Table

[edit]

This table contains an entry for each character in the pattern. The entry for pattern[j] specifies how far the current string position should shift to the right when pattern[j-1] does not match the string but the suffix at pattern[j .. patlen] does match.

Its construction is less obvious than the delta1 table and might best be explained after first introducing several terms. We use this table when some maximal suffix of the pattern matches some substring found in the string and when that suffix recurs earlier (to the left) in the pattern. The substring is delimited by the character before the suffix, the pre-suffix character, which does not match the character before the substring. (If these two characters match then the suffix is not maximal.)

In example 1d, suffix “AT” in “AT-THAT” matches substring “AT” in “...-AT-T...”. The substring length is 2 because the pre-suffix character H does not match the dash that precedes the substring. The suffix recurs at the left of the pattern, “AT-THAT”.

To populate the delta2 table, we need to determine for each position j in the pattern where the suffix starting at j+1 recurs. We are interested in only the rightmost recurrence. (For pattern “AT-THAT”, the suffix T occurs at the far right and the rightmost recurrence of T is “AT-THAT”.) In addition Boyer-Moore deems a recurrence to be “plausible” only if either (a) its preceding character differs from the pre-suffix character or (b) there is no preceding character because the recurrence is at the beginning of the pattern. (For example, in pattern “AT-FAT”, the first occurrence of T is not a plausible recurrence of suffix T because both Ts are preceded by A. If a string matches on T but mismatches on A then shifting the pattern right by 4 would not be optimal because we would immediately mismatch on A again.) So, to populate the delta2 table we must determine the rightmost plausible recurrence (RPR) of each character in the pattern.

At first glance it might appear that there is no RPR for many suffixes; what is the RPR for “HAT” in “AT-THAT”? Boyer-Moore handles this by saying any suffix character matches pattern[j] where j < 0. For example:

Example 2a
j -2 -1 0 1 2 3 4 5 6
suffix A T - T H A T
RPR * A T - T H A T

RPR(4) is -1 because the suffix at pattern[4] recurs at pattern[-1].

Note: pattern[-1] matches pattern[4] because -1 < 0,pattern[0] = pattern[5] = A, pattern[1] = pattern[6] = T, and the recurrence is plausible because j < 0.

Boyer-Moore shows that delta2[j] is simply patlen - rpr[j].

The following examples, derived from Boyer-Moore, illustrate this:

Example 2b
j 0 1 2 3 4 5 6
pattern A T - T H A T
rpr(j) -4 -3 -2 -1 0 3 6
delta2(j) 11 10 9 8 7 4 1
Example 2c
j 0 1 2 3 4 5 6 7 8
pattern A B C X X X A B C
rpr(j) -5 -4 -3 -2 -1 0 -2 -1 8
delta2(j) 14 13 12 11 10 9 11 10 1
Example 2d
j 0 1 2 3 4 5 6 7 8
pattern A B Y X C D E Y X
rpr(j) -8 -7 -6 -5 -4 -3 2 -1 8
delta2(j) 17 16 15 14 13 12 7 10 1