Algorithm - Karp Rabin (Rolling Hash)

This article explores the Karp Rabin algorithm implemented in the Rolling Hash data structure. The article aims to show how rolling hashes can speed up inefficient ‘String matching’ problem.

Karp Rabin

The Rabin Karp algorithm is an example of a Rolling Hash algorithm that can be used in string matching. The ‘Rolling’ comes from the adding and subtracting of a character in the rolling hash window. Each character in string is evaluated only once.

The Rolling hash is an abstract data type that supports:

  1. hash function
  2. append(value)
  3. skip(value)
  4. Internal list.

The Strings to be matched are stored as Characters in a list. Each character is treated as integers with their values depending on the encoding used. For example, a string with an ASCII encoding represents each item in the list as a number in its base (ASCII has base 256). This means ‘A’ as 65, B as 66, etc…

Implementation

  1. Hash function
    The hash function used in Karp Rabin’s Rolling hash is:
hash(k)=C1base(k1)+C2base(k2)+...+Cbase1hash(k) = C_1 * base ^{(k-1)} + C_2 * base ^ {(k-2)} + ... + C * base ^ 1

where …

  • C is the- base value for the character
  • k is the length of the target substring.
  • base is usually the prime number > 256 since ASCII = 256

The efficiency in the Rolling hash comes from not needing to re-compute the hash for the entire length of string k. This efficiency is possible because the rehashing involves removing the most significant digit and add the new least significant digit for the hash value without the need to recompute the hash for the full length of string k.

Example 1

An array of [1,2,3,4,5,6], and using a hash window of length k of 3.

Rolling Hash for 3 items Note
hash(123) 123
remove(1) Remove most significant digit. 23
add(4) Add least significant digit. (1231 * 100) * 10 + 4 = 234

Note: 100 (most significant base) in (1 * 100) comes from 2 ^ 10 (base 10 numbers).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
function rollingHash(data = []) {
let rollingWindow = [];
const hash = (input = []) => input.toString();
const rolling_add = (value) => rollingWindow.push(value);
const rolling_remove = () => rollingWindow.splice(0, 1);
const matches = (match = []) => {
const found = [];
const targetHash = hash(match);
for (let x = 0; x < data.length; x++) {
if (rollingWindow.length === match.length) {
rolling_remove();
}

rolling_add(data[x]);
if (targetHash === hash(rollingWindow)) {
// compensate for the sliding window to find first element of rolling window
found.push(x - (match.length - 1));
}
}
return found;
}
return { matches }
};
(() => {
const rh = rollingHash([6, 7, 3, 4, 5, 6, 7, 8, 9, 6, 7, 0]);
//console.log(rh.matches([6, 7, 8])); //5
//console.log(rh.matches([6, 7, 0])); // 9
console.log(rh.matches([6, 7])); // 0, 5, 9
})();

Example 2

String value of abcdef matching bc. Therefore k=2

Rolling Hash for 2 chars
hash(ab) (97 * 257^2-1) + (98 * 257^1) => 24,929 + 25,186 = 50,115
remove(a) (97 * 257^2-1) = 24,929
add(c) 50,115 - (97 * 257) + (99 * 257)
50,115 - 24929 + (99 * 257)
25,186 + 25,443 = 50,629

The below shows the hash calculated for bc, this hash value (50,629) matches the hash generated using rolling hashing.

String
bc (98 * 257^2-1) + (99 * 257^1) = 25,186 + 25,443 = 50,629
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
// Note the base is very important as it affects the correctness of the hash
// base is 256 since ASCII has 256 characters
function karpRabin(base = 256, randomPrime = 255) {
const hash = (value,) => {
let rollingHash = 0;
let significantDigit = value.length - 1 === 0 ? 1 : value.length - 1;;

for (i = 0; i < value.length; i++) {
const asciiCode = value.charCodeAt(i);
rollingHash += ((asciiCode * Math.pow(base, significantDigit)) % randomPrime);
}

return rollingHash;
}
const rolling_remove = (rollingHash, character, significantDigit) => {
const asciiCode = character.charCodeAt(0);
return rollingHash - ((asciiCode * Math.pow(base, significantDigit)) % randomPrime);
}
const rolling_add = (rollingHash, character) => {
const asciiCode = character.charCodeAt(0);
return rollingHash + ((asciiCode * base) % randomPrime);
}
const matches = ({ pattern = '', data = '' }) => {
const found = [];

// Initialize rolling window
const rollingWindow = data.substring(0, pattern.length);
let targetHash = hash(pattern);
let rollingHash = hash(rollingWindow);

// Initial check
if (targetHash === rollingHash) {
if (verify(pattern, rollingWindow)) found.push(0);
}

for (let x = pattern.length; x < data.length; x++) {
const rollingWindowLength = pattern.length;
const rollingWindowStartIndex = x - (rollingWindowLength - 1);
// ensure significant digit not zero
const significantDigit = rollingWindowLength - 1 === 0 ? 1 : rollingWindowLength - 1;

rollingHash = rolling_remove(rollingHash, data[x - rollingWindowLength], significantDigit);
rollingHash = rolling_add(rollingHash, data[x]);

if (targetHash === rollingHash) {
// hash match, check for false positive
let stringToCheck = data.substr(rollingWindowStartIndex, rollingWindowLength);
if (verify(pattern, stringToCheck)) found.push(rollingWindowStartIndex);
}
}

return found;
}
const verify = (stringA, stringB) => {
let isExact = true;
for (let a = 0; a < stringA.length; a++) {
if (stringA[a] !== stringB[a]) {
isExact = false;
}
}
return isExact;
}
return { matches }
};
(() => {
const krabin = karpRabin();
//console.log(krabin.matches({ data: 'aaaxxxaaax', pattern: 'xxxaa' })); // 3
//console.log(krabin.matches({ data: 'aaaxxxaaax', pattern: 'ax' })); // 2,8
//console.log(krabin.matches({ data: 'aaaxxxaaax', pattern: 'x' })); // 3,4,59
})();

Time complexity

Assuming the text is length n and the length of the word is m, the average and best-case running time of Rabin-Karp is O(n + m).
The worst-case time is O(nm), this occurs when all characters of pattern and text are same as the hash values causing lots of hash collisions. (essentially doing a character by character comparison)

Source - Karp Rabin
Source - Karp Rabin Q&A
Source - Rolling hash