# Algorithm - Karp Rabin (Rolling Hash)

This article explores the Karp Rabin algorithm implemented in the Rolling Hash data structure. The article aims to show how rolling hashes can speed up inefficient ‘String matching’ problem.

# Karp Rabin

The Rabin Karp algorithm is an example of a Rolling Hash algorithm that can be used in string matching. The ‘Rolling’ comes from the adding and subtracting of a character in the rolling hash window. Each character in string is evaluated only once.

The Rolling hash is an abstract data type that supports:

1. hash function
2. append(value)
3. skip(value)
4. Internal list.

The Strings to be matched are stored as Characters in a list. Each character is treated as integers with their values depending on the encoding used. For example, a string with an ASCII encoding represents each item in the list as a number in its base (ASCII has base 256). This means ‘A’ as 65, B as 66, etc…

## Implementation

1. Hash function
The hash function used in Karp Rabin’s Rolling hash is:
$hash(k) = C_1 * base ^{(k-1)} + C_2 * base ^ {(k-2)} + ... + C * base ^ 1$

where …

• C is the- base value for the character
• k is the length of the target substring.
• base is usually the prime number > 256 since ASCII = 256

The efficiency in the Rolling hash comes from not needing to re-compute the hash for the entire length of string k. This efficiency is possible because the rehashing involves removing the most significant digit and add the new least significant digit for the hash value without the need to recompute the hash for the full length of string k.

### Example 1

An array of [1,2,3,4,5,6], and using a hash window of length k of 3.

Rolling Hash for 3 items Note
hash(123) 123
remove(1) Remove most significant digit. 23
add(4) Add least significant digit. (1231 * 100) * 10 + 4 = 234

Note: 100 (most significant base) in (1 * 100) comes from 2 ^ 10 (base 10 numbers).

### Example 2

String value of abcdef matching bc. Therefore k=2

Rolling Hash for 2 chars
hash(ab) (97 * 257^2-1) + (98 * 257^1) => 24,929 + 25,186 = 50,115
remove(a) (97 * 257^2-1) = 24,929
add(c) 50,115 - (97 * 257) + (99 * 257)
50,115 - 24929 + (99 * 257)
25,186 + 25,443 = 50,629

The below shows the hash calculated for bc, this hash value (50,629) matches the hash generated using rolling hashing.

String
bc (98 * 257^2-1) + (99 * 257^1) = 25,186 + 25,443 = 50,629

## Time complexity

Assuming the text is length n and the length of the word is m, the average and best-case running time of Rabin-Karp is O(n + m).
The worst-case time is O(nm), this occurs when all characters of pattern and text are same as the hash values causing lots of hash collisions. (essentially doing a character by character comparison)