32-1 String matching based on repetition factors

Let \(y^i\) denote the concatenation of string \(y\) with itself \(i\) times. For example, \((\text{ab})^3 = \text{ababab}\). We say that a string \(x \in \Sigma^\*\) has repetition factor \(r\) if \(x = y ^ r\) for some string \(y \in \Sigma^\*\) and some \(r > 0\). Let \(\rho(x)\) denote the largest \(r\) such that \(x\) has repetition factor \(r\).

a. Give an efficient algorithm that takes as input a pattern \(P[1 \ldots m]\) and computes the value \(\rho(P_i)\) for \(i = 1, 2, \ldots, m\). What is the running time of your algorithm?

b. For any pattern \(P[1 \ldots m]\), let \(\rho^\*(P)\) be defined as \(\max_{1 \le i \le m} \rho(P_i)\). Prove that if the pattern \(P\) is chosen randomly from the set of all binary strings of length \(m\), then the expected value of \(\rho^\*(P)\) is \(O(1)\).

c. Argue that the following string-matching algorithm correctly finds all occurrences of pattern \(P\) in a text \(T[1 \ldots n]\) in time \(O(\rho^\*(P)n + m)\):

C++
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
REPETITION_MATCHER(P, T)
    m = P.length
    n = T.length
    k = 1 + ρ*(P)
    q = 0
    s = 0
    while s  n - m
        if T[s + q + 1] == P[q + 1]
            q = q + 1
            if q == m
                print "Pattern occurs with shift" s
        if q == m or T[s + q + 1] != P[q + 1]
            s = s + max(1, ceil(q / k))
            q = 0

This algorithm is due to Galil and Seiferas. By extending these ideas greatly, they obtained a linear-time string-matching algorithm that uses only \(O(1)\) storage beyond what is required for \(P\) and \(T\).

a. Compute \(\pi\), let \(l = m - \pi[m]\), if \(m ~\text{mod}~ l = 0\) and for all \(p = m - i \cdot l > 0\), \(p - \pi[p] = l\), then \(\rho(P_i) = m / l\), otherwise \(\rho(P_i) = 1\). The running time is \(\Theta(n)\).

b.

\[ \begin{aligned} P(\rho^\*(P) \ge 2) & = \frac{1}{2} + \frac{1}{8} + \frac{1}{32} + \cdots \approx \frac{2}{3} \\\\ P(\rho^\*(P) \ge 3) & = \frac{1}{4} + \frac{1}{32} + \frac{1}{256} + \cdots \approx \frac{2}{7} \\\\ P(\rho^\*(P) \ge 4) & = \frac{1}{8} + \frac{1}{128} + \frac{1}{2048} + \cdots \approx \frac{2}{15} \\\\ P(\rho^\*(P) = 1) & = \frac{1}{3} \\\\ P(\rho^\*(P) = 2) & = \frac{8}{21} \\\\ P(\rho^\*(P) = 3) & = \frac{16}{105} \\\\ \text E[\rho^\*(P)] & = 1 \cdot \frac{1}{3} + 2 \cdot \frac{8}{21} + 3 \cdot \frac{16}{105} + \ldots \approx 2.21 \end{aligned} \]

c.

(Omit!)