5-2 Searching an unsorted array

The problem examines three algorithms for searching for a value \(x\) in an unsorted array \(A\) consisting for \(n\) elements.

Consider the following randomized strategy: pick a random index \(i\) into \(A\). If \(A[i] = x\), then we terminate; otherwise, we continue the search by picking a new random index into \(A\). We continue picking random indices into \(A\) until we find an index \(j\) such that \(A[j] = x\) or until we have checked every element of \(A\). Note that we pick from the whole set of indices each time, so that we may examine a given element more than once.

a. Write pseudocode for a procedure \(\text{RANDOM-SEARCH}\) to implement the strategy above. Be sure that your algorithm terminates when all indices into \(A\) have been picked.

b. Suppose that there is exactly one index \(i\) such that \(A[i] = x\). What is the expected number of indices into \(A\) that we must pick before we find \(x\) and \(\text{RANDOM-SEARCH}\) terminates?

c. Generalizing your solution to part (b), suppose that there are \(k \ge 1\) indices \(i\) such that \(A[i] = x\). What is the expected number of indices into \(A\) that we must pick before we find \(x\) and \(\text{RANDOM-SEARCH}\) terminates? Your answer should be a function of \(n\) and \(k\).

d. Suppose that there are no indices \(i\) such that \(A[i] = x\). What is the expected number of indices into \(A\) that we must pick before we have checked all elements of \(A\) and \(\text{RANDOM-SEARCH}\) terminates?

Now consider a deterministic linear search algorithm, which we refer to as \(\text{DETERMINISTIC-SEARCH}\). Specifically, the algorithm searches \(A\) for \(x\) in order, considering \(A[1], A[2], A[3], \ldots, A[n]\) until either it finds \(A[i] = x\) or it reaches the end of the array. Assume that possible permutations of the input array are equally likely.

e. Suppose that there is exactly one index \(i\) such that \(A[i] = x\). What is the average-case running time of \(\text{DETERMINISTIC-SEARCH}\)? What is the worst-case running time of \(\text{DETERMINISTIC-SEARCH}\)?

f. Generalizing your solution to part (e), suppose that there are \(k \ge 1\) indices \(i\) such that \(A[i] = x\). What is the average-case running time of \(\text{DETERMINISTIC-SEARCH}\)? What is the worst-case running time of \(\text{DETERMINISTIC-SEARCH}\)? Your answer should be a function of \(n\) and \(k\).

g. Suppose that there are no indices \(i\) such that \(A[i] = x\). What is the average-case running time of \(\text{DETERMINISTIC-SEARCH}\)? What is the worst-case running time of \(\text{DETERMINISTIC-SEARCH}\)?

Finally, consider a randomized algorithm \(\text{SCRAMBLE-SEARCH}\) that works by first randomly permuting the input array and then running the deterministic linear search given above on the resulting permuting array.

h. Letting \(k\) be the number of indices \(i\) such that \(A[i] = x\), give the worst-case and expected running time of \(\text{SCRAMBLE-SEARCH}\) for the cases in which \(k = 0\) and \(k = 1\). Generalizing your solution to handle the case in which \(k \ge 1\).

i. Which of the three searching algorithms would you use? Explain your answer.

a.

C++
1
2
3
4
5
6
7
8
9
RANDOM-SEARCH(x, A, n)
    v = Ø
    while |Ø| != n
        i = RANDOM(1, n)
        if A[i] = x
            return i
        else
            v = v  i
    return NIL

\(v\) can be implemented in multiple ways: a hash table, a tree or a bitmap. The last one would probabily perform best and consume the least space.

b. \(\text{RANDOM-SEARCH}\) is well-modelled by Bernoulli trials. The expected number of picks is \(n\).

c. In similar fashion, the expected number of picks is \(n / k\).

d. This is modelled by the balls and bins problem, explored in section 5.4.2. The answer is \(n(\ln n + O(1))\).

e. The worst-case running time is \(n\). The average-case is \((n + 1) / 2\) (obviously).

f. The worst-case running time is \(n - k + 1\). The average-case running time is \((n + 1) / (k + 1)\). Let \(X_i\) be an indicator random variable that the \(i\)th element is a match. \(\Pr\\{X_i\\} = 1 / (k + 1)\). Let \(Y\) be an indicator random variable that we have found a match after the first \(n - k + 1\) elements (\(\Pr\\{Y\\} = 1\)). Thus,

\[ \begin{aligned} \text E[X] & = \text E[X_1 + X_2 + \ldots + X_{n - k} + Y] \\\\ & = 1 + \sum_{i = 1}^{n - k}\text E[X_i] = 1 + \frac{n - k}{k + 1} \\\\ & = \frac{n + 1}{k + 1}. \end{aligned} \]

g. Both the worst-case and average case is \(n\).

h. It's the same as \(\text{DETERMINISTIC-SEARCH}\), only we replace "average-case" with "expected".

i. Definitelly \(\text{DETERMINISTIC-SEARCH}\). \(\text{SCRAMBLE-SEARCH}\) gives better expected results, but for the cost of randomly permuting the array, which is a linear operation. In the same time we could have scanned the full array and reported a result.