7-3 Alternative quicksort analysis

An alternative analysis of the running time of randomized quicksort focuses on the expected running time of each individual recursive call to \(\text{RANDOMIZED-QUICKSORT}\), rather than on the number of comparisons performed.

a. Argue that, given an array of size \(n\), the probability that any particular element is chosen as the pivot is \(1 / n\). Use this to define indicator random variables

\[X_i = I\\{i\text{th smallest element is chosen as the pivot}\\}.\]

What is \(\text E[X_i]\)?

b. Let \(T(n)\) be a random variable denoting the running time of quicksort on an array of size \(n\). Argue that

\[\text E[T(n)] = \text E\bigg[\sum_{q = 1}^n X_q(T(q - 1) + T(n - q) + \Theta(n))\bigg]. \tag{7.5}\]

c. Show that we can rewrite equation \(\text{(7.5)}\) as

\[\text E[T(n)] = \frac{2}{n}\sum_{q = 2}^{n - 1}\text E[T(q)] + \Theta(n). \tag{7.6}\]

d. Show that

\[\sum_{k = 2}^{n - 1}k\lg k \le \frac{1}{2}n^2\lg n - \frac{1}{8}n^2. \tag{7.7}\]

(\(\textit{Hint:}\) Split the summation into two parts, one for \(k = 2, 3, \ldots, \lceil n / 2 \rceil - 1\) and one for \(k = \lceil n / 2 \rceil, \ldots, n - 1\).)

e. Using the bound from equation \(\text{(7.7)}\), show that the recurrence in equation \(\text{(7.6)}\) has the solution \(\text E[T(n)] = \Theta(n\lg n)\). (\(\textit{Hint:}\) Show, by substitution, that \(\text E[T(n)] \le an\lg n\) for sufficiently large \(n\) and for some positive constant \(a\).)

a. Since the pivot is chosen uniformly at random from the \(n\) elements of the array, each element is equally likely to be chosen, and these \(n\) equal probabilities sum to \(1\), so each equals \(\frac{1}{n}\). Therefore, \(\text E[X_i] = \Pr\\{i\text{th smallest element is chosen as the pivot}\\} = \frac{1}{n}\).
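
As an informal sanity check (not part of the argument), a short simulation along the following lines would confirm that a uniformly chosen pivot lands on each rank with frequency close to \(1/n\); the array size \(n\) and trial count below are arbitrary choices for illustration.

```python
import random
from collections import Counter

# Empirically estimate Pr{ith smallest element is chosen as the pivot}
# when the pivot is drawn uniformly from an array of size n.
n, trials = 8, 200_000
counts = Counter(random.randrange(n) for _ in range(trials))

for i in range(n):
    print(f"rank {i + 1}: observed {counts[i] / trials:.4f}, expected {1 / n:.4f}")
```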

b. We apply linearity of expectation over the events \(X_q\). If \(X_q = 1\), then one subarray has length \(q - 1\), the other has length \(n - q\), and the partitioning itself still takes \(\Theta(n)\) time; this is exactly the summand in equation \(\text{(7.5)}\). Since exactly one of the \(X_q\) equals \(1\), the sum picks out the cost of the recursive call that actually occurs, so its expectation is \(\text E[T(n)]\).
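
To see equation \(\text{(7.5)}\) in action, one can simulate the recursion on subproblem sizes alone: each call of size \(n\) picks a pivot rank uniformly, charges a linear cost, and recurses on the two sides. The sketch below is purely illustrative; it replaces the \(\Theta(n)\) term by exactly \(n\) and compares the simulated average against the expectation computed directly from the recurrence.

```python
import random
from functools import lru_cache

def sim_cost(n):
    # One random run: charge n for the partition, pick a pivot rank uniformly, recurse.
    if n <= 1:
        return 0
    q = random.randint(1, n)  # rank of the chosen pivot
    return n + sim_cost(q - 1) + sim_cost(n - q)

@lru_cache(maxsize=None)
def expected_cost(n):
    # E[T(n)] from equation (7.5), with the Theta(n) term taken to be exactly n.
    if n <= 1:
        return 0.0
    return sum(expected_cost(q - 1) + expected_cost(n - q) + n for q in range(1, n + 1)) / n

n, trials = 30, 20_000
avg = sum(sim_cost(n) for _ in range(trials)) / trials
print(f"simulated average: {avg:.2f}, recurrence value: {expected_cost(n):.2f}")
```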

c.

\[ \begin{aligned} & \text E\Bigg[\sum_{q = 1}^n X_q(T(q - 1) + T(n - q) + \Theta(n)) \Bigg] \\\\ & = \sum_{q = 1}^n \text E[X_q(T(q - 1) + T(n - q) + \Theta(n))] \\\\ & = \sum_{q = 1}^n \text E[X_q] \cdot \text E[T(q - 1) + T(n - q) + \Theta(n)] \\\\ & = \frac{1}{n}\sum_{q = 1}^n(\text E[T(q - 1)] + \text E[T(n - q)] + \Theta(n)) \\\\ & = \Theta(n) + \frac{1}{n} \Big(\sum_{q = 1}^n \text E[T(q - 1)] + \sum_{q = 1}^n \text E[T(n - q)] \Big) \\\\ & = \Theta(n) + \frac{1}{n} \Big(\sum_{q = 1}^n \text E[T(q - 1)] + \sum_{q = 1}^n \text E[T(q - 1)] \Big) \\\\ & = \Theta(n) + \frac{2}{n} \sum_{q = 1}^n \text E[T(q - 1)] \\\\ & = \Theta(n) + \frac{2}{n} \sum_{q = 0}^{n - 1} \text E[T(q)] \\\\ & = \Theta(n) + \frac{2}{n} \sum_{q = 2}^{n - 1} \text E[T(q)]. \end{aligned} \]

Here the second equality is linearity of expectation, the third uses the independence of the pivot choice from the running times of the recursive calls, and the last equality drops the terms \(\frac{2}{n}(\text E[T(0)] + \text E[T(1)]) = O(1)\), which are absorbed into the \(\Theta(n)\) term. This is exactly equation \(\text{(7.6)}\).
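
A quick numerical check (again purely illustrative, with \(\Theta(n)\) replaced by exactly \(n\) and with \(T(0) = T(1) = 0\), so the dropped terms vanish) confirms that the two forms of the recurrence produce the same values:

```python
# Compare the recurrence in the form (7.5) against the form (7.6),
# with Theta(n) replaced by n and T(0) = T(1) = 0.

def table(n_max):
    t5, t6 = [0.0, 0.0], [0.0, 0.0]  # values under (7.5) and (7.6)
    for n in range(2, n_max + 1):
        t5.append(sum(t5[q - 1] + t5[n - q] + n for q in range(1, n + 1)) / n)
        t6.append(2 / n * sum(t6[q] for q in range(2, n)) + n)
    return t5, t6

t5, t6 = table(200)
print(max(abs(a - b) for a, b in zip(t5, t6)))  # ~0 up to floating-point error
```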

d. We will prove this inequality in a different way than suggested by the hint. Let \(f(k) = k\lg k\), treated as a continuous function of a real variable; then \(f'(k) = \lg k + \frac{1}{\ln 2}\), which is positive for \(k \ge 1\), so \(f\) is increasing. Note now that the summation is the left-hand approximation, with step size \(1\), of the integral of \(f(k)\) from \(2\) to \(n\). By integration by parts, an antiderivative of \(k\lg k\) is

\[\frac{1}{\ln 2}\Big(\frac{k^2}{2}\ln k - \frac{k^2}{4}\Big) = \frac{k^2}{2}\lg k - \frac{k^2}{4\ln 2}.\]

Evaluating between the limits \(2\) and \(n\) and dropping the negative constant contributed by the lower limit, the integral is at most \(\frac{n^2\lg n}{2} - \frac{n^2}{4\ln 2}\). Since \(f\) is increasing over the entire interval of integration, the left-hand rule underapproximates the integral, so we have

\[ \begin{aligned} \sum_{k = 2}^{n - 1} k\lg k & \le \int_2^n k\lg k\, dk \\\\ & \le \frac{n^2\lg n}{2} - \frac{n^2}{4\ln 2} \\\\ & \le \frac{n^2\lg n}{2} - \frac{n^2}{8}, \end{aligned} \]

where the last inequality uses the fact that \(4\ln 2 < 8\), so that \(\frac{n^2}{4\ln 2} \ge \frac{n^2}{8}\).
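
As a sanity check on the bound (not a substitute for the proof), the following sketch verifies inequality \(\text{(7.7)}\) numerically for a range of small \(n\):

```python
import math

# Spot-check inequality (7.7): sum_{k=2}^{n-1} k lg k <= (1/2) n^2 lg n - (1/8) n^2.
for n in range(2, 1001):
    lhs = sum(k * math.log2(k) for k in range(2, n))
    rhs = 0.5 * n * n * math.log2(n) - n * n / 8
    assert lhs <= rhs, n
print("inequality (7.7) holds for all tested n")
```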

e. Following the hint, assume inductively that \(\text E[T(q)] \le aq\lg q\) for all \(2 \le q < n\), where \(a > 0\) is a constant to be chosen. Combining equations \(\text{(7.6)}\) and \(\text{(7.7)}\), we have

\[ \begin{aligned} \text E[T(n)] & = \frac{2}{n} \sum_{q = 2}^{n - 1} \text E[T(q)] + \Theta(n) \\\\ & \le \frac{2}{n} \sum_{q = 2}^{n - 1}aq\lg q + \Theta(n) \\\\ & \le \frac{2a}{n}\Big(\frac{1}{2}n^2\lg n - \frac{1}{8}n^2\Big) + \Theta(n) \\\\ & = an\lg n - \frac{a}{4}n + \Theta(n) \\\\ & \le an\lg n, \end{aligned} \]

where the last step holds for all sufficiently large \(n\) once \(a\) is chosen large enough that the \(\frac{a}{4}n\) term dominates the \(\Theta(n)\) term. Hence \(\text E[T(n)] = O(n\lg n)\). An analogous substitution argument, using a corresponding lower bound on \(\sum_{k = 2}^{n - 1}k\lg k\), gives \(\text E[T(n)] = \Omega(n\lg n)\), so \(\text E[T(n)] = \Theta(n\lg n)\).
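
To see the \(\Theta(n\lg n)\) behavior concretely, one can iterate recurrence \(\text{(7.6)}\) directly (again with the \(\Theta(n)\) term replaced by exactly \(n\), an arbitrary illustrative choice); the ratio \(\text E[T(n)]/(n\lg n)\) levels off near a constant:

```python
import math

# Iterate recurrence (7.6) with the Theta(n) term replaced by n and watch
# E[T(n)] / (n lg n) approach a constant, consistent with Theta(n lg n).
N = 20000
t = [0.0] * (N + 1)
prefix = 0.0                     # running value of sum_{q=2}^{n-1} t[q]
for n in range(2, N + 1):
    t[n] = 2 / n * prefix + n
    prefix += t[n]

for n in (100, 1000, 10000, 20000):
    print(n, t[n] / (n * math.log2(n)))
```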