9-2 Weighted median

For \(n\) distinct elements \(x_1, x_2, \ldots, x_n\) with positive weights \(w_1, w_2, \ldots, w_n\) such that \(\sum_{i = 1}^n w_i = 1\), the weighted (lower) median is the element \(x_k\) satisfying

\[\sum_{x_i < x_k} w_i < \frac{1}{2}\]

and

\[\sum_{x_i > x_k} w_i \le \frac{1}{2}.\]

For example, if the elements are \(0.1, 0.35, 0.05, 0.1, 0.15, 0.05, 0.2\) and each element equals its weight (that is, \(w_i = x_i\) for \(i = 1, 2, \ldots, 7\)), then the median is \(0.1\), but the weighted median is \(0.2\).

a. Argue that the median of \(x_1, x_2, \ldots, x_n\) is the weighted median of the \(x_i\) with weights \(w_i = 1 / n\) for \(i = 1, 2, \ldots, n\).

b. Show how to compute the weighted median of \(n\) elements in \(O(n\lg n)\) worstcase time using sorting.

c. Show how to compute the weighted median in \(\Theta(n)\) worst-case time using a linear-time median algorithm such as \(\text{SELECT}\) from Section 9.3.

The post-office location problem is defined as follows. We are given \(n\) points \(p_1, p_2, \ldots, p_n\) with associated weights \(w_1, w_2, \ldots, w_n\). We wish to find a point \(p\) (not necessarily one of the input points) that minimizes the sum \(\sum_{i = 1}^n w_i d(p, p_i)\), where \(d(a, b)\) is the distance between points \(a\) and \(b\).

d. Argue that the weighted median is a best solution for the \(1\)-dimensional postoffice location problem, in which points are simply real numbers and the distance between points \(a\) and \(b\) is \(d(a, b) = |a - b|\).

e. Find the best solution for the \(2\)-dimensional post-office location problem, in which the points are \((x,y)\) coordinate pairs and the distance between points \(a = (x_1, y_1)\) and \(b = (x_2, y_2)\) is the Manhattan distance given by \(d(a, b) = |x_1 - x_2| + |y_1 - y_2|\).

a. Let \(m_k\) be the number of \(x_i\) smaller than \(x_k\). When weights of \(1 / n\) are assigned to each \(x_i\), we have \(\sum_{x_i < x_k} w_i = m_k / n\) and \(\sum_{x_i > x_k} w_i = (n - m_k - 1) / 2\). The only value of \(m_k\) which makes these sums \(< 1 / 2\) and \(\le 1 / 2\) respectively is when \(\lceil n / 2 \rceil - 1\), and this value \(x\) must be the median since it has equal numbers of \(x_i's\) which are larger and smaller than it.

b. First use mergesort to sort the \(x_i\)'s by value in \(O(n\lg n)\) time. Let \(S_i\) be the sum of the weights of the first \(i\) elements of this sorted array and note that it is \(O(1)\) to update \(S_i\). Compute \(S_1, S_2, \dots\) until you reach \(k\) such that \(S_{k − 1} < 1 / 2\) and \(S_k \ge 1 / 2\). The weighted median is \(x_k\).

c. We modify \(\text{SELECT}\) to do this in linear time. Let \(x\) be the median of medians. Compute \(\sum_{x_i < x} w_i\) and \(\sum_{x_i > x} w_i\) and check if either of these is larger than \(1 / 2\). If not, stop. If so, recurse on the collection of smaller or larger elements known to contain the weighted median. This doesn't change the runtime, so it is \(\Theta(n)\).

d. Let \(p\) be the minimizer, and suppose that \(p\) is not the weighted median. Let \(\epsilon\) be small enough such that \(\epsilon < \min_i(|p − p_i|)\), where we don't include \(k\) if \(p = p_k\). If \(p_m\) is the weighted median and \(p < p_m\), choose \(\epsilon > 0\). Otherwise choose \(\epsilon < 0\). Thus, we have

\[\sum_{i = 1}^n w_id(p + \epsilon, p_i) = \sum_{i = 1}^n w_id(p, p_i) + \epsilon\left(\sum_{p_i < p} w_i - \sum_{p_i > p} w_i \right) < \sum_{i = 1}^n w_id(p, p_i),\]

the difference in sums will take the opposite sign of epsilon.

e. Observe that

\[\sum_{i = 1}^n w_id(p, p_i) = \sum_{i = 1}^n w_i |p_x - (p_i)_x| + \sum\_{i = 1}^n w_i|p_y - (p_i)_y|.\]

It will suffice to minimize each sum separately, which we can do since we choose \(p_x\) and \(p_y\) individually. By part (e), we simply take \(p = (p_x, p_y)\) to be such that \(p_x\) is the weighted median of the \(x\)-coordinates of the \(p_i\)'s and py is the weighted medain of the \(y\)-coordiantes of the \(p_i\)'s.