27-4 Multithreading reductions and prefix computations
A \(\otimes\)-reduction of an array \(x[1 \ldots n]\), where \(\otimes\) is an associative operator, is the value
\[y = x[1] \otimes x[2] \otimes \cdots \otimes x[n].\]The following procedure computes the \(\otimes\)-reduction of a subarray \(x[i \ldots j]\) serially.
C++ 1 2 3 4 5REDUCE(x, i, j) y = x[i] for k = i + 1 to j y = y ⊗ x[k] return y
a. Use nested parallelism to implement a multithreaded algorithm \(\text{P-REDUCE}\), which performs the same function with \(\Theta(n)\) work and \(\Theta(\lg n)\) span. Analyze your algorithm.
A related problem is that of computing a \(\otimes\)-prefix computation, sometimes called a \(\otimes\)-scan, on an array \(x[1 \ldots n]\), where \(\otimes\) is once again an associative operator. The \(\otimes\)-scan produces the array \(y[1 \ldots n]\) given by
\[ \begin{aligned} y[1] & = x[1], \\\\ y[2] & = x[1] \otimes x[2], \\\\ y[3] & = x[1] \otimes x[2] \otimes x[3], \\\\ & \vdots \\\\ y[n] & = x[1] \otimes x[2] \otimes x[3] \otimes \cdots \otimes x[n], \end{aligned} \]that is, all prefixes of the array \(x\) "summed" using \(\otimes\) operator. The following serial procedure \(\text{SCAN}\) performs a \(\otimes\)-prefix computation:
C++ 1 2 3 4 5 6 7SCAN(x) n = x.length let y[1..n] be a new array y[1] = x[1] for i = 2 to n y[i] = y[i - 1] ⊗ x[i] return y
Unfortunately, multithreading \(\text{SCAN}\) is not straightforward. For example, changing the for loop to a parallel for loop would create races, since each iteration of the loop body depends on the previous iteration. The following procedure \(\text{P-SCAN-1}\) performs the \(\otimes\)-prefix computation in parallel, albeit inefficiently.
C++ 1 2 3 4 5P-SCAN-1(x) n = x.length let y[1..n] be a new array P-SCAN-1-AUX(x, y, 1, n) return y
C++ 1 2 3P-SCAN-1-AUX(x, y, i, j) parallel for l = i to j y[l] = P-REDUCE(x, 1, l)
b. Analyze the work, span, and parallelism of \(\text{P-SCAN-1}\).
By using nested parallelism, we can obtain a more efficient \(\otimes\)-prefix computation:
C++ 1 2 3 4 5P-SCAN-2(x) n = x.length let y[1..n] be a new array P-SCAN-2-AUX(x, y, 1, n) return y
C++ 1 2 3 4 5 6 7 8 9P-SCAN-2-AUX(x, y, i, j) if i == j y[i] = x[i] else k = floor((i + j) / 2) spawn P-SCAN-2-AUX(x, y, i, k) P-SCAN-2-AUX(x, y, k + 1, j) sync parallel for l = k + 1 to j y[l] = y[k] ⊗ y[l]
c. Argue that \(\text{P-SCAN-2}\) is correct, and analyze its work, span, and parallelism.
We can improve on both \(\text{P-SCAN-1}\) and \(\text{P-SCAN-2}\) by performing the \(\otimes\)-prefix computation in two distinct passes over the data. On the first pass, we gather the terms for various contiguous subarrays of \(x\) into a temporary array \(t\), and on the second pass we use the terms in \(t\) to compute the final result \(y\). The following pseudocode implements this strategy, but certain expressions have been omitted:
C++ 1 2 3 4 5 6 7 8P-SCAN-3(x) n = x.length let y[1..n] and t[1..n] be new arrays y[1] = x[1] if n > 1 P-SCAN-UP(x, t, 2, n) P-SCAN-DOWN(x[1], x, t, y, 2, n) return y
C++ 1 2 3 4 5 6 7 8 9P-SCAN-UP(x, t, i, j) if i == j return x[i] else k = floor((i + j) / 2) t[k] = spawn P-SCAN-UP(x, t, i, k) right = P-SCAN-UP(x, t, k + 1, j) sync return ____ // fill in the blank
C++ 1 2 3 4 5 6 7 8P-SCAN-DOWN(v, x, t, y, i, j) if i == j y[i] = v ⊗ x[i] else k = floor((i + j) / 2) spawn P-SCAN-DOWN(____, x, t, y, i, k) // fill in the blank P-SCAN-DOWN(____, x, t, y, k + 1, j) // fill in the blank sync
d. Fill in the three missing expressions in line 8 of \(\text{P-SCAN-UP}\) and lines 5 and 6 of \(\text{P-SCAN-DOWN}\). Argue that with expressions you supplied, \(\text{P-SCAN-3}\) is correct. (\(\textit{Hint:}\) Prove that the value \(v\) passed to \(\text{P-SCAN-DOWN}(v, x, t, y, i, j)\) satisfies \(v = x[1] \otimes x[2] \otimes \cdots \otimes x[i - 1]\).)
e. Analyze the work, span, and parallelism of \(\text{P-SCAN-3}\).
(Removed)
本页面的全部内容在 小熊老师 - 莆田青少年编程俱乐部 0594codes.cn 协议之条款下提供,附加条款亦可能应用