27-1 Implementing parallel loops using nested parallelism

Consider the following multithreaded algorithm for performing pairwise addition on \(n\)-element arrays \(A[1..n]\) and \(B[1..n]\), storing the sums in \(C[1..n]\):

C++
1
2
3
SUM-ARRAYS(A, B, C)
    parallel for i = 1 to A.length
        C[i] = A[i] + B[i]

a. Rewrite the parallel loop in \(\text{SUM-ARRAYS}\) using nested parallelism (spawn and sync) in the manner of \(\text{MAT-VEC-MAIN-LOOP}\). Analyze the parallelism of your implementation.

Consider the following alternative implementation of the parallel loop, which contains a value \(grain\text-size\) to be specified:

C++
1
2
3
4
5
6
7
SUM-ARRAYS'(A, B, C)
    n = A.length
    grain-size = ?      // to be determined
    r = ceil(n / grain-size)
    for k = 0 to r - 1
        spawn ADD-SUBARRAY(A, B, C, k * grain-size + 1, min((k + 1) * grain-size, n))
    sync
C++
1
2
3
ADD-SUBARRAY(A, B, C, i, j)
    for k = i to j
        C[k] = A[k] + B[k]

b. Suppose that we set \(grain\text -size = 1\). What is the parallelism of this implementation?

c. Give a formula for the span of \(\text{SUM-ARRAYS}'\) in terms of \(n\) and \(grain\text-size\). Derive the best value for grain-size to maximize parallelism.

a. See the algorithm \(\text{SUM-ARRAYS}(A, B, C)\). The parallelism is \(O(n)\) since it's work is \(n\lg n\) and the span is \(\lg n\).

b. If grainsize is \(1\), this means that each call of \(\text{ADD-SUBARRAY}\) just sums a single pair of numbers. This means that since the for loop on line 4 will run \(n\) times, both the span and work will be \(O(n)\). So, the parallelism is just \(O(1)\).

C++
1
2
3
4
5
6
7
SUM-ARRAYS(A, B, C)
    n = floor(A.length / 2)
    if n == 0
        C[1] = A[1] + B[1]
    else
        spawn SUM-ARRAYS(A[1..n], B[1..n], C[1..n])
        SUM-ARRAYS(A[n + 1..A.length], B[n + 1..A..length], C[n + 1..A.length])

c. Let \(g\) be the grainsize. The runtime of the function that spawns all the other functions is \(\left\lceil \frac{n}{g} \right\rceil\). The runtime of any particular spawned task is \(g\). So, we want to minimize

\[\frac{n}{g} + g.\]

To do this we pull out our freshman calculus hat and take a derivative, we have

\[0 = 1 − \frac{n}{g^2}.\]

To solve this, we set \(g = \sqrt n\). This minimizes the quantity and makes the span \(O(n / g + g) = O(\sqrt n)\). Resulting in a parallelism of \(O(\sqrt n)\).