27-2 Saving temporary space in matrix multiplication

The \(\text{P-MATRIX-MULTIPLY-RECURSIVE}\) procedure has the disadvantage that it must allocate a temporary matrix \(T\) of size \(n \times n\), which can adversely affect the constants hidden by the \(\Theta\)-notation. The \(\text{P-MATRIX-MULTIPLY-RECURSIVE}\) procedure does have high parallelism, however. For example, ignoring the constants in the \(\Theta\)-notation, the parallelism for multiplying \(1000 \times 1000\) matrices comes to approximately \(1000^3 / 10^2 = 10^7\), since \(\lg 1000 \approx 10\). Most parallel computers have far fewer than 10 million processors.

a. Describe a recursive multithreaded algorithm that eliminates the need for the temporary matrix \(T\) at the cost of increasing the span to \(\Theta(n)\). (\(\textit{Hint:}\) Compute \(C = C + AB\) following the general strategy of \(\text{P-MATRIX-MULTIPLY-RECURSIVE}\), but initialize \(C\) in parallel and insert a sync in a judiciously chosen location.)

b. Give and solve recurrences for the work and span of your implementation.

c. Analyze the parallelism of your implementation. Ignoring the constants in the \(\Theta\)-notation, estimate the parallelism on \(1000 \times 1000\) matrices. Compare with the parallelism of \(\text{P-MATRIX-MULTIPLY-RECURSIVE}\).

(Removed)