Minimum Execution Time and Minimum Cost-Optimal Execution Time
We are often interested in knowing how fast a problem can be solved, or what the minimum possible execution time of a parallel algorithm is, provided that the number of processing elements is not a constraint. As we increase the number of processing elements for a given problem size, either the parallel runtime continues to decrease and asymptotically approaches a minimum value, or it starts rising after attaining a minimum value (Problem 5.12). We can determine the minimum parallel runtime for a given W by differentiating the expression for TP with respect to p and equating the derivative to zero (assuming that the function TP (W, p) is differentiable with respect to p). The number of processing elements for which TP is minimum is determined by the following equation:
Let p0 be the value of the number of processing elements that satisfies Equation 5.21. The value of can be determined by substituting p0 for p in the expression for TP . In the following example, we derive the expression for for the problem of adding n numbers.
Example 5.18 Minimum execution time for adding n numbers
Under the assumptions of Example 5.12, the parallel run time for the problem of adding n numbers on p processing elements can be approximated by
Equating the derivative with respect to p of the right-hand side of Equation 5.22 to zero we get the solutions for p as follows:
Substituting p = n/2 in Equation 5.22, we get
In Example 5.18, the processor-time product for p = p0 is Q(n log n), which is higher than the Q(n) serial complexity of the problem. Hence, the parallel system is not cost-optimal for the value of p that yields minimum parallel runtime. We now derive an important result that gives a lower bound on parallel runtime if the problem is solved cost-optimally.
Let be the minimum time in which a problem can be solved by a cost-optimal parallel system. From the discussion regarding the equivalence of cost-optimality and the isoefficiency function in Section 5.4.3, we conclude that if the isoefficiency function of a parallel system is Q(f(p)), then a problem of size W can be solved cost-optimally if and only if W = W(f(p)). In other words, given a problem of size W, a cost-optimal solution requires that p = O(f-1(W)). Since the parallel runtime is Q(W/p) for a cost-optimal parallel system (Equation 5.18), the lower bound on the parallel runtime for solving a problem of size W cost-optimally is
Example 5.19 Minimum cost-optimal execution time for adding n numbers
As derived in Example 5.14, the isoefficiency function f(p) of this parallel system is Q(p log p). If W = n = f(p) = p log p, then log n = log p + log log p. Ignoring the double logarithmic term, log n logp. If n = f (p) = p log p, then p = f-1(n) = n/log p n/log n. Hence, f-1(W) = Q(n/log n). As a consequence of the relation between cost-optimality and the isoefficiency function, the maximum number of processing elements that can be used to solve this problem cost-optimally is Q(n/log n). Using p = n/log n in Equation 5.2, we get
It is interesting to observe that both and for adding n numbers are Q(log n) (Equations 5.24 and 5.26). Thus, for this problem, a cost-optimal solution is also the asymptotically fastest solution. The parallel execution time cannot be reduced asymptotically by using a value of p greater than that suggested by the isoefficiency function for a given problem size (due to the equivalence between cost-optimality and the isoefficiency function). This is not true for parallel systems in general, however, and it is quite possible that . The following example illustrates such a parallel system.
Example 5.20 A parallel system with
Consider the hypothetical parallel system of Example 5.15, for which
From Equation 5.10, the parallel runtime for this system is
Using the methodology of Example 5.18,
From the preceding analysis, p0 = Q(W). Substituting p by the value of p0 in Equation 5.28, we get
According to Example 5.15, the overall isoefficiency function for this parallel system is Q(p3), which implies that the maximum number of processing elements that can be used cost-optimally is Q(W1/3). Substituting p = Q(W1/3) in Equation 5.28, we get
A comparison of Equations 5.29 and 5.30 shows that is asymptotically greater than .
In this section, we have seen examples of both types of parallel systems: those for which is asymptotically equal to , and those for which is asymptotically greater than . Most parallel systems presented in this book are of the first type. Parallel systems for which the runtime can be reduced by an order of magnitude by using an asymptotically higher number of processing elements than indicated by the isoefficiency function are rare.
While deriving the minimum execution time for any parallel system, it is important to be aware that the maximum number of processing elements that can be utilized is bounded by the degree of concurrency C(W) of the parallel algorithm. It is quite possible that p0 is greater than C(W) for a parallel system (Problems 5.13 and 5.14). In such cases, the value of p0 is meaningless, and is given by