# Preliminaries and Decomposition

Contents

- 1 Preliminaries
- 1.1 1 Decomposition, Tasks, and Dependency Graphs
- 1.1.1 Figure 3.1. Decomposition of dense matrix-vector multiplication into n tasks, where n is the number of rows in the matrix. The portions of the matrix and the input and output vectors accessed by Task 1 are highlighted.
- 1.1.2 Table 3.1. A database storing information about used vehicles.
- 1.1.3 Figure 3.2. The different tables and their dependencies in a query processing operation.
- 1.1.4 Figure 3.3. An alternate data-dependency graph for the query processing operation.

- 1.1 1 Decomposition, Tasks, and Dependency Graphs

### Preliminaries

Dividing a computation into smaller computations and assigning them to different processors for parallel execution are the two key steps in the design of parallel algorithms. In this section, we present some basic terminology and introduce these two key steps in parallel algorithm design using matrix-vector multiplication and database query processing as examples.

#### 1 Decomposition, Tasks, and Dependency Graphs

The process of dividing a computation into smaller parts, some or all of which may potentially be executed in parallel, is called decomposition. Tasks are programmer-defined units of computation into which the main computation is subdivided by means of decomposition. Simultaneous execution of multiple tasks is the key to reducing the time required to solve the entire problem. Tasks can be of arbitrary size, but once defined, they are regarded as indivisible units of computation. The tasks into which a problem is decomposed may not all be of the same size.

Example 3.1 Dense matrix-vector multiplication

Consider the multiplication of a dense n x n matrix A with a vector b to yield another vector y. The ith element y[i] of the product vector is the dot-product of the ith row of A with the input vector b; i.e., . As shown later in Figure 3.1, the computation of each y[i] can be regarded as a task. Alternatively, as shown later in Figure 3.4, the computation could be decomposed into fewer, say four, tasks where each task computes roughly n/4 of the entries of the vector y.

##### Figure 3.1. Decomposition of dense matrix-vector multiplication into n tasks, where n is the number of rows in the matrix. The portions of the matrix and the input and output vectors accessed by Task 1 are highlighted.

Note that all tasks in Figure 3.1 are independent and can be performed all together or in any sequence. However, in general, some tasks may use data produced by other tasks and thus may need to wait for these tasks to finish execution. An abstraction used to express such dependencies among tasks and their relative order of execution is known as a task-dependency graph. A task-dependency graph is a directed acyclic graph in which the nodes represent tasks and the directed edges indicate the dependencies amongst them. The task corresponding to a node can be executed when all tasks connected to this node by incoming edges have completed. Note that task-dependency graphs can be disconnected and the edge-set of a task-dependency graph can be empty. This is the case for matrix-vector multiplication, where each task computes a subset of the entries of the product vector. To see a more interesting task-dependency graph, consider the following database query processing example.

Example 3.2 Database query processing

Table 3.1 shows a relational database of vehicles. Each row of the table is a record that contains data corresponding to a particular vehicle, such as its ID, model, year, color, etc. in various fields. Consider the computations performed in processing the following query:

MODEL=”Civic” AND YEAR=”2001″ AND (COLOR=”Green” OR COLOR=”White”)

This query looks for all 2001 Civics whose color is either Green or White. On a relational database, this query is processed by creating a number of intermediate tables. One possible way is to first create the following four tables: a table containing all Civics, a table containing all 2001-model cars, a table containing all green-colored cars, and a table containing all white-colored cars. Next, the computation proceeds by combining these tables by computing their pairwise intersections or unions. In particular, it computes the intersection of the Civic-table with the 2001-model year table, to construct a table of all 2001-model Civics. Similarly, it computes the union of the green- and white-colored tables to compute a table storing all cars whose color is either green or white. Finally, it computes the intersection of the table containing all the 2001 Civics with the table containing all the green or white vehicles, and returns the desired list.

ID# |
Model |
Year |
Color |
Dealer |
Price |
---|---|---|---|---|---|

4523 |
Civic |
2002 |
Blue |
MN |
$18,000 |

3476 |
Corolla |
1999 |
White |
IL |
$15,000 |

7623 |
Camry |
2001 |
Green |
NY |
$21,000 |

9834 |
Prius |
2001 |
Green |
CA |
$18,000 |

6734 |
Civic |
2001 |
White |
OR |
$17,000 |

5342 |
Altima |
2001 |
Green |
FL |
$19,000 |

3845 |
Maxima |
2001 |
Blue |
NY |
$22,000 |

8354 |
Accord |
2000 |
Green |
VT |
$18,000 |

4395 |
Civic |
2001 |
Red |
CA |
$17,000 |

7352 |
Civic |
2002 |
Red |
WA |
$18,000 |

The various computations involved in processing the query in Example 3.2 can be visualized by the task-dependency graph shown in Figure 3.2. Each node in this figure is a task that corresponds to an intermediate table that needs to be computed and the arrows between nodes indicate dependencies between the tasks. For example, before we can compute the table that corresponds to the 2001 Civics, we must first compute the table of all the Civics and a table of all the 2001-model cars.

##### Figure 3.2. The different tables and their dependencies in a query processing operation.

Note that often there are multiple ways of expressing certain computations, especially those involving associative operators such as addition, multiplication, and logical AND or OR. Different ways of arranging computations can lead to different task-dependency graphs with different characteristics. For instance, the database query in Example 3.2 can be solved by first computing a table of all green or white cars, then performing an intersection with a table of all 2001 model cars, and finally combining the results with the table of all Civics. This sequence of computation results in the task-dependency graph shown in Figure 3.3.