Physical Organization of Parallel Programs

Posted By on March 17, 2016

Download PDF
Dichotomy of Parallel Computing Platforms
Communication Costs in Parallel Machines

1 Architecture of an Ideal Parallel Computer

  • Parallel Random Access Machine (PRAM) – The parallel computer consists of p processors and a global memory of unbounded size that is uniformily accessible to all processors. All processors share the same address space and clock.
    • Exclusive-read, exclusive write (EREW) PRAM – Weakest model. Memory location is exclusive, and no concurrent read or write operations are allowed.
    • Concurrent-read, exclusive-write (CREW) PRAM – Multiple reads from memory are parallelized, but multiple writes are sequential.
    • Exclusive-read, concurrent-write (ERCW) PRAM – Sequential reads from memory, concurrent writes to memory.
    • Concurrent-read, concurrent-write (CRCW) PRAM – Parallelized multiple reads and writes. This is the strongest PRAM model.
  • Concurrent write access to memory requires arbitration. The types of arbitration are:
    • Common – concurrent write is allowed if all the values that the processos are attempting to write are identical.
    • Arbitrary – an arbitrary processor is chosen to write to memory and all other processors wishing to write to the same memory location fail.
    • Priority – processors are organized by a priority list, and the processor with the highest priority can write to memory while the others fail.
    • Sum – The sum of all the quantities are written to memory. Other associative operators can be used instead of sum also.
  • Architectual compexity of the Ideal Model – PRAM models are impossible to realize in practice. Example: for an EREW model, switches are needed in order to insure that no two processors simultaneously access the same word. The complexity of the switches is mp, where m is the number of words in memory and p is the number of processors.

2 Interconnection Networks for Parallel Computers

  • Networks are connected by links and switches. Links are physical media capable of transferring data.
  • Static networks – consist of point-to-point communication links among processing nodes. These are known as direct networks.
  • Dynamic networks – consist of switches and network links. These are known as indirect networks.
  • Switch – consists of a set of input ports and output ports in a network. They map inputs to outputs, etc.
    • Degree – the number of ports on a swtich.
    • Other roles of switches: internal buffering, routing, and multicast.
  • Connectivity between nodes is provided by a network interface.

3 Network Topologies

  • Bus-based networks – Consists of a shared medium that is common to all nodes. Simplest network. Scalable for cost but not for performance.
  • Crossbar networks – Connect processors to memory using a grid of switches or switching nodes. Scalable for performance but not for cost.
  • Multistage networks – A compromise between bus-based and crossbar networks. Connects inputs to outputs through a series of intermediary stages.
  • Completely-Connected Network – Each node has a direct communication link to every other node in the network.
  • Star-connected network – One processor acts as the central processor, and every other processor has a link to the central processor. Processing nodes communicate via the central processor.
  • Linear array – each processor, except the ones at the edge of the network, has a neighbor to the left and a neighbor to the right.
    • 1-D Torus – the processors at the edges also have left and right neighbors.
    • 2-D mesh – made up of a collection of linear arrays or 1-D torii. Each node now has a left, right, up, and down neighbor (sans edges in the 2-D linear array). Each dimension has sqrt(p) nodes.
    • k-d mesh – multi-dimensional extension of the mesh. Has d dimensions with k nodes along each dimension.
      • Hypercube – two nodes along each dimension and log p dimensions. Numbering scheme is binary. Distance between nodes is determined by checking the difference in bits between any two node labels.
  • Tree-based network – Only one path between any two nodes. Static trees have a processing element at each level. Dynamic trees have switching nodes at intermediate levels and processing nodes at the leaves.
    • To route a message in a tree, the source node sends a message up to the highest subtree whose root contains both the source and destination nodes.
    • Fat tree – a tree with more communication links and switching nodes closer to the root.

4 Evaluating Static Interconnection Networks

  • Diameter – maximum distance between any two processing nodes in the network (i.e. the longest shortest path between two paths in the network).
    • Various Diameters:
      • Strongly-connected network: 1
      • Star-connected network: 2
      • 1-D torus: p/2
      • 2-D mesh: 2(sqrt(p)-1)
      • Hypercube: log(p)
      • Binary-tree: 2log((p+1)/2)
  • Connectivity – measure of multiplicity of paths between any two nodes.
    • Arc Connectivity – minimum number of arcs that must be removed from the network to split it into two disconnected networks.
      • Arc connectivity of linear arrays, trees, and star networks: 1.
      • Arc connectivity of rings and 2-D meshes w/o wraparound: 2 (consider the corners of 2-D meshes)
      • Arc connectivity of d-dimensional hypercubes – d
  • Bisection Width and Bisection Bandwidth – minimum number of communication links that must be removed to partition the network into two equal halves.
    • Ring – 2
    • 2-D p-node without wraparound is sqrt(p), with wraparound 2sqrt(p).
    • Tree and star – 1
    • Completely connected network with p nodes is (p^2)/4.
    • Hypercube – p/2
  • Channel width – number of bits that can be communicated simultaneously over a link connecting two nodes. Equal to the number of physical wires in each communication link.
  • Channel rate – the peak rate at which a single physical wire can deliver bits.
  • Channel bandwidth – the peak rate at which data can be communicated between the ends of a communication link. Product of channel width and channel rate.
  • Bisection bandwidth – minimum volume of communication allowed between any two halves of the network. Product of bisection width and channel rate. Also calledcross-section bandwidth.

5 Evaluating Dynamic Interconnection Networks

  • Switches must be considered nodes in dynamic networks. Now removal of nodes (not just edges) becomes an important part in determining the characteristics of dynamic networks.
  • Node connectivity – the minimum number of nodes that must be removed from the network to split the network into two disconnected parts.
  • Bisection width – consider any possible partitioning, by removing arcs of the network, to form two halves with equal processing nodes. Then consider the partition that has the minimum number of arcs connecting the two halves.

6 Cache Coherence in Multiprocessor Systems

  • Maintaining caches in multiprocessor systems is much harder than uniprocessor systems. If processors share copies of data in memory, a coherence mechanism must insure program correctness.
    • Invalidate – other copies of data must be invalidated so only one processor operates on the correct data at any given time.
    • Update – all processors are updated with the correct data simultaneously.
    • False sharing – different processors update different parts of the same cache-line.
  • Snoopy caches – all processors snoop on the bus for transactions.
  • Directory-based systems – The global memory is augmented with a directory that maintains a bitmap representing cache-blocks and the processors at which they are cached.
Dichotomy of Parallel Computing Platforms
Communication Costs in Parallel Machines

Download PDF

Posted by Akash Kurup

Founder and C.E.O, World4Engineers Educationist and Entrepreneur by passion. Orator and blogger by hobby