Implicit Parallelism: Trends in Microprocessor Architectures

Posted on March 17, 2016

Introduction to Parallel Computing
Limitations of Memory System Performance
  • The tremendous gains in microprocessor clock speed since the 1980s have failed to provide similar gains in overall processing speed due to the slower pace of improvements in memory technology over the same time period.
  • Techniques to execute multiple instructions per clock cycle have been used to improve overall processing speed.
    • These techniques are known as implicit parallelism.

1: Pipelining and Superscalar Execution

  • Pipelining, or overlapping the various stages of instruction execution (such as fetch, schedule, decode, execute, and store), enables faster execution.
    • A pipeline is divided into stages, where each stage performs one step of instruction processing (such as decoding an instruction).
    • The speed of a pipeline is limited by the slowest atomic task.
    • Branches (if-then-else instructions) must be handled using speculative execution. The deeper the pipeline, the greater the penalty for a branch misprediction, since the pipeline must be flushed after executing the wrong branch.
    • Multiple pipelines can further improve the instruction execution rate.
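    • Super-pipelined processor: A processor whose pipeline is split into a larger number of shorter stages, so that each stage does less work and the clock can run faster.
    • Superscalar execution: The ability of a processor to issue multiple instructions in the same clock cycle, typically by using multiple pipelines.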
  • True data dependency: The results of an instruction may be required for subsequent instructions.
    • A data dependency must be resolved before the dependent instruction can execute, so the two instructions cannot be issued simultaneously.
  • Resource dependency: Two instructions compete for a single processor resource.
    • Example: two floating-point operations competing for the use of the FPU.
  • Branch dependency: The outcome of a branch instruction is not known a priori, so the instructions that follow it cannot be scheduled with certainty until the branch is resolved.
    • Branch dependencies are handled by speculative execution and rolling back in case of errors.
  • Out-of-order execution (also dynamic instruction issue): The ability to execute instructions in a different order than they appear in the program, while preserving semantic equivalence with the original order, with the intent of reducing stalls caused by dependencies.
  • Vertical waste: no instructions are issued to the execution units during a cycle.
  • Horizontal waste: instructions are issued to some but not all of the execution units during a cycle.
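
The interaction between true data dependencies and the two kinds of waste defined above can be sketched with a toy simulation. This is only an illustrative model, not how any real processor is implemented: it assumes a 2-way superscalar machine with strict in-order issue, no resource or branch dependencies, and made-up instruction latencies. The names `Instr`, `simulate_issue`, `count_waste`, and `PROGRAM` are hypothetical and do not come from the original notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """A toy instruction: writes `dest` after `latency` cycles, reads `srcs`."""
    dest: str
    srcs: tuple
    latency: int = 1  # assumed latency in cycles (illustrative only)


def simulate_issue(program, width=2):
    """In-order issue on a `width`-way machine.

    Returns a list with one entry per cycle; each entry is the list of
    instructions issued in that cycle (possibly empty).
    """
    ready_at = {}  # register -> first cycle in which its value can be read
    schedule = []
    pc = 0
    cycle = 0
    while pc < len(program):
        issued = []
        while pc < len(program) and len(issued) < width:
            ins = program[pc]
            # True data dependency: a source is still being produced.
            # Registers never written so far are treated as ready inputs.
            if any(ready_at.get(s, 0) > cycle for s in ins.srcs):
                break  # in-order issue: later instructions must wait too
            ready_at[ins.dest] = cycle + ins.latency
            issued.append(ins)
            pc += 1
        schedule.append(issued)
        cycle += 1
    return schedule


def count_waste(schedule, width=2):
    """Return (vertical, horizontal) waste measured in cycles."""
    vertical = sum(1 for c in schedule if len(c) == 0)
    horizontal = sum(1 for c in schedule if 0 < len(c) < width)
    return vertical, horizontal


# Hypothetical program: two 2-cycle loads feeding a chain of adds.
PROGRAM = [
    Instr("r1", ("a",), latency=2),  # load r1 <- a
    Instr("r2", ("b",), latency=2),  # load r2 <- b
    Instr("r3", ("r1", "r2")),       # add  r3 <- r1 + r2
    Instr("r4", ("r3", "r1")),       # add  r4 <- r3 + r1
    Instr("r5", ("c",)),             # independent of everything above
]

schedule = simulate_issue(PROGRAM, width=2)
print([len(c) for c in schedule])  # [2, 0, 1, 2]
print(count_waste(schedule))       # (1, 1)
```

In this run, cycle 1 issues nothing because both loads are still in flight (vertical waste), and cycle 2 issues only one add because the next add depends on it (horizontal waste). An out-of-order machine could have issued the independent `r5` instruction earlier and filled one of those slots.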

2: Very Long Instruction Word Processors

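  • Very Long Instruction Word (VLIW) processors use the compiler instead of hardware to resolve dependencies: the compiler packs instructions that can execute concurrently into a single long instruction word.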
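
The compile-time packing step can be sketched as a simple greedy pass. This is a minimal illustration under stated assumptions, not the algorithm of any particular VLIW compiler: every operation is assumed to take one cycle, every register is assumed to be written only once, any slot can hold any operation, and branches are ignored. The function `pack_bundles`, the `VLIW_PROGRAM` list, and the operation names are hypothetical.

```python
def pack_bundles(program, width=4):
    """Greedily pack operations into fixed-width instruction words.

    `program` is a list of (name, dest, srcs) tuples in program order.
    Returns a list of bundles, each padded to `width` slots with "nop".
    """
    bundle_of = {}  # dest register -> index of the bundle that produces it
    bundles = []
    for name, dest, srcs in program:
        # An operation must come after every bundle that produces one of
        # its sources (unit latency assumed). Unknown sources are inputs.
        earliest = max((bundle_of[s] + 1 for s in srcs if s in bundle_of),
                       default=0)
        idx = earliest
        # Skip bundles whose slots are already full.
        while idx < len(bundles) and len(bundles[idx]) >= width:
            idx += 1
        if idx == len(bundles):
            bundles.append([])
        bundles[idx].append(name)
        bundle_of[dest] = idx
    # Unused slots become explicit no-ops in the long instruction word.
    return [b + ["nop"] * (width - len(b)) for b in bundles]


# Hypothetical program: sum four values loaded from memory, then store.
VLIW_PROGRAM = [
    ("load_a", "r1", ()),
    ("load_b", "r2", ()),
    ("add_1", "r5", ("r1", "r2")),
    ("load_c", "r3", ()),
    ("load_d", "r4", ()),
    ("add_2", "r6", ("r3", "r4")),
    ("add_3", "r7", ("r5", "r6")),
    ("store", "mem", ("r7",)),
]

for word in pack_bundles(VLIW_PROGRAM, width=4):
    print(word)
```

With four slots per word, the eight operations fit into four long instruction words, and the unused slots are padded with `nop`. The hardware then issues each word as a unit without checking dependencies itself, which is the trade-off this section describes: simpler hardware in exchange for a compiler that must find the parallelism ahead of time.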

Posted by Akash Kurup

Founder and C.E.O, World4Engineers Educationist and Entrepreneur by passion. Orator and blogger by hobby