Limitations of Memory System Performance

Posted by Akash Kurup on March 17, 2016

  • Latency: The time it takes for a block of data to be retrieved from memory after a request has been made.
  • Bandwidth: The rate at which data can be transferred from memory to the processor.

1: Improving Effective Memory Latency Using Caches

  • Cache: a low-latency, high-bandwidth storage that is a smaller, faster intermediary between the processor and DRAM.
    • Data fetched into the cache will henceforth be accessed from the cache.
    • Hit ratio: The fraction of data references satisfied by the cache.
    • Memory bound: The limit of effective computation rate due to memory.
    • Temporal Locality of Reference: Repeated reference to a data item in a small time window.
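The effect of the hit ratio on effective latency can be sketched with a small calculation (the 1 ns cache and 100 ns DRAM latencies below are assumed, illustrative numbers):

```python
def effective_latency(hit_ratio, cache_ns=1.0, dram_ns=100.0):
    """Average latency per access, given the fraction of references
    satisfied by the cache (the hit ratio)."""
    return hit_ratio * cache_ns + (1.0 - hit_ratio) * dram_ns

# Even a 90% hit ratio leaves an average of roughly 10.9 ns per access,
# dominated by the few references that miss and go to DRAM.
print(effective_latency(0.9))   # ~10.9 ns
print(effective_latency(0.99))  # ~1.99 ns
```

This is why a high hit ratio, driven by temporal locality, is essential: the average is dominated by the slow misses, not the fast hits.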

2: Impact of Memory Bandwidth

  • Memory bandwidth is determined by the bandwidth of the memory bus as well as the bandwidth of the memory units.
  • One common technique to increase memory bandwidth is to increase the size of the memory blocks.
    • Cache line: A memory block made up of n words.
  • Wide buses can transfer a multi-word memory block at once, but are more expensive to construct; alternatively, the words of a block can be transferred one at a time over a narrower bus.
  • Spatial locality: consecutive data words in memory are used by successive instructions.
  • Tiling: breaking the iteration space into blocks and computing the result one block at a time.
  • Important concepts for mitigating the memory bound on performance:
    • Exploit temporal and spatial locality.
    • The ratio of the number of operations to the number of memory accesses is a good indicator of tolerance to limited memory bandwidth.
    • Memory layout and code organization affect spatial and temporal locality.
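The tiling idea above can be sketched as a blocked matrix multiply in pure Python (sizes and tile width are illustrative assumptions; a real implementation would use a compiled kernel):

```python
def matmul_tiled(A, B, n, b):
    """Multiply two n-by-n matrices using b-by-b tiles of the iteration
    space. Each tile of A and B is reused many times while it is still
    resident in cache, improving temporal locality over the naive loop."""
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, b):              # iterate over tiles
        for jj in range(0, n, b):
            for kk in range(0, n, b):
                for i in range(ii, min(ii + b, n)):    # within one tile
                    for j in range(jj, min(jj + b, n)):
                        s = C[i][j]
                        for k in range(kk, min(kk + b, n)):
                            s += A[i][k] * B[k][j]
                        C[i][j] = s
    return C
```

The computation is identical to the naive triple loop; only the order of iteration changes, so the working set touched at any moment fits in cache.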

3: Alternative Approaches for Hiding Memory Latency

  • Prefetching: Anticipating memory accesses in advance and issuing requests for data in memory before computation.
  • Multithreading: Maintaining multiple threads of execution so that when one thread stalls on a memory access, the processor switches to another ready thread, keeping the hardware busy.
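The latency-hiding effect of multithreading can be sketched with a toy simulation (the 50 ms "memory latency" below is an assumed value, simulated with `sleep`):

```python
import threading
import time

LATENCY = 0.05        # simulated memory-access latency, in seconds
results = []
lock = threading.Lock()

def worker(x):
    time.sleep(LATENCY)           # long-latency "memory access"
    with lock:
        results.append(x * x)     # computation on the fetched data

start = time.perf_counter()
threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Because the four waits overlap, total time approaches one latency
# (~0.05 s) rather than the sum of four latencies (~0.20 s).
print(sorted(results), round(elapsed, 3))
```

The point is that latency is hidden, not eliminated: each access still takes the full latency, but the waits overlap.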

4: Tradeoffs of Multithreading and Prefetching

  • Both techniques hide latency at the cost of higher bandwidth demand: they do not reduce the number of words fetched from memory. In the example, the multithreaded system is no longer latency bound but becomes bandwidth bound.
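The tradeoff can be made concrete with simple arithmetic (the per-thread access rate and 8-byte word size below are assumed, illustrative numbers):

```python
WORD_BYTES = 8  # assumed word size

def bandwidth_demand(accesses_per_sec_per_thread, n_threads,
                     word_bytes=WORD_BYTES):
    """Bytes/s the memory system must sustain to keep all threads busy.
    Multithreading hides latency but multiplies aggregate demand."""
    return accesses_per_sec_per_thread * n_threads * word_bytes

# One thread issuing 10M word accesses/s needs 80 MB/s; with 32 such
# threads the demand grows to 2.56 GB/s, so the memory bus, not
# latency, becomes the bottleneck.
print(bandwidth_demand(10_000_000, 1))
print(bandwidth_demand(10_000_000, 32))
```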

Posted by Akash Kurup

Founder and C.E.O. of World4Engineers. Educationist and entrepreneur by passion; orator and blogger by hobby.