Shared Memory Systems

Examples

OpenMP: compiler directives (pragmas) plus a runtime library for shared-memory multithreading in C, C++, and Fortran.

CUDA: NVIDIA's platform and C/C++ language extensions for general-purpose GPU programming; kernels run as grids of thread blocks over device memory.
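A minimal CUDA sketch (a vector add, with a hypothetical kernel name `vecAdd`); managed memory is used here only to keep the host code short, and requires a CUDA-capable GPU and `nvcc` to run.

```cuda
#include <cstdio>

// One thread per element; guard against the last block running past n.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1024;
    float *a, *b, *c;
    // Unified (managed) memory is accessible from both host and device.
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; i++) { a[i] = (float)i; b[i] = 2.0f * i; }
    vecAdd<<<(n + 255) / 256, 256>>>(a, b, c, n);  // ceil(n/256) blocks
    cudaDeviceSynchronize();                       // wait for the kernel
    printf("c[10] = %f\n", c[10]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```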

Foster’s Methodology

Decomposition: partition data or tasks.

Task granularity: affects communication overhead and the cost of creating/managing threads; finer-grained tasks expose more parallelism but raise both overheads.

Communication: local (each task exchanges data with a few neighbors; exchanges can proceed in parallel) or global (involves many or all tasks; tends to serialize execution).

Rules of thumb (Foster's communication checklist):

  1. Communication should be balanced across tasks.
  2. Each task should communicate with only a small number of neighbors.
  3. Communications should be able to proceed concurrently.
  4. Computation should not have to wait unnecessarily on communication.

Agglomeration: combine groups of fine-grained tasks into larger ones to reduce communication and task-management overhead (e.g., batching what is sent/received between neighbors).

Mapping: assignment of tasks to execution units (cores, processes, or threads), balancing load while minimizing communication.

Parallel Programming Models

  1. Task pool
  2. Parbegin-parend
  3. SIMD/SPMD
  4. Master-Worker
  5. Client-Server/MPMD
  6. Producer-Consumer
  7. Pipelining

Metrics

perf list: shows the hardware and software events the machine can count (e.g., cycles, instructions, cache-misses, context-switches).

perf stat: runs a command and reports counter statistics for it, e.g. `perf stat -e cycles,cache-misses ./app`.

Algorithm description

must include:

  1. Data distribution
  2. Parallel programming model
  3. Key constructs (MPI, CUDA, OpenMP)
  4. Metrics (utilization of processes, resources, idle time, cache)
  5. Interconnection (if distributed)
  6. Sources of inefficiency
    1. Waiting/idle time
    2. Overheads
    3. Cache misses/thrashing/memory contention
  7. Prevention of deadlock/data race
    1. Odd-even phase ordering often works: alternate which pairs communicate/update in each phase so the pairs within a phase are disjoint.

CUDA Programming: Memory Management