Performance

Perspectives

  1. User: Response Time (shorter program execution time)
  2. Computer manager: Throughput (work/time)

Performance Factors

In decreasing level of abstraction

  1. Programming model
    1. Algorithm
  2. Computational model
    1. Analysis
  3. Architectural model
    1. Lowest level

Memory Access Workflow

Read access (load)

Write access (store)

Let $N$ = instruction count

\[T_{no_cache} = ( CPI(A) \cdot N + CPI_{rw} \cdot N_{rw} ) \cdot T_{cycle} = ( 2N + 0.33 \times N_read \times 100 ) \cdot T_{cycle} = (2 + 33)N \times \frac{1}{f} = 35 N/f\] \[T_{double} = ( CPI \cdot N + 0.33 N \times R_{miss} \times CPI_{rw} ) \cdot T_{cycle} = (2N + 0.33 \times 0.02 \times N \times 200) \cdot T_{cycle} = (2 + 0.66)N \times \frac{1}{f} = 2.66 N/f\]

Small problem size:

Large problem size:

Scaling constraints:

Properties

Laws

Amdahl’s Law

Speedup of parallel execution is limited by the fraction of the algorithm that cannot be parallelized (f).

$f$ is the bottleneck.

insert slide

Gustafson’s Law

$S_p(n) = \frac{T_\star(n)}$