Data distribution
Decomposition of array
Data distribution of 1D arrays
- Blockwise ($P_j$ takes elements $\left[(j-1)B \dots jB - 1\right]$, where $B$ is the block size)
  - Exploits locality
  - Best choice when the input size is known in advance
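- Cyclic ($P_j$ takes elements $\left[(j-1), (j-1) + p, (j-1) + 2p, \dots, (j-1) + (c-1)p\right]$, where $p$ is the number of processes and $c$ is the number of elements per process)
  - Better load balancing
  - Poor locality: cache misses on almost every access (thrashing)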
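
The two 1D schemes can be compared with a minimal sketch. It assumes processes are numbered from 1 (as with $P_j$), elements are numbered from 0, and $p$ divides $n$ evenly. The function names `block_indices` and `cyclic_indices` are illustrative only and do not come from any library.

```python
def block_indices(j, n, p):
    """Elements owned by process j (1-based) under a blockwise distribution."""
    B = n // p  # block size; this sketch assumes p divides n
    return list(range((j - 1) * B, j * B))


def cyclic_indices(j, n, p):
    """Elements owned by process j (1-based) under a cyclic distribution."""
    return list(range(j - 1, n, p))  # stride p, starting at element j - 1


if __name__ == "__main__":
    n, p = 12, 3
    for j in range(1, p + 1):
        print(j, block_indices(j, n, p), cyclic_indices(j, n, p))
```

With `n = 12` and `p = 3`, process 2 owns the contiguous elements 4 to 7 under the blockwise scheme, but the strided elements 1, 4, 7, 10 under the cyclic scheme. This is the locality versus load-balancing trade-off listed above.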
Data distribution of 2D arrays
- Blockwise column
- Blockwise row
- Cyclic column
- Cyclic row
- Block-cyclic with block size $b$ columns/rows
- Checkerboard
  - Blockwise
  - Cyclic
  - Block-cyclic
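
As a rough illustration of the 2D schemes, the sketch below maps a matrix entry to its owner on a process grid. It assumes 0-based process coordinates and a hypothetical `pr` by `pc` grid; the helper names are made up for this example. With block size `b = 1` the mapping is cyclic, and with `b` equal to the number of rows (or columns) per process it is blockwise.

```python
def block_cyclic_owner(i, b, p):
    """Process (0-based) owning index i when blocks of b consecutive
    indices are dealt out round-robin to p processes."""
    return (i // b) % p


def checkerboard_owner(row, col, b, pr, pc):
    """Grid coordinates of the process owning entry (row, col) when both
    dimensions are distributed block-cyclically on a pr x pc grid."""
    return (block_cyclic_owner(row, b, pr), block_cyclic_owner(col, b, pc))


if __name__ == "__main__":
    n, b, pr, pc = 8, 2, 2, 2
    # Print the owner of every entry of an n x n matrix as "rc" pairs.
    for row in range(n):
        print(" ".join("{}{}".format(*checkerboard_owner(row, col, b, pr, pc))
                       for col in range(n)))
```

Distributing only one dimension (column-wise or row-wise) corresponds to using `block_cyclic_owner` on the column or row index alone.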
Developing a solution in MPI:
- Data distribution
- Topology
  - Consider utilization: are all links utilized roughly equally?
- Task distribution (what each node does)
  - Is task granularity appropriate?
- Communication
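
For the data distribution step, one possible sketch is to compute the per-rank element counts and offsets of a blockwise split, in the form that `MPI_Scatterv` takes as its `sendcounts` and `displs` arguments. The helper name `scatterv_layout` is hypothetical, ranks are 0-based as in MPI, and the choice to give leftover elements to the lowest ranks is an assumption, not something these notes prescribe.

```python
def scatterv_layout(n, p):
    """Blockwise split of n elements over p ranks.

    Returns (counts, displs): counts[r] is the number of elements for
    rank r and displs[r] is the index of its first element. When p does
    not divide n, the first n % p ranks receive one extra element.
    """
    base, extra = divmod(n, p)
    counts = [base + 1 if r < extra else base for r in range(p)]
    displs = [sum(counts[:r]) for r in range(p)]
    return counts, displs


if __name__ == "__main__":
    counts, displs = scatterv_layout(10, 4)
    print(counts, displs)
```

For 10 elements on 4 ranks this yields counts `[3, 3, 2, 2]` and displacements `[0, 3, 6, 8]`, so no rank holds more than one element above any other.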