Intra Node Stencil Performance Evaluation Collection
dimension | 3D |
radius | 3 |
weighting | heterogeneous |
kind | star |
coefficients | constant |
datatype | double |
machine | IvyBridgeEP_E5-2660v2 |
The raw benchmark data shown on this page can be found in the corresponding folder of the git repository.
If you have feedback or issues, or have found errors on this page, please submit an issue on the GitHub page.
double a[M][N][P];
double b[M][N][P];
double c0;
double c1;
double c2;
double c3;
double c4;
double c5;
double c6;
double c7;
double c8;
double c9;
double c10;
double c11;
double c12;
double c13;
double c14;
double c15;
double c16;
double c17;
double c18;
for( int k = 3; k < M-3; k++ ) {
  for( int j = 3; j < N-3; j++ ) {
    for( int i = 3; i < P-3; i++ ) {
      b[k][j][i] = c0 * a[k][j][i]
        + c1 * a[k][j][i-1] + c2 * a[k][j][i+1]
        + c3 * a[k-1][j][i] + c4 * a[k+1][j][i]
        + c5 * a[k][j-1][i] + c6 * a[k][j+1][i]
        + c7 * a[k][j][i-2] + c8 * a[k][j][i+2]
        + c9 * a[k-2][j][i] + c10 * a[k+2][j][i]
        + c11 * a[k][j-2][i] + c12 * a[k][j+2][i]
        + c13 * a[k][j][i-3] + c14 * a[k][j][i+3]
        + c15 * a[k-3][j][i] + c16 * a[k+3][j][i]
        + c17 * a[k][j-3][i] + c18 * a[k][j+3][i];
    }
  }
}
P <= 2048/7, that is 292
P <= 16384/7, that is 2340
P <= 1638400/7, that is 234057
48*N*P + 16*P*(N - 3) + 48*P <= 32768, that is 32²
48*N*P + 16*P*(N - 3) + 48*P <= 262144, that is 91²
48*N*P + 16*P*(N - 3) + 48*P <= 26214400, that is 452²
Have a look at the kernel source code for dimension naming.
Generate this stencil with:
stempel gen -D 3 -r 3 -t "double" -C constant -k star -e --store stencil.c
and generate the compilable benchmark code with:
stempel bench stencil.c -m IvyBridgeEP_E5-2660v2.yml --store
icc -O3 -xCORE-AVX2 -fno-alias -qopenmp -DLIKWID_PERFMON -I/mnt/opt/likwid-4.3.2/include -L/mnt/opt/likwid-4.3.2/lib -I./stempel/stempel/headers/ ./stempel/headers/timing.c ./stempel/headers/dummy.c solar_compilable.c -o stencil -llikwid
Comparison of the measured stencil performance (in cycles per cache line), the roofline prediction, and the (stacked) contributions of the ECM performance model as predicted by kerncraft, which uses layer conditions to model the cache behavior. The layer conditions calculated above correspond to the jumps in the ECM prediction in this plot.
Data transfers between the different cache levels and main memory. The data shown for each level includes both evicted and loaded data. Measured data is shown as points; the transfer rates predicted by kerncraft are shown as lines.
Using the generated stencil and kerncraft, all single-core performance data shown on this page can be reproduced with:
kerncraft -p ECM -p RooflineIACA -p Benchmark -p LC -P LC -m IvyBridgeEP_E5-2660v2.yml stencil.c -D N $GRID_SIZE -D M $GRID_SIZE -D P $GRID_SIZE -vvv --cores 1 --compiler icc
kerncraft -p ECM -p RooflineIACA -p Benchmark -p LC -P CS -m IvyBridgeEP_E5-2660v2.yml stencil.c -D N $GRID_SIZE -D M $GRID_SIZE -D P $GRID_SIZE -vvv --cores 1 --compiler icc
The generated benchmark code can be used to reproduce the thread scaling data shown on this page by:
kerncraft -p ECM -p RooflineIACA -p Benchmark -P LC -m IvyBridgeEP_E5-2660v2.yml stencil.c -D N $GRID_SIZE -D M $GRID_SIZE -D P $GRID_SIZE -vvv --cores $CORES --compiler icc
The roofline prediction can be obtained with kerncraft and the generated stencil:
kerncraft -p RooflineIACA -P LC -m IvyBridgeEP_E5-2660v2.yml stencil.c -D N $GRID_SIZE -D M $GRID_SIZE -D P $GRID_SIZE -vvv --cores ${threads} --compiler icc
Generate benchmark code from the stencil with blocking and compile it as shown before:
stempel bench stencil.c -m IvyBridgeEP_E5-2660v2.yml -b 2 --store
OMP_NUM_THREADS=1 likwid-pin -C S0:0 ./stencil $GRID_SIZE $GRID_SIZE $GRID_SIZE $BLOCKING_M $BLOCKING_N $BLOCKING_P