Intra Node Stencil Performance Evaluation Collection
model type | Intel Xeon Haswell EN/EP/EX processor |
model name | Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz |
micro-architecture | |
micro-architecture modeler | |
cores per socket | 14 |
cores per NUMA domain | 7 |
cacheline size | 64 B |
clock | 2.3 GHz |
NUMA domains per socket | 2 |
This machine file was generated for kerncraft version 0.8.6.dev0.
icc | -O3 -xCORE-AVX2 -fno-alias -qopenmp -ffreestanding -nolib-inline |
clang | -O3 -mavx2 -D_POSIX_C_SOURCE=200809L -fopenmp -ffreestanding |
gcc | -O3 -march=core-avx2 -D_POSIX_C_SOURCE=200809L -fopenmp -lm -ffreestanding |
FLOPs per cycle | ADD | MUL | FMA | total |
Single Precision | 8 | 8 | 16 | 32 |
Double Precision | 4 | 4 | 8 | 16 |
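As a worked example, the per-cycle totals above can be turned into a nominal peak rate by multiplying with the clock and the core count. This is a minimal sketch: `peak_gflops` is a hypothetical helper, not part of kerncraft, and it assumes every core sustains the nominal 2.3 GHz, which may not hold under heavy AVX load.

```python
# Values copied from the tables above.
CLOCK_HZ = 2.3e9
CORES_PER_SOCKET = 14
FLOPS_PER_CYCLE = {"SP": 32, "DP": 16}  # "total" column


def peak_gflops(precision, cores=1):
    """Nominal peak in GFLOP/s, assuming every core sustains CLOCK_HZ."""
    return FLOPS_PER_CYCLE[precision] * CLOCK_HZ * cores / 1e9


if __name__ == "__main__":
    for precision in ("SP", "DP"):
        print(precision, peak_gflops(precision),
              peak_gflops(precision, CORES_PER_SOCKET))
```

Under these assumptions, double precision comes out at 16 x 2.3 = 36.8 GFLOP/s per core and 515.2 GFLOP/s per 14-core socket.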
level | L1 |
groups | 28 |
cores per group | 1 |
threads per group | 2 |
transfers overlap | false |
sets | 64 |
ways | 8 |
cl_size | 64 |
replacement_policy | LRU |
write_allocate | true |
write_back | true |
load_from | L2 |
store_to | L2 |
accesses | MEM_UOPS_RETIRED_LOADS:PMC[0-3] + MEM_UOPS_RETIRED_STORES:PMC[0-3] |
misses | L1D_REPLACEMENT:PMC[0-3] |
evicts | L1D_M_EVICT:PMC[0-3] |
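The three counter expressions above can be combined into a miss ratio and into transferred data volumes, since every miss or eviction moves one 64 B cache line. The sketch below is illustrative only: `l1_metrics` and its argument names are hypothetical, and the counter readings are made-up inputs, not measurements from this machine.

```python
CL_SIZE = 64  # bytes, from the cl_size row


def l1_metrics(loads, stores, replacements, m_evicts):
    """Derive L1 metrics from raw counter readings.

    loads        -> MEM_UOPS_RETIRED_LOADS
    stores       -> MEM_UOPS_RETIRED_STORES
    replacements -> L1D_REPLACEMENT (misses)
    m_evicts     -> L1D_M_EVICT (evictions of modified lines)
    """
    accesses = loads + stores
    return {
        "accesses": accesses,
        "miss_ratio": replacements / accesses if accesses else 0.0,
        "load_bytes": replacements * CL_SIZE,   # lines fetched from L2
        "evict_bytes": m_evicts * CL_SIZE,      # lines written back to L2
    }


if __name__ == "__main__":
    # Hypothetical readings: 1000 loads that touch 125 distinct cache lines.
    print(l1_metrics(loads=1000, stores=0, replacements=125, m_evicts=0))
```

With these made-up inputs, 1000 sequential 8 B loads touch 125 lines, which gives a miss ratio of 0.125.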
level | L2 |
groups | 28 |
cores per group | 1 |
threads per group | 2 |
transfers overlap | false |
sets | 512 |
ways | 8 |
cl_size | 64 |
replacement_policy | LRU |
write_allocate | true |
write_back | true |
load_from | L3 |
store_to | L3 |
accesses | L1D_REPLACEMENT:PMC[0-3] + L1D_M_EVICT:PMC[0-3] |
misses | L2_LINES_IN_ALL:PMC[0-3] |
evicts | L2_TRANS_L2_WB:PMC[0-3] |
level | L3 |
groups | 4 |
cores per group | 7 |
threads per group | 14 |
transfers overlap | false |
sets | 9216 |
ways | 16 |
cl_size | 64 |
replacement_policy | LRU |
write_allocate | true |
write_back | true |
accesses | L2_LINES_IN_ALL:PMC[0-3] + L2_TRANS_L2_WB:PMC[0-3] |
misses | (CAS_COUNT_RD:MBOX0C[01] + CAS_COUNT_RD:MBOX1C[01] + CAS_COUNT_RD:MBOX2C[01] + CAS_COUNT_RD:MBOX3C[01] + CAS_COUNT_RD:MBOX4C[01] + CAS_COUNT_RD:MBOX5C[01] + CAS_COUNT_RD:MBOX6C[01] + CAS_COUNT_RD:MBOX7C[01]) |
evicts | (CAS_COUNT_WR:MBOX0C[01] + CAS_COUNT_WR:MBOX1C[01] + CAS_COUNT_WR:MBOX2C[01] + CAS_COUNT_WR:MBOX3C[01] + CAS_COUNT_WR:MBOX4C[01] + CAS_COUNT_WR:MBOX5C[01] + CAS_COUNT_WR:MBOX6C[01] + CAS_COUNT_WR:MBOX7C[01]) |
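The capacity of each cache follows from its geometry as sets x ways x cl_size. The sketch below applies this to the three cache blocks above, which appear to be L1, L2 and L3 judging by their `load_from` targets. `CacheGeometry` is a hypothetical helper, and the modulo-based `set_index` is a simplified textbook mapping; real hardware, in particular the last-level cache, may hash addresses differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheGeometry:
    sets: int
    ways: int
    cl_size: int  # bytes per cache line

    @property
    def size_bytes(self):
        return self.sets * self.ways * self.cl_size

    def set_index(self, address):
        """Simplified mapping: cache-line number modulo number of sets."""
        return (address // self.cl_size) % self.sets


# Geometry copied from the sets / ways / cl_size rows above.
LEVELS = {
    "L1": CacheGeometry(sets=64, ways=8, cl_size=64),
    "L2": CacheGeometry(sets=512, ways=8, cl_size=64),
    "L3": CacheGeometry(sets=9216, ways=16, cl_size=64),
}

if __name__ == "__main__":
    for name, geometry in LEVELS.items():
        print(name, geometry.size_bytes // 1024, "KiB")
```

This gives 32 KiB for L1, 256 KiB for L2 and 9 MiB for each L3 group.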
level | MEM |
cores per group | 14 |
threads per group | 28 |
transfers overlap | false |
IACA | 0, 0DV, 1, 2, 3, 4, 5, 6, 7 |
OSACA | 0, 0DV, 1, 2, 3, 4, 5, 6, 7 |
LLVM-MCA | HWDivider, HWFPDivider, HWPort0, HWPort1, HWPort2, HWPort3, HWPort4, HWPort5, HWPort6, HWPort7 |
Max(UOPS_EXECUTED_PORT_PORT_0:PMC[0-3], UOPS_EXECUTED_PORT_PORT_1:PMC[0-3], UOPS_EXECUTED_PORT_PORT_4:PMC[0-3], UOPS_EXECUTED_PORT_PORT_5:PMC[0-3], UOPS_EXECUTED_PORT_PORT_6:PMC[0-3], UOPS_EXECUTED_PORT_PORT_7:PMC[0-3])
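The metric above takes the busiest of the listed execution ports. Note that it names ports 0, 1, 4, 5, 6 and 7 only, so ports 2 and 3 do not contribute. A minimal sketch of that reduction follows; `port_bound_uops` is a hypothetical helper and the per-port counts are made-up inputs.

```python
# Ports named in the Max(...) expression above; ports 2 and 3 are not listed.
COUNTED_PORTS = (0, 1, 4, 5, 6, 7)


def port_bound_uops(uops_per_port):
    """Return the largest uop count among the counted ports.

    uops_per_port maps a port number to its UOPS_EXECUTED_PORT reading;
    ports missing from the mapping are treated as zero.
    """
    return max(uops_per_port.get(port, 0) for port in COUNTED_PORTS)


if __name__ == "__main__":
    # Hypothetical readings: the load ports 2 and 3 are busiest but ignored.
    sample = {0: 10, 1: 12, 2: 40, 3: 40, 4: 8}
    print(port_bound_uops(sample))
```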
IACA | 2D, 3D |
OSACA | 2D, 3D |
LLVM-MCA | HWPort2, HWPort3 |
T_nOL + T_L1L2 + T_L2L3 + T_L3MEM
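The sum above adds the non-overlapping in-core time to the transfer times between adjacent memory levels, which matches the `transfers overlap | false` rows. The sketch below shows one common way such terms are combined into per-level runtime predictions. Taking the maximum with an overlapping in-core time `t_ol` is an assumption about the usual Execution-Cache-Memory composition, not something stated in this listing; `ecm_predictions` is a hypothetical helper and all cycle counts are made-up inputs.

```python
def ecm_predictions(t_ol, t_nol, transfers):
    """Predict cycles per unit of work for data residing in each level.

    t_ol      -- overlapping in-core time (assumed to hide behind data transfers)
    t_nol     -- non-overlapping in-core time
    transfers -- [T_L1L2, T_L2L3, T_L3MEM], added serially (no overlap)

    Returns predictions for data in L1, L2, L3 and memory, in that order.
    """
    predictions = []
    data_path = t_nol
    predictions.append(max(t_ol, data_path))
    for transfer in transfers:
        data_path += transfer
        predictions.append(max(t_ol, data_path))
    return predictions


if __name__ == "__main__":
    # Hypothetical cycle counts per cache line of work.
    print(ecm_predictions(t_ol=4, t_nol=3, transfers=[2, 4, 6]))
```

The last entry corresponds to the full sum T_nOL + T_L1L2 + T_L2L3 + T_L3MEM whenever that sum exceeds `t_ol`.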
FLOPs per iteration | 0 |
read streams | 1 Streams with 8.00 B |
write streams | 1 Streams with 8.00 B |
read+write streams | 0 Streams with 0.00 B |
FLOPs per iteration | 2 |
read streams | 2 Streams with 16.00 B |
write streams | 1 Streams with 8.00 B |
read+write streams | 1 Streams with 8.00 B |
FLOPs per iteration | 0 |
read streams | 1 Streams with 8.00 B |
write streams | 0 Streams with 0.00 B |
read+write streams | 0 Streams with 0.00 B |
FLOPs per iteration | 2 |
read streams | 3 Streams with 24.00 B |
write streams | 1 Streams with 8.00 B |
read+write streams | 0 Streams with 0.00 B |
FLOPs per iteration | 0 |
read streams | 1 Streams with 8.00 B |
write streams | 1 Streams with 8.00 B |
read+write streams | 1 Streams with 8.00 B |
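The stream counts above determine how many bytes each kernel moves per iteration. Because the caches are listed with `write_allocate | true`, a stream that is only written also causes a read of the target line, while a read+write stream is already counted among the reads. The sketch below computes traffic and arithmetic intensity on that basis. The kernel names (copy, daxpy, load, triad, update) are an assumption inferred from the stream patterns and do not appear in the listing, and `Kernel` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kernel:
    flops: int
    read_bytes: float
    write_bytes: float
    read_write_bytes: float  # written streams that are also read

    def traffic_bytes(self, write_allocate=True):
        """Bytes moved per iteration, including write-allocate reads."""
        allocate = 0.0
        if write_allocate:
            # Only streams that are written but never read need an extra read.
            allocate = max(self.write_bytes - self.read_write_bytes, 0.0)
        return self.read_bytes + self.write_bytes + allocate

    def intensity(self):
        """FLOPs per byte of traffic."""
        return self.flops / self.traffic_bytes()


# Stream sizes copied from the five blocks above; names are assumed.
KERNELS = {
    "copy": Kernel(0, 8.0, 8.0, 0.0),
    "daxpy": Kernel(2, 16.0, 8.0, 8.0),
    "load": Kernel(0, 8.0, 0.0, 0.0),
    "triad": Kernel(2, 24.0, 8.0, 0.0),
    "update": Kernel(0, 8.0, 8.0, 8.0),
}

if __name__ == "__main__":
    for name, kernel in KERNELS.items():
        print(name, kernel.traffic_bytes(), kernel.intensity())
```

Under these assumptions the fourth block moves 24 + 8 + 8 = 40 B per iteration for 2 FLOPs, an intensity of 0.05 FLOP/B.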