Intra Node Stencil Performance Evaluation Collection
model type | Intel Xeon Broadwell EN/EP/EX processor |
model name | Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz |
micro-architecture | BDW |
micro-architecture modeler | |
cores per socket | 10 |
cores per NUMA domain | 10 |
cacheline size | 64 B |
clock | 2.2 GHz |
NUMA domains per socket | 1 |
This machine file was generated for kerncraft version 0.8.0.
icc | -O3 -xCORE-AVX2 -fno-alias -qopenmp |
gcc | -Ofast -march=core-avx2 -fargument-noalias -ffast-math -D_POSIX_C_SOURCE=200112L -fopenmp |
clang | -O3 -mavx2 -D_POSIX_C_SOURCE=200112L -fopenmp |
FLOPs per cycle | ADD | MUL | FMA | total |
Single Precision | 8 | 8 | 16 | 32 |
Double Precision | 4 | 4 | 8 | 16 |
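The per-cycle FLOP counts above translate into a theoretical peak rate once multiplied by the clock. A minimal sketch, assuming the nominal 2.2 GHz clock holds under load (sustained clocks for AVX2 code may differ) and using the "total" column; the names `CLOCK_GHZ`, `FLOPS_PER_CYCLE`, and `peak_gflops` are illustrative and not part of the machine file:

```python
# Values copied from the tables above; the nominal clock is an assumption.
CLOCK_GHZ = 2.2
CORES_PER_SOCKET = 10
FLOPS_PER_CYCLE = {"SP": 32, "DP": 16}  # "total" column of the table


def peak_gflops(precision: str, cores: int = 1) -> float:
    """Theoretical peak in GFLOP/s for `cores` cores at the nominal clock."""
    return FLOPS_PER_CYCLE[precision] * CLOCK_GHZ * cores


if __name__ == "__main__":
    for prec in ("SP", "DP"):
        single = peak_gflops(prec)
        socket = peak_gflops(prec, CORES_PER_SOCKET)
        print(f"{prec}: {single:.1f} GFLOP/s per core, {socket:.1f} GFLOP/s per socket")
```

Under these assumptions a single core peaks at about 35.2 GFLOP/s in double precision, and a full socket at ten times that.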
groups | 20 |
cores per group | 1 |
threads per group | 1 |
transfers overlap | |
cl_size | 64 |
load_from | L2 |
replacement_policy | LRU |
sets | 64 |
store_to | L2 |
ways | 8 |
write_allocate | true |
write_back | true |
accesses | MEM_UOPS_RETIRED_LOADS_ALL:PMC[0-3] |
misses | L1D_REPLACEMENT:PMC[0-3] |
evicts | L2_TRANS_L1D_WB:PMC[0-3] |
groups | 20 |
cores per group | 1 |
threads per group | 1 |
transfers overlap | |
non-overlap upstream throughput | 64 B/cy, half-duplex |
cl_size | 64 |
load_from | L3 |
replacement_policy | LRU |
sets | 512 |
store_to | L3 |
ways | 8 |
write_allocate | true |
write_back | true |
accesses | L1D_REPLACEMENT:PMC[0-3] |
misses | L2_LINES_IN_ALL:PMC[0-3] |
evicts | L2_TRANS_L2_WB:PMC[0-3] |
groups | 2 |
cores per group | 10 |
threads per group | 10 |
transfers overlap | |
non-overlap upstream throughput | 32 B/cy, half-duplex |
cl_size | 64 |
replacement_policy | LRU |
sets | 6400 |
ways | 64 |
write_allocate | true |
write_back | true |
accesses | L2_LINES_IN_ALL:PMC[0-3] |
misses | (LLC_LOOKUP_DATA_READ:CBOX0C[01] + LLC_LOOKUP_DATA_READ:CBOX1C[01] + LLC_LOOKUP_DATA_READ:CBOX2C[01] + LLC_LOOKUP_DATA_READ:CBOX3C[01] + LLC_LOOKUP_DATA_READ:CBOX4C[01] + LLC_LOOKUP_DATA_READ:CBOX5C[01] + LLC_LOOKUP_DATA_READ:CBOX6C[01] + LLC_LOOKUP_DATA_READ:CBOX7C[01] + LLC_LOOKUP_DATA_READ:CBOX8C[01] + LLC_LOOKUP_DATA_READ:CBOX9C[01] + LLC_LOOKUP_DATA_READ:CBOX10C[01] + LLC_LOOKUP_DATA_READ:CBOX11C[01] + LLC_LOOKUP_DATA_READ:CBOX12C[01] + LLC_LOOKUP_DATA_READ:CBOX13C[01] + LLC_LOOKUP_DATA_READ:CBOX14C[01] + LLC_LOOKUP_DATA_READ:CBOX15C[01] + LLC_LOOKUP_DATA_READ:CBOX16C[01] + LLC_LOOKUP_DATA_READ:CBOX17C[01] + LLC_LOOKUP_DATA_READ:CBOX18C[01] + LLC_LOOKUP_DATA_READ:CBOX19C[01] + LLC_LOOKUP_DATA_READ:CBOX20C[01] + LLC_LOOKUP_DATA_READ:CBOX21C[01]) |
evicts | (LLC_VICTIMS_M:CBOX0C[01] + LLC_VICTIMS_M:CBOX1C[01] + LLC_VICTIMS_M:CBOX2C[01] + LLC_VICTIMS_M:CBOX3C[01] + LLC_VICTIMS_M:CBOX4C[01] + LLC_VICTIMS_M:CBOX5C[01] + LLC_VICTIMS_M:CBOX6C[01] + LLC_VICTIMS_M:CBOX7C[01] + LLC_VICTIMS_M:CBOX8C[01] + LLC_VICTIMS_M:CBOX9C[01] + LLC_VICTIMS_M:CBOX10C[01] + LLC_VICTIMS_M:CBOX11C[01] + LLC_VICTIMS_M:CBOX12C[01] + LLC_VICTIMS_M:CBOX13C[01] + LLC_VICTIMS_M:CBOX14C[01] + LLC_VICTIMS_M:CBOX15C[01] + LLC_VICTIMS_M:CBOX16C[01] + LLC_VICTIMS_M:CBOX17C[01] + LLC_VICTIMS_M:CBOX18C[01] + LLC_VICTIMS_M:CBOX19C[01] + LLC_VICTIMS_M:CBOX20C[01] + LLC_VICTIMS_M:CBOX21C[01]) |
cores per group | 10 |
threads per group | 10 |
transfers overlap | |
non-overlap upstream throughput | full socket memory bandwidth, half-duplex |
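The capacity of each cache level follows from its geometry as sets × ways × cacheline size. A small sketch of that arithmetic; the level labels L1, L2, and L3 are inferred here from the `load_from` / `store_to` fields, and the helper names are illustrative:

```python
# Geometry copied from the three cache blocks above.
# Level labels are inferred from the load_from / store_to fields.
CACHES = {
    "L1": {"sets": 64, "ways": 8, "cl_size": 64},
    "L2": {"sets": 512, "ways": 8, "cl_size": 64},
    "L3": {"sets": 6400, "ways": 64, "cl_size": 64},
}


def capacity_bytes(level: str) -> int:
    """Capacity of one cache instance (per group) in bytes."""
    c = CACHES[level]
    return c["sets"] * c["ways"] * c["cl_size"]


if __name__ == "__main__":
    for level in CACHES:
        size = capacity_bytes(level)
        print(f"{level}: {size} B = {size / 1024:.0f} KiB")
```

This gives 32 KiB for L1, 256 KiB for L2, and 25 MiB for the L3 shared by the 10 cores of a group.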
0, 0DV, 1, 2, 3, 4, 5, 6, 7
Max(UOPS_EXECUTED_PORT_PORT_0:PMC[0-3], UOPS_EXECUTED_PORT_PORT_1:PMC[0-3], UOPS_EXECUTED_PORT_PORT_4:PMC[0-3], UOPS_EXECUTED_PORT_PORT_5:PMC[0-3], UOPS_EXECUTED_PORT_PORT_6:PMC[0-3], UOPS_EXECUTED_PORT_PORT_7:PMC[0-3])
2D, 3D
T_OL + T_L1L2 + T_L2L3 + T_L3MEM
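To illustrate how the transfer terms in the sum above could be estimated, the sketch below derives T_L1L2 and T_L2L3 from the non-overlap upstream throughputs listed earlier (64 B/cy and 32 B/cy). The workload is hypothetical: a copy-like loop that moves three cache lines across each boundary per unit of work (one load, one write-allocate, one evict). T_OL and T_L3MEM are placeholders, since neither the in-core time nor a memory bandwidth figure appears in this section.

```python
CL_SIZE = 64  # bytes, from the machine description
UPSTREAM_B_PER_CY = {"L1L2": 64, "L2L3": 32}  # non-overlap upstream throughput


def transfer_cycles(cachelines: int, bytes_per_cycle: float,
                    cl_size: int = CL_SIZE) -> float:
    """Cycles needed to move `cachelines` cache lines at the given throughput."""
    return cachelines * cl_size / bytes_per_cycle


def metric_sum(t_ol: float, t_l1l2: float, t_l2l3: float, t_l3mem: float) -> float:
    """Sum of the four terms exactly as listed in the metric above."""
    return t_ol + t_l1l2 + t_l2l3 + t_l3mem


if __name__ == "__main__":
    lines = 3  # hypothetical: load + write-allocate + evict
    t_l1l2 = transfer_cycles(lines, UPSTREAM_B_PER_CY["L1L2"])
    t_l2l3 = transfer_cycles(lines, UPSTREAM_B_PER_CY["L2L3"])
    # Placeholder values, NOT measured or taken from the machine file.
    t_ol, t_l3mem = 4.0, 10.0
    print(t_l1l2, t_l2l3, metric_sum(t_ol, t_l1l2, t_l2l3, t_l3mem))
```

With three cache lines per boundary, the L1-L2 transfer takes 3 cycles and the L2-L3 transfer 6 cycles; the total depends on the placeholder values chosen for the other two terms.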
FLOPs per iteration | 0 |
read streams | 1 Stream with 8.00 B |
write streams | 1 Stream with 8.00 B |
read+write streams | 0 Streams with 0.00 B |
FLOPs per iteration | 2 |
read streams | 2 Streams with 16.00 B |
write streams | 1 Stream with 8.00 B |
read+write streams | 1 Stream with 8.00 B |
FLOPs per iteration | 0 |
read streams | 1 Stream with 8.00 B |
write streams | 0 Streams with 0.00 B |
read+write streams | 0 Streams with 0.00 B |
FLOPs per iteration | 2 |
read streams | 3 Streams with 24.00 B |
write streams | 1 Stream with 8.00 B |
read+write streams | 0 Streams with 0.00 B |
FLOPs per iteration | 0 |
read streams | 1 Stream with 8.00 B |
write streams | 1 Stream with 8.00 B |
read+write streams | 1 Stream with 8.00 B |
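The stream counts above determine how many bytes each benchmark kernel moves per iteration. The sketch below estimates that traffic and the resulting arithmetic intensity for the five kernels in the order listed. It assumes that "read+write streams" counts streams already included in both the read and the write rows, and that every written stream that is not also read incurs one extra read for write-allocate (the caches are listed with `write_allocate | true`). The kernels are referred to by position only, and all function names are illustrative.

```python
# (FLOPs, read bytes, write bytes, read+write bytes) per iteration,
# in the order the five kernels appear above.
KERNELS = [
    (0, 8, 8, 0),
    (2, 16, 8, 8),
    (0, 8, 0, 0),
    (2, 24, 8, 0),
    (0, 8, 8, 8),
]


def traffic_bytes(read_b: int, write_b: int, rw_b: int,
                  write_allocate: bool = True) -> int:
    """Bytes moved per iteration, with optional write-allocate reads.

    Assumption: streams that are both read and written (rw_b) are already
    loaded by the read, so they need no additional write-allocate transfer.
    """
    wa = (write_b - rw_b) if write_allocate else 0
    return read_b + write_b + wa


def intensity(flops: int, read_b: int, write_b: int, rw_b: int) -> float:
    """Arithmetic intensity in FLOP/B under the traffic estimate above."""
    return flops / traffic_bytes(read_b, write_b, rw_b)


if __name__ == "__main__":
    for i, (flops, rd, wr, rw) in enumerate(KERNELS, start=1):
        print(f"kernel {i}: {traffic_bytes(rd, wr, rw)} B/it, "
              f"{intensity(flops, rd, wr, rw):.4f} FLOP/B")
```

Under these assumptions the five kernels move 24, 24, 8, 40, and 16 bytes per iteration, so only the two kernels with 2 FLOPs per iteration have a nonzero intensity.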