Intra Node Stencil Performance Evaluation Collection
| model type | Intel Skylake SP processor |
| model name | Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz |
| micro-architecture | |
| micro-architecture modeler | |
| cores per socket | 20 |
| cores per NUMA domain | 10 |
| cacheline size | 64 B |
| clock | 2.4 GHz |
| NUMA domains per socket | 2 |
This machine file was generated for kerncraft version 0.8.6.dev0.
| icc | -O3 -fno-alias -xCORE-AVX512 -qopenmp -qopt-zmm-usage=high -ffreestanding -nolib-inline |
| clang | -O3 -march=skylake-avx512 -D_POSIX_C_SOURCE=200809L -fopenmp -ffreestanding |
| gcc | -O3 -march=skylake-avx512 -D_POSIX_C_SOURCE=200809L -fopenmp -lm -ffreestanding |
| | ADD | MUL | FMA | total |
| Single Precision | 32 | 32 | 64 | 64 |
| Double Precision | 16 | 16 | 32 | 32 |
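As a rough illustration of how the FLOPs-per-cycle figures relate to the clock and core counts listed earlier, the sketch below multiplies the "total" column by the nominal 2.4 GHz clock. It assumes the cores actually run at the nominal clock; sustained frequencies under AVX-512 load are typically lower, so these numbers are an upper bound rather than a measured rate. The function name `peak_gflops` is made up for this example.

```python
# Nominal peak floating-point rate, derived from the tables above.
# Assumption: the cores run at the nominal clock (no AVX-512 frequency drop).
CLOCK_GHZ = 2.4                           # "clock" row
CORES_PER_SOCKET = 20                     # "cores per socket" row
FLOPS_PER_CYCLE = {"SP": 64, "DP": 32}    # "total" column, per core


def peak_gflops(precision: str, cores: int = 1) -> float:
    """Return the nominal peak in GFLOP/s for the given precision and core count."""
    return FLOPS_PER_CYCLE[precision] * CLOCK_GHZ * cores


if __name__ == "__main__":
    for precision in ("SP", "DP"):
        per_core = peak_gflops(precision)
        per_socket = peak_gflops(precision, CORES_PER_SOCKET)
        print(f"{precision}: {per_core:.1f} GFLOP/s per core, "
              f"{per_socket:.1f} GFLOP/s per socket")
```

Under these assumptions, double precision comes out at 76.8 GFLOP/s per core and 1536 GFLOP/s per socket.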
| groups | 20 |
| cores per group | 1 |
| threads per group | 2 |
| transfers overlap | false |
| sets | 64 |
| ways | 8 |
| cl_size | 64 |
| replacement_policy | LRU |
| write_allocate | true |
| write_back | true |
| load_from | L2 |
| store_to | L2 |
| accesses | MEM_INST_RETIRED_ALL_LOADS:PMC[0-3] + MEM_INST_RETIRED_ALL_STORES:PMC[0-3] |
| misses | L1D_REPLACEMENT:PMC[0-3] |
| evicts | L2_TRANS_L1D_WB:PMC[0-3] |
| groups | 20 |
| cores per group | 1 |
| threads per group | 2 |
| transfers overlap | false |
| sets | 1024 |
| ways | 16 |
| cl_size | 64 |
| replacement_policy | LRU |
| write_allocate | true |
| write_back | true |
| load_from | None |
| victims_to | L3 |
| store_to | L3 |
| accesses | L1D_REPLACEMENT:PMC[0-3] + L2_TRANS_L1D_WB:PMC[0-3] |
| misses | L2_LINES_IN_ALL:PMC[0-3] |
| evicts | L2_TRANS_L2_WB:PMC[0-3] |
| groups | 2 |
| cores per group | 10 |
| threads per group | 20 |
| transfers overlap | false |
| sets | 20480 |
| ways | 11 |
| cl_size | 64 |
| replacement_policy | LRU |
| write_allocate | false |
| write_back | true |
| accesses | L2_LINES_IN_ALL:PMC[0-3] + L2_TRANS_L2_WB:PMC[0-3] |
| misses | (CAS_COUNT_RD:MBOX0C[01] + CAS_COUNT_RD:MBOX1C[01] + CAS_COUNT_RD:MBOX2C[01] + CAS_COUNT_RD:MBOX3C[01] + CAS_COUNT_RD:MBOX4C[01] + CAS_COUNT_RD:MBOX5C[01]) |
| evicts | (CAS_COUNT_WR:MBOX0C[01] + CAS_COUNT_WR:MBOX1C[01] + CAS_COUNT_WR:MBOX2C[01] + CAS_COUNT_WR:MBOX3C[01] + CAS_COUNT_WR:MBOX4C[01] + CAS_COUNT_WR:MBOX5C[01]) |
| cores per group | 40 |
| threads per group | 80 |
| transfers overlap | false |
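The capacity of each cache level follows from its sets, ways, and cl_size entries. The sketch below works through that arithmetic. The labels L1, L2, and L3 are inferred from the load_from, store_to, and victims_to rows rather than stated in the tables, and the helper name `capacity_bytes` is hypothetical.

```python
# Cache capacity = sets * ways * cache line size, using the values listed above.
# Assumption: the three cache tables describe L1, L2 and L3, in that order.
CACHES = {
    "L1": {"sets": 64, "ways": 8, "cl_size": 64, "groups": 20},
    "L2": {"sets": 1024, "ways": 16, "cl_size": 64, "groups": 20},
    "L3": {"sets": 20480, "ways": 11, "cl_size": 64, "groups": 2},
}


def capacity_bytes(level: str) -> int:
    """Return the capacity of one cache instance (one group) in bytes."""
    cache = CACHES[level]
    return cache["sets"] * cache["ways"] * cache["cl_size"]


if __name__ == "__main__":
    for level in CACHES:
        kib = capacity_bytes(level) / 1024
        print(f"{level}: {kib:.0f} KiB per group")
```

This gives 32 KiB for L1 and 1 MiB for L2 per core, and 13.75 MiB for each of the two L3 groups, so 27.5 MiB of L3 per socket.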
| IACA | 0, 0DV, 1, 2, 3, 4, 5, 6, 7 |
| OSACA | 0, 0DV, 1, 2, 3, 4, 5, 6, 7 |
| LLVM-MCA | SKXDivider, SKXFPDivider, SKXPort0, SKXPort1, SKXPort2, SKXPort3, SKXPort4, SKXPort5, SKXPort6, SKXPort7 |
Max(UOPS_DISPATCHED_PORT_PORT_0:PMC[0-3], UOPS_DISPATCHED_PORT_PORT_1:PMC[0-3], UOPS_DISPATCHED_PORT_PORT_4:PMC[0-3], UOPS_DISPATCHED_PORT_PORT_5:PMC[0-3], UOPS_DISPATCHED_PORT_PORT_6:PMC[0-3], UOPS_DISPATCHED_PORT_PORT_7:PMC[0-3])
| IACA | 2D, 3D |
| OSACA | 2D, 3D |
| LLVM-MCA | SKXPort2, SKXPort3 |
T_nOL + T_L2 + T_L3 + T_MEM
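The two expressions above can be combined into a single runtime estimate. The sketch below assumes the usual ECM-style composition for a machine whose transfers do not overlap: the overlapping in-core time runs concurrently with the sum of the non-overlapping in-core time and all transfer times. The function name and the input values are hypothetical and chosen only to show the arithmetic; they are not measurements from this machine.

```python
# Minimal sketch of an ECM-style prediction, assuming that data transfers
# between memory levels do not overlap with each other ("transfers overlap: false").
def ecm_cycles(t_ol: float, t_nol: float, t_l2: float,
               t_l3: float, t_mem: float) -> float:
    """Return predicted cycles per unit of work for data coming from memory.

    t_ol  : in-core time that can overlap with data transfers
    t_nol : in-core time that cannot overlap (load/store data ports)
    t_l2, t_l3, t_mem : transfer times for L1<->L2, L2<->L3, L3<->memory
    """
    return max(t_ol, t_nol + t_l2 + t_l3 + t_mem)


if __name__ == "__main__":
    # Hypothetical cycle counts, for illustration only.
    example = dict(t_ol=4.0, t_nol=2.0, t_l2=3.0, t_l3=5.0, t_mem=8.0)
    print(f"predicted cycles: {ecm_cycles(**example)}")
```

With these made-up inputs the data path dominates: 2 + 3 + 5 + 8 = 18 cycles exceeds the 4 overlapping cycles.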
| FLOPs per iteration | 0 |
| read streams | 1 Streams with 8.00 B |
| write streams | 1 Streams with 8.00 B |
| read+write streams | 0 Streams with 0.00 B |
| FLOPs per iteration | 2 |
| read streams | 2 Streams with 16.00 B |
| write streams | 1 Streams with 8.00 B |
| read+write streams | 1 Streams with 8.00 B |
| FLOPs per iteration | 0 |
| read streams | 1 Streams with 8.00 B |
| write streams | 0 Streams with 0.00 B |
| read+write streams | 0 Streams with 0.00 B |
| FLOPs per iteration | 2 |
| read streams | 3 Streams with 24.00 B |
| write streams | 1 Streams with 8.00 B |
| read+write streams | 0 Streams with 0.00 B |
| FLOPs per iteration | 0 |
| read streams | 1 Streams with 8.00 B |
| write streams | 1 Streams with 8.00 B |
| read+write streams | 1 Streams with 8.00 B |
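The stream tables above are enough to estimate the data volume and arithmetic intensity of each benchmark kernel per iteration. The sketch below labels the kernels `kernel_1` to `kernel_5` in listing order because their names did not survive on this page; the stream patterns resemble copy, daxpy, load, triad, and update, but that mapping is an inference. Read+write streams are assumed to be already counted once in the read total and once in the write total, and write-allocate traffic is deliberately ignored.

```python
# Per-iteration data volume and arithmetic intensity from the stream tables above.
# Assumptions: kernel labels are placeholders; read+write streams are already
# included in both the read and the write byte totals; no write-allocate traffic.
KERNELS = {
    "kernel_1": {"flops": 0, "read_B": 8.0, "write_B": 8.0},
    "kernel_2": {"flops": 2, "read_B": 16.0, "write_B": 8.0},
    "kernel_3": {"flops": 0, "read_B": 8.0, "write_B": 0.0},
    "kernel_4": {"flops": 2, "read_B": 24.0, "write_B": 8.0},
    "kernel_5": {"flops": 0, "read_B": 8.0, "write_B": 8.0},
}


def bytes_per_iteration(name: str) -> float:
    """Return bytes read plus bytes written per loop iteration."""
    kernel = KERNELS[name]
    return kernel["read_B"] + kernel["write_B"]


def arithmetic_intensity(name: str) -> float:
    """Return FLOPs per byte of explicit traffic for one iteration."""
    return KERNELS[name]["flops"] / bytes_per_iteration(name)


if __name__ == "__main__":
    for name in KERNELS:
        print(f"{name}: {bytes_per_iteration(name):.0f} B/it, "
              f"{arithmetic_intensity(name):.4f} FLOP/B")
```

Only the two kernels with 2 FLOPs per iteration have a non-zero intensity; the others are pure data-movement benchmarks.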