Intra Node Stencil Performance Evaluation Collection
model type | AMD K17 (Zen) architecture |
model name | AMD EPYC 7451 24-Core Processor |
micro-architecture | |
micro-architecture modeler | |
cores per socket | 24 |
cores per NUMA domain | 6 |
cacheline size | 64 B |
clock | 2.3 GHz |
NUMA domains per socket | 4 |
This machine file was generated for kerncraft version 0.8.6.dev0.
clang | -O3 -march=znver1 -D_POSIX_C_SOURCE=200112L -fopenmp -ffreestanding |
gcc | -O3 -march=znver1 -fopenmp -ffreestanding |
icc | -O3 -xHost -fno-alias -qopenmp -ffreestanding -nolib-inline |
| ADD | MUL | FMA | total |
Single Precision | 8 | 8 | 8 | 16 |
Double Precision | 4 | 4 | 4 | 8 |
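As a rough illustration of how the clock and the per-cycle FLOP figures combine, the sketch below computes theoretical peak rates. It assumes that the "total" column is the per-core FLOPs per cycle and that the nominal 2.3 GHz clock is sustained under load; the names `peak_gflops`, `CLOCK_HZ` and `FLOPS_PER_CYCLE` are illustrative and not part of kerncraft.

```python
# Illustrative sketch: names are hypothetical, values are copied from the tables above.
CLOCK_HZ = 2.3e9                        # nominal clock (2.3 GHz)
CORES_PER_SOCKET = 24                   # "cores per socket"
FLOPS_PER_CYCLE = {"SP": 16, "DP": 8}   # "total" column, assumed to be per core


def peak_gflops(precision, cores=1):
    """Theoretical peak in GFLOP/s for the given precision and core count."""
    return CLOCK_HZ * FLOPS_PER_CYCLE[precision] * cores / 1e9


if __name__ == "__main__":
    for precision in ("SP", "DP"):
        per_core = peak_gflops(precision)
        per_socket = peak_gflops(precision, CORES_PER_SOCKET)
        print(f"{precision}: {per_core:.1f} GFLOP/s per core, "
              f"{per_socket:.1f} GFLOP/s per socket")
```

Under these assumptions a single core peaks at 18.4 GFLOP/s in double precision and a full socket at 441.6 GFLOP/s. Real kernels will usually stay well below this bound.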
groups | 48 |
cores per group | 1 |
threads per group | 2 |
transfers overlap | true |
sets | 128 |
ways | 4 |
cl_size | 64 |
replacement_policy | LRU |
write_allocate | true |
write_back | true |
load_from | L2 |
store_to | L2 |
accesses | DATA_CACHE_ACCESSES__PMC[0-3] |
misses | DATA_CACHE_MISSES__PMC[0-3] |
evicts | DATA_CACHE_WRITEBACKS__PMC[0-3] |
groups | 48 |
cores per group | 1 |
threads per group | 2 |
transfers overlap | true |
sets | 1024 |
ways | 8 |
cl_size | 64 |
replacement_policy | LRU |
write_allocate | true |
write_back | true |
load_from | None |
victims_to | L3 |
store_to | L3 |
accesses | INFORMATION_REQUIRED (e.g., L1D_REPLACEMENT__PMC0) |
misses | INFORMATION_REQUIRED (e.g., L2_LINES_IN_ALL__PMC1) |
evicts | INFORMATION_REQUIRED (e.g., L2_LINES_OUT_DIRTY_ALL__PMC2) |
groups | 16 |
cores per group | 3 |
threads per group | 6 |
transfers overlap | false |
sets | 8192 |
ways | 16 |
cl_size | 64 |
replacement_policy | LRU |
write_allocate | false |
write_back | true |
accesses | EVENT_L3_ACCESS__CMPC[0-5] |
misses | EVENT_L3_MISS__CMPC[0-5] |
evicts | INFORMATION_REQUIRED (e.g., L2_LINES_OUT_DIRTY_ALL__PMC2) |
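The capacity of each cache level follows from the sets, ways and cl_size rows. The sketch below assumes capacity = sets × ways × cl_size, and it assumes that the three cache blocks above describe L1, L2 and L3 in that order (inferred from the load_from, store_to and victims_to fields); the labels and the function name are illustrative.

```python
# Illustrative sketch: level labels are assumed, values are copied from the tables above.
CACHE_LEVELS = {
    "L1": {"sets": 128, "ways": 4, "cl_size": 64},
    "L2": {"sets": 1024, "ways": 8, "cl_size": 64},
    "L3": {"sets": 8192, "ways": 16, "cl_size": 64},
}


def cache_size_bytes(sets, ways, cl_size):
    """Capacity of a set-associative cache in bytes."""
    return sets * ways * cl_size


if __name__ == "__main__":
    for level, params in CACHE_LEVELS.items():
        size_kib = cache_size_bytes(**params) // 1024
        print(f"{level}: {size_kib} KiB")
```

With the listed values this gives 32 KiB, 512 KiB and 8192 KiB (8 MiB). The last figure is per group, that is, shared by the 3 cores listed under "cores per group".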
cores per group | 24 |
threads per group | 48 |
transfers overlap | false |
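The group figures can be cross-checked against each other: groups × cores per group should give the same core count at every level. The sketch below does this for the three cache blocks, again assuming they are L1, L2 and L3 in order. The number of memory groups is not listed, so the value derived here is an inference, not a figure from the machine file.

```python
# Illustrative sketch: level labels are assumed, values are copied from the tables above.
TOPOLOGY = {
    "L1": {"groups": 48, "cores_per_group": 1, "threads_per_group": 2},
    "L2": {"groups": 48, "cores_per_group": 1, "threads_per_group": 2},
    "L3": {"groups": 16, "cores_per_group": 3, "threads_per_group": 6},
}
MEM_CORES_PER_GROUP = 24  # from the last block; its group count is not listed


def totals(level):
    """Return (total cores, total hardware threads) implied by one level."""
    entry = TOPOLOGY[level]
    return (entry["groups"] * entry["cores_per_group"],
            entry["groups"] * entry["threads_per_group"])


def is_consistent():
    """True if every level implies the same core and thread totals."""
    return len({totals(level) for level in TOPOLOGY}) == 1


if __name__ == "__main__":
    cores, threads = totals("L1")
    print("consistent:", is_consistent())
    print("cores:", cores, "threads:", threads)
    # Inferred, not listed: number of memory groups (likely sockets).
    print("implied memory groups:", cores // MEM_CORES_PER_GROUP)
```

All three levels imply 48 cores and 96 hardware threads, which would correspond to two memory groups of 24 cores each.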
OSACA | 0, 1, 2, 3, 3DV, 4, 5, 6, 7 |
LLVM-MCA | ZnAGU0, ZnAGU1, ZnALU0, ZnALU1, ZnALU2, ZnALU3, ZnDivider, ZnFPU0, ZnFPU1, ZnFPU2, ZnFPU3, ZnMultiplier |
INFORMATION_REQUIRED (e.g., max(UOPS_DISPATCHED_PORT_PORT_0__PMC2, UOPS_DISPATCHED_PORT_PORT_1__PMC3, UOPS_DISPATCHED_PORT_PORT_4__PMC0, UOPS_DISPATCHED_PORT_PORT_5__PMC1))
OSACA | 8, 9 |
LLVM-MCA | ZnAGU0, ZnAGU1 |
INFORMATION_REQUIRED T_L3 + T_MEM, TODO
FLOPs per iteration | 0 |
read streams | 1 Stream with 8.00 B |
write streams | 1 Stream with 8.00 B |
read+write streams | 0 Streams with 0.00 B |
FLOPs per iteration | 2 |
read streams | 2 Streams with 16.00 B |
write streams | 1 Stream with 8.00 B |
read+write streams | 1 Stream with 8.00 B |
FLOPs per iteration | 0 |
read streams | 1 Stream with 8.00 B |
write streams | 0 Streams with 0.00 B |
read+write streams | 0 Streams with 0.00 B |
FLOPs per iteration | 2 |
read streams | 3 Streams with 24.00 B |
write streams | 1 Stream with 8.00 B |
read+write streams | 0 Streams with 0.00 B |
FLOPs per iteration | 0 |
read streams | 1 Stream with 8.00 B |
write streams | 1 Stream with 8.00 B |
read+write streams | 1 Stream with 8.00 B |
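The stream tables above can be turned into a simple code-level intensity per kernel (FLOPs divided by bytes loaded and stored per iteration). This is a minimal sketch under stated assumptions: the byte figures are taken to be totals per iteration, read+write streams are taken to be already counted in both the read and the write rows, and extra traffic from write-allocate is ignored. The kernels are indexed in the order they are listed, because their names are not given here.

```python
# Illustrative sketch: kernels are indexed in listing order, names are hypothetical.
# Each entry: (FLOPs per iteration, read bytes, write bytes), copied from the tables above.
KERNELS = [
    (0, 8.0, 8.0),    # 1 read stream, 1 write stream
    (2, 16.0, 8.0),   # 2 read streams, 1 write stream (1 stream is read+write)
    (0, 8.0, 0.0),    # 1 read stream only
    (2, 24.0, 8.0),   # 3 read streams, 1 write stream
    (0, 8.0, 8.0),    # 1 read stream, 1 write stream (same stream, read+write)
]


def bytes_per_iteration(kernel):
    """Bytes loaded plus bytes stored per iteration (write-allocate not counted)."""
    _, read_bytes, write_bytes = kernel
    return read_bytes + write_bytes


def intensity(kernel):
    """Code-level intensity in FLOP/B."""
    return kernel[0] / bytes_per_iteration(kernel)


if __name__ == "__main__":
    for index, kernel in enumerate(KERNELS):
        print(f"kernel {index}: {bytes_per_iteration(kernel):.0f} B/it, "
              f"{intensity(kernel):.4f} FLOP/B")
```

The two kernels with 2 FLOPs per iteration come out at roughly 0.083 FLOP/B and 0.0625 FLOP/B under these assumptions; the remaining kernels perform no floating-point work and have an intensity of zero.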