Intra Node Stencil Performance Evaluation Collection
| model type | AMD K17 (Zen) architecture |
| model name | AMD EPYC 7451 24-Core Processor |
| micro-architecture | |
| micro-architecture modeler | |
| cores per socket | 24 |
| cores per NUMA domain | 6 |
| cacheline size | 64 B |
| clock | 2.3 GHz |
| NUMA domains per socket | 4 |
This machine file was generated for kerncraft version 0.8.6.dev0.
| clang | -O3 -march=znver1 -D_POSIX_C_SOURCE=200112L -fopenmp -ffreestanding |
| gcc | -O3 -march=znver1 -fopenmp -ffreestanding |
| icc | -O3 -xHost -fno-alias -qopenmp -ffreestanding -nolib-inline |
| | ADD | MUL | FMA | total |
| Single Precision | 8 | 8 | 8 | 16 |
| Double Precision | 4 | 4 | 4 | 8 |
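
The table above can be combined with the clock row to estimate a theoretical peak. The following is a minimal sketch, assuming that the "total" column gives FLOPs per cycle per core and that the 2.3 GHz clock is sustained under load; neither assumption is stated on this page, and the names `CLOCK_HZ`, `FLOPS_PER_CYCLE`, and `peak_gflops` are hypothetical.

```python
# Values copied from the tables above. Treating "total" as FLOPs per
# cycle per core is an assumption, not something the page states.
CLOCK_HZ = 2.3e9
CORES_PER_SOCKET = 24
FLOPS_PER_CYCLE = {"SP": 16, "DP": 8}


def peak_gflops(precision: str, cores: int = 1) -> float:
    """Return the theoretical peak in GFLOP/s for the given core count."""
    return FLOPS_PER_CYCLE[precision] * CLOCK_HZ * cores / 1e9


if __name__ == "__main__":
    for precision in ("SP", "DP"):
        per_core = peak_gflops(precision)
        per_socket = peak_gflops(precision, CORES_PER_SOCKET)
        print(f"{precision}: {per_core:.1f} GFLOP/s per core, "
              f"{per_socket:.1f} GFLOP/s per socket")
```

Under these assumptions, one core reaches about 18.4 GFLOP/s in double precision. Measured rates are typically lower, because real kernels rarely keep every floating-point pipeline busy.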
| groups | 48 |
| cores per group | 1 |
| threads per group | 2 |
| transfers overlap | true |
| sets | 128 |
| ways | 4 |
| cl_size | 64 |
| replacement_policy | LRU |
| write_allocate | true |
| write_back | true |
| load_from | L2 |
| store_to | L2 |
| accesses | DATA_CACHE_ACCESSES__PMC[0-3] |
| misses | DATA_CACHE_MISSES__PMC[0-3] |
| evicts | DATA_CACHE_WRITEBACKS__PMC[0-3] |
| groups | 48 |
| cores per group | 1 |
| threads per group | 2 |
| transfers overlap | true |
| sets | 1024 |
| ways | 8 |
| cl_size | 64 |
| replacement_policy | LRU |
| write_allocate | true |
| write_back | true |
| load_from | None |
| victims_to | L3 |
| store_to | L3 |
| accesses | INFORMATION_REQUIRED (e.g., L1D_REPLACEMENT__PMC0) |
| misses | INFORMATION_REQUIRED (e.g., L2_LINES_IN_ALL__PMC1) |
| evicts | INFORMATION_REQUIRED (e.g., L2_LINES_OUT_DIRTY_ALL__PMC2) |
| groups | 16 |
| cores per group | 3 |
| threads per group | 6 |
| transfers overlap | false |
| sets | 8192 |
| ways | 16 |
| cl_size | 64 |
| replacement_policy | LRU |
| write_allocate | false |
| write_back | true |
| accesses | EVENT_L3_ACCESS__CMPC[0-5] |
| misses | EVENT_L3_MISS__CMPC[0-5] |
| evicts | INFORMATION_REQUIRED (e.g., L2_LINES_OUT_DIRTY_ALL__PMC2) |
| cores per group | 24 |
| threads per group | 48 |
| transfers overlap | false |
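
The capacity of each cache level follows from its sets, ways, and cl_size rows as sets × ways × line size. The sketch below works through that arithmetic. The level names L1, L2, and L3 are inferred from the load_from, store_to, and victims_to rows rather than stated on this page, and `CacheLevel` is a hypothetical helper, not part of kerncraft.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheLevel:
    """One cache level, described by the rows listed above."""
    name: str      # inferred label, not given explicitly on the page
    sets: int
    ways: int
    cl_size: int   # cache line size in bytes

    @property
    def size_bytes(self) -> int:
        # Capacity of a set-associative cache: sets * ways * line size.
        return self.sets * self.ways * self.cl_size


LEVELS = [
    CacheLevel("L1", sets=128, ways=4, cl_size=64),
    CacheLevel("L2", sets=1024, ways=8, cl_size=64),
    CacheLevel("L3", sets=8192, ways=16, cl_size=64),
]

if __name__ == "__main__":
    for level in LEVELS:
        print(f"{level.name}: {level.size_bytes // 1024} KiB")
```

With the values listed, this gives 32 KiB, 512 KiB, and 8192 KiB (8 MiB) for the three levels. The third figure is per group of 3 cores, not per socket.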
| OSACA | 0, 1, 2, 3, 3DV, 4, 5, 6, 7 |
| LLVM-MCA | ZnAGU0, ZnAGU1, ZnALU0, ZnALU1, ZnALU2, ZnALU3, ZnDivider, ZnFPU0, ZnFPU1, ZnFPU2, ZnFPU3, ZnMultiplier |
INFORMATION_REQUIRED (e.g., max(UOPS_DISPATCHED_PORT_PORT_0__PMC2, UOPS_DISPATCHED_PORT_PORT_1__PMC3, UOPS_DISPATCHED_PORT_PORT_4__PMC0, UOPS_DISPATCHED_PORT_PORT_5__PMC1))
| OSACA | 8, 9 |
| LLVM-MCA | ZnAGU0, ZnAGU1 |
INFORMATION_REQUIRED T_L3 + T_MEM, TODO
| FLOPs per iteration | 0 |
| read streams | 1 Streams with 8.00 B |
| write streams | 1 Streams with 8.00 B |
| read+write streams | 0 Streams with 0.00 B |
| FLOPs per iteration | 2 |
| read streams | 2 Streams with 16.00 B |
| write streams | 1 Streams with 8.00 B |
| read+write streams | 1 Streams with 8.00 B |
| FLOPs per iteration | 0 |
| read streams | 1 Streams with 8.00 B |
| write streams | 0 Streams with 0.00 B |
| read+write streams | 0 Streams with 0.00 B |
| FLOPs per iteration | 2 |
| read streams | 3 Streams with 24.00 B |
| write streams | 1 Streams with 8.00 B |
| read+write streams | 0 Streams with 0.00 B |
| FLOPs per iteration | 0 |
| read streams | 1 Streams with 8.00 B |
| write streams | 1 Streams with 8.00 B |
| read+write streams | 1 Streams with 8.00 B |
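
The FLOP and stream counts above can be turned into an arithmetic intensity (FLOPs per byte of memory traffic) for each benchmark kernel. The sketch below is one way to do this, under two assumptions that this page does not confirm: that a read+write stream is already counted once in the read streams and once in the write streams, and that a write stream which is not also read incurs one extra write-allocate read per element. The function name `arithmetic_intensity` is hypothetical.

```python
def arithmetic_intensity(flops: float, read_bytes: float, write_bytes: float,
                         rw_bytes: float, write_allocate: bool = True) -> float:
    """Return FLOPs per byte of memory traffic for one loop iteration.

    Assumption: rw_bytes is already included in both read_bytes and
    write_bytes, so only write-only streams trigger write-allocate reads.
    """
    traffic = read_bytes + write_bytes
    if write_allocate:
        # Write-only data is loaded once before it can be overwritten.
        traffic += write_bytes - rw_bytes
    if traffic <= 0:
        raise ValueError("kernel moves no data")
    return flops / traffic


if __name__ == "__main__":
    # Fourth entry above: 2 FLOPs, 24 B read, 8 B written, 0 B read+write.
    print(arithmetic_intensity(2, 24.0, 8.0, 0.0))
    # Second entry above: 2 FLOPs, 16 B read, 8 B written, 8 B read+write.
    print(arithmetic_intensity(2, 16.0, 8.0, 8.0))
```

Under these assumptions the fourth entry moves 40 B per iteration (0.05 FLOP/B) and the second moves 24 B per iteration. Entries with 0 FLOPs have an intensity of zero and serve as pure bandwidth probes.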