INSPECT

Intra Node Stencil Performance Evaluation Collection

AMD EPYC 7451 24-Core Processor

General

model type AMD K17 (Zen) architecture
model name AMD EPYC 7451 24-Core Processor
micro-architecture  
micro-architecture modeler  
cores per socket 24
cores per NUMA domain 6
cacheline size 64 B
clock 2.3 GHz
NUMA domains per socket 4

This machine file was generated for kerncraft version 0.8.6.dev0.

Compiler Flags

clang -O3 -march=znver1 -D_POSIX_C_SOURCE=200112L -fopenmp -ffreestanding
gcc -O3 -march=znver1 -fopenmp -ffreestanding
icc -O3 -xHost -fno-alias -qopenmp -ffreestanding -nolib-inline

Flops per Cycle

                  ADD  MUL  FMA  total
Single Precision    8    8    8     16
Double Precision    4    4    4      8
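The total column can be combined with the clock and core count from the General section to estimate theoretical peak throughput. This is a minimal sketch that assumes every core sustains the listed FLOPs per cycle at the nominal 2.3 GHz, with no turbo and no frequency reduction under vector load. `peak_gflops` is a hypothetical helper name.

```python
# Theoretical peak FLOP/s from the machine file values.
# ASSUMPTION: all cores run at the nominal clock and sustain the "total"
# FLOPs per cycle from the table above.
CLOCK_HZ = 2.3e9
CORES_PER_SOCKET = 24
FLOPS_PER_CYCLE = {"single": 16, "double": 8}  # "total" column


def peak_gflops(precision: str, cores: int = CORES_PER_SOCKET) -> float:
    """Peak GFLOP/s for the given precision across `cores` cores."""
    return CLOCK_HZ * FLOPS_PER_CYCLE[precision] * cores / 1e9


if __name__ == "__main__":
    for prec in ("single", "double"):
        print(f"{prec}: {peak_gflops(prec, 1):.1f} GFLOP/s per core, "
              f"{peak_gflops(prec):.1f} GFLOP/s per socket")
```

For double precision this gives 18.4 GFLOP/s per core and 441.6 GFLOP/s per 24-core socket.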

Memory Hierarchy

L1

groups 48
cores per group 1
threads per group 2
transfers overlap true

Cache Per Group

sets 128
ways 4
cl_size 64
replacement_policy LRU
write_allocate true
write_back true
load_from L2
store_to L2
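The Cache Per Group parameters describe a set-associative cache: an address maps to cache line addr // cl_size, that line maps to set line % sets, and within a set the least recently used of the ways lines is replaced. The sketch below models the L1 entry (128 sets x 4 ways x 64 B = 32 KiB) with write-allocate and write-back behavior, counting accesses, misses, and dirty evictions in loose analogy to the counter metric names above. The class `LRUCache` is hypothetical and is not kerncraft's own cache simulator.

```python
from collections import OrderedDict


class LRUCache:
    """Set-associative, write-allocate, write-back LRU cache (illustrative).

    Geometry defaults to the L1 entry: 128 sets x 4 ways x 64 B lines.
    """

    def __init__(self, sets=128, ways=4, cl_size=64):
        self.sets, self.ways, self.cl_size = sets, ways, cl_size
        # One OrderedDict per set: tag -> dirty flag, ordered LRU -> MRU.
        self.lines = [OrderedDict() for _ in range(sets)]
        self.accesses = self.misses = self.evicts = 0

    def _access(self, addr, is_store):
        self.accesses += 1
        line = addr // self.cl_size
        index, tag = line % self.sets, line // self.sets
        cache_set = self.lines[index]
        if tag in cache_set:
            cache_set.move_to_end(tag)  # hit: mark as most recently used
        else:
            self.misses += 1  # loads and stores both allocate (write_allocate)
            if len(cache_set) == self.ways:
                _, dirty = cache_set.popitem(last=False)  # evict the LRU line
                if dirty:
                    self.evicts += 1  # write_back: only dirty lines are written out
            cache_set[tag] = False  # newly allocated line starts clean
        if is_store:
            cache_set[tag] = True  # mark dirty; does not change LRU order

    def load(self, addr):
        self._access(addr, is_store=False)

    def store(self, addr):
        self._access(addr, is_store=True)
```

Streaming 1024 doubles from one array into another (16 KiB in total) occupies only two of the four ways in each set, so the model reports 256 misses out of 2048 accesses and no evictions.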

Performance Counter Metrics

accesses DATA_CACHE_ACCESSES__PMC[0-3]
misses DATA_CACHE_MISSES__PMC[0-3]
evicts DATA_CACHE_WRITEBACKS__PMC[0-3]

L2

groups 48
cores per group 1
threads per group 2
transfers overlap true

Cache Per Group

sets 1024
ways 8
cl_size 64
replacement_policy LRU
write_allocate true
write_back true
load_from None
victims_to L3
store_to L3

Performance Counter Metrics

accesses INFORMATION_REQUIRED (e.g., L1D_REPLACEMENT__PMC0)
misses INFORMATION_REQUIRED (e.g., L2_LINES_IN_ALL__PMC1)
evicts INFORMATION_REQUIRED (e.g., L2_LINES_OUT_DIRTY_ALL__PMC2)

L3

groups 16
cores per group 3
threads per group 6
transfers overlap false

Cache Per Group

sets 8192
ways 16
cl_size 64
replacement_policy LRU
write_allocate false
write_back true
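Multiplying sets x ways x cl_size gives each level's capacity per group, and the groups entries give the aggregate. With 48 single-core L1 and L2 groups and 16 three-core L3 groups, the file covers 48 cores, twice the 24 cores per socket, which suggests a two-socket system. `capacity_bytes` is a hypothetical helper for this sketch.

```python
# Capacity per cache level from the "Cache Per Group" and "groups" entries.
CACHES = {
    # level: (sets, ways, cl_size, groups, cores per group)
    "L1": (128, 4, 64, 48, 1),
    "L2": (1024, 8, 64, 48, 1),
    "L3": (8192, 16, 64, 16, 3),
}


def capacity_bytes(sets, ways, cl_size):
    """Bytes held by one cache group."""
    return sets * ways * cl_size


for level, (sets, ways, cl, groups, cores) in CACHES.items():
    size = capacity_bytes(sets, ways, cl)
    print(f"{level}: {size // 1024} KiB per group, "
          f"{size * groups / 2**20:g} MiB in {groups} groups, "
          f"{size / cores / 1024:.0f} KiB per core")
```

For L3 this yields 8192 KiB (8 MiB) per group of three cores, about 2731 KiB per core, and 128 MiB across all 16 groups.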

Performance Counter Metrics

accesses EVENT_L3_ACCESS__CMPC[0-5]
misses EVENT_L3_MISS__CMPC[0-5]
evicts INFORMATION_REQUIRED (e.g., L2_LINES_OUT_DIRTY_ALL__PMC2)

MEM

cores per group 24
threads per group 48
transfers overlap false

Overlapping Model

Ports:

OSACA: 0, 1, 2, 3, 3DV, 4, 5, 6, 7
LLVM-MCA: ZnAGU0, ZnAGU1, ZnALU0, ZnALU1, ZnALU2, ZnALU3, ZnDivider, ZnFPU0, ZnFPU1, ZnFPU2, ZnFPU3, ZnMultiplier

Performance Counter Metric

INFORMATION_REQUIRED (e.g., max(UOPS_DISPATCHED_PORT_PORT_0__PMC2, UOPS_DISPATCHED_PORT_PORT_1__PMC3, UOPS_DISPATCHED_PORT_PORT_4__PMC0, UOPS_DISPATCHED_PORT_PORT_5__PMC1))

Non-Overlapping Model

Ports:

OSACA: 8, 9
LLVM-MCA: ZnAGU0, ZnAGU1

Performance Counter Metric

INFORMATION_REQUIRED (e.g., T_L3 + T_MEM, TODO)
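The overlapping and non-overlapping port groups, together with the per-level "transfers overlap" flags, feed an ECM-style runtime prediction. The sketch below shows one plausible composition under a stated assumption: a transfer flagged as overlapping runs concurrently with in-core work and combines via max(), while unflagged transfers serialize with the non-overlapping time T_nOL. This matches the T_L3 + T_MEM example above, where L3 and MEM both have "transfers overlap false", but it is not necessarily kerncraft's exact rule. The function `ecm_prediction` and all cycle counts are hypothetical.

```python
# ECM-style composition of per-iteration cycle counts (illustrative).
# ASSUMPTION: overlapping transfers combine with T_OL via max(); the others
# add to T_nOL. Inputs are hypothetical placeholders, not measurements.


def ecm_prediction(t_ol, t_nol, transfers):
    """transfers: list of (cycles, overlaps) tuples, one per data transfer."""
    overlapping = [t_ol] + [c for c, ov in transfers if ov]
    serial = t_nol + sum(c for c, ov in transfers if not ov)
    return max(max(overlapping), serial)


if __name__ == "__main__":
    # Two overlapping transfers (e.g. into L1 and L2) and one serialized
    # transfer (e.g. L3 <-> MEM), all with made-up cycle counts.
    print(ecm_prediction(4, 2, [(3, True), (5, True), (10, False)]))
```

With these placeholder inputs the serialized part (2 + 10 cycles) dominates the overlapping part (max of 4, 3, 5), so the prediction is 12 cycles.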

Benchmarks

Kernels

copy

FLOPs per iteration 0
read streams 1 stream, 8.00 B
write streams 1 stream, 8.00 B
read+write streams 0 streams, 0.00 B

daxpy

FLOPs per iteration 2
read streams 2 streams, 16.00 B
write streams 1 stream, 8.00 B
read+write streams 1 stream, 8.00 B

load

FLOPs per iteration 0
read streams 1 stream, 8.00 B
write streams 0 streams, 0.00 B
read+write streams 0 streams, 0.00 B

triad

FLOPs per iteration 2
read streams 3 streams, 24.00 B
write streams 1 stream, 8.00 B
read+write streams 0 streams, 0.00 B

update

FLOPs per iteration 0
read streams 1 stream, 8.00 B
write streams 1 stream, 8.00 B
read+write streams 1 stream, 8.00 B
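The stream entries give the bytes each kernel moves per iteration and hence its code balance in bytes per FLOP. Judging from daxpy and update, a read+write stream (such as y in daxpy) is already counted in both the read and the write totals, so it is not added a second time. This is a minimal sketch that ignores write-allocate traffic; `KERNELS` and `code_balance` are hypothetical names.

```python
# Per-iteration traffic and code balance for the benchmark kernels above.
# ASSUMPTION: read+write streams are already included in the read and write
# byte counts; write-allocate traffic is ignored.
KERNELS = {
    # name: (FLOPs per iteration, read bytes, write bytes)
    "copy": (0, 8, 8),
    "daxpy": (2, 16, 8),
    "load": (0, 8, 0),
    "triad": (2, 24, 8),
    "update": (0, 8, 8),
}


def bytes_per_iteration(name):
    _, rd, wr = KERNELS[name]
    return rd + wr


def code_balance(name):
    """Bytes per FLOP, or None for kernels without floating-point work."""
    flops = KERNELS[name][0]
    return bytes_per_iteration(name) / flops if flops else None


for name in KERNELS:
    print(name, bytes_per_iteration(name), "B/it, balance:", code_balance(name))
```

This gives 12 B/FLOP for daxpy and 16 B/FLOP for triad. Because L1 and L2 are configured with write_allocate true, a store to a line that is not already cached first loads that line, so kernels whose write target is not also read (copy and triad) typically move an extra 8 B per iteration unless non-temporal stores are used.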