Intra Node Stencil Performance Evaluation Collection
model type | Cavium Thunder X2 (ARMv8) |
model name | |
micro-architecture | |
micro-architecture modeler | |
cores per socket | 32 |
cores per NUMA domain | 32 |
cacheline size | 64 B |
clock | 2.2 GHz |
NUMA domains per socket | 1 |
This machine file was generated for kerncraft version 0.8.6.dev0.
clang | -O3 -target aarch64-unknown-linux-gnu -D_POSIX_C_SOURCE=200112L -fopenmp -ffreestanding |
gcc | -O3 -march=armv8.1-a -fopenmp -ffreestanding |
| ADD | MUL | FMA | total |
Single Precision | INFORMATION_REQUIRED | INFORMATION_REQUIRED | INFORMATION_REQUIRED | INFORMATION_REQUIRED |
Double Precision | INFORMATION_REQUIRED | INFORMATION_REQUIRED | INFORMATION_REQUIRED | INFORMATION_REQUIRED |
groups | 64 |
cores per group | 1 |
threads per group | 4 |
transfers overlap | false |
sets | 64 |
ways | 8 |
cl_size | 64 B |
replacement_policy | LRU |
write_allocate | true |
write_back | true |
load_from | L2 |
store_to | L2 |
accesses | INFORMATION_REQUIRED (e.g., L1D_REPLACEMENT__PMC0) |
misses | INFORMATION_REQUIRED (e.g., L2_LINES_IN_ALL__PMC1) |
evicts | INFORMATION_REQUIRED (e.g., L2_LINES_OUT_DIRTY_ALL__PMC2) |
groups | 64 |
cores per group | 1 |
threads per group | 4 |
transfers overlap | false |
sets | 512 |
ways | 8 |
cl_size | 64 B |
replacement_policy | LRU |
write_allocate | true |
write_back | true |
load_from | None |
store_to | L3 |
victims_to | L3 |
accesses | INFORMATION_REQUIRED (e.g., L1D_REPLACEMENT__PMC0) |
misses | INFORMATION_REQUIRED (e.g., L2_LINES_IN_ALL__PMC1) |
evicts | INFORMATION_REQUIRED (e.g., L2_LINES_OUT_DIRTY_ALL__PMC2) |
groups | 2 |
cores per group | 32 |
threads per group | 128 |
transfers overlap | false |
sets | 65536 |
ways | 8 |
cl_size | 64 B |
replacement_policy | LRU |
write_allocate | false |
write_back | true |
accesses | INFORMATION_REQUIRED (e.g., L1D_REPLACEMENT__PMC0) |
misses | INFORMATION_REQUIRED (e.g., L2_LINES_IN_ALL__PMC1) |
evicts | INFORMATION_REQUIRED (e.g., L2_LINES_OUT_DIRTY_ALL__PMC2) |
cores per group | 32 |
threads per group | 128 |
transfers overlap | false |
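The capacity of each cache level follows from its geometry as sets × ways × cl_size. The sketch below works this out for the three cache tables above. The level names L1, L2 and L3 are an assumption inferred from the load_from and store_to fields; the listing itself does not label the tables.

```python
# Cache geometry copied from the three cache tables above.
# The level names are inferred from load_from / store_to and are an
# assumption, not labels given in the listing.
levels = {
    "L1": {"sets": 64, "ways": 8, "cl_size": 64},
    "L2": {"sets": 512, "ways": 8, "cl_size": 64},
    "L3": {"sets": 65536, "ways": 8, "cl_size": 64},
}


def capacity_bytes(geometry):
    """Capacity of a set-associative cache: sets * ways * cache line size."""
    return geometry["sets"] * geometry["ways"] * geometry["cl_size"]


capacities = {name: capacity_bytes(g) for name, g in levels.items()}

for name, size in capacities.items():
    print(f"{name}: {size // 1024} KiB")
```

Under these assumptions the geometry gives 32 KiB, 256 KiB and 32 MiB (32768 KiB) for the three levels. The groups and cores-per-group fields indicate how many cores share each instance.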
OSACA | 3, 4, 5 |
INFORMATION_REQUIRED (e.g., max(UOPS_DISPATCHED_PORT_PORT_0__PMC2, UOPS_DISPATCHED_PORT_PORT_1__PMC3, UOPS_DISPATCHED_PORT_PORT_4__PMC0, UOPS_DISPATCHED_PORT_PORT_5__PMC1))
OSACA | 0, 0DV, 1, 1DV, 2, 3, 4, 5 |
INFORMATION_REQUIRED (e.g., max(UOPS_DISPATCHED_PORT_PORT_0__PMC2, UOPS_DISPATCHED_PORT_PORT_1__PMC3, UOPS_DISPATCHED_PORT_PORT_4__PMC0, UOPS_DISPATCHED_PORT_PORT_5__PMC1))
FLOPs per iteration | 0 |
read streams | 1 Streams with 8.00 B |
write streams | 1 Streams with 8.00 B |
read+write streams | 0 Streams with 0.00 B |
FLOPs per iteration | 2 |
read streams | 2 Streams with 16.00 B |
write streams | 1 Streams with 8.00 B |
read+write streams | 1 Streams with 8.00 B |
FLOPs per iteration | 0 |
read streams | 1 Streams with 8.00 B |
write streams | 0 Streams with 0.00 B |
read+write streams | 0 Streams with 0.00 B |
FLOPs per iteration | 2 |
read streams | 3 Streams with 24.00 B |
write streams | 1 Streams with 8.00 B |
read+write streams | 0 Streams with 0.00 B |
FLOPs per iteration | 0 |
read streams | 1 Streams with 8.00 B |
write streams | 1 Streams with 8.00 B |
read+write streams | 1 Streams with 8.00 B |
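The stream counts in the five kernel tables above can be turned into an estimate of bytes moved per iteration. The sketch below is a rough illustration, not kerncraft's own calculation. It assumes that a read+write stream is already counted once among the read streams and once among the write streams, and that with write-allocate enabled every write stream that is not also read costs one extra 8 B load. The kernels are not named in the listing, so they are simply numbered in order of appearance.

```python
# Stream counts copied from the five kernel tables above, in order.
# The kernel numbering is hypothetical; the listing gives no names.
kernels = [
    {"flops": 0, "read": 1, "write": 1, "read_write": 0},
    {"flops": 2, "read": 2, "write": 1, "read_write": 1},
    {"flops": 0, "read": 1, "write": 0, "read_write": 0},
    {"flops": 2, "read": 3, "write": 1, "read_write": 0},
    {"flops": 0, "read": 1, "write": 1, "read_write": 1},
]

BYTES_PER_STREAM = 8  # every stream in the tables moves 8.00 B per iteration


def traffic_bytes(kernel, write_allocate):
    """Estimated bytes moved per iteration for one kernel.

    Assumption: a write-allocate cache first loads each line that is
    written but not read, which adds one extra stream per such write.
    """
    streams = kernel["read"] + kernel["write"]
    if write_allocate:
        streams += kernel["write"] - kernel["read_write"]
    return streams * BYTES_PER_STREAM


without_wa = [traffic_bytes(k, write_allocate=False) for k in kernels]
with_wa = [traffic_bytes(k, write_allocate=True) for k in kernels]

for i, k in enumerate(kernels):
    line = f"kernel {i + 1}: {without_wa[i]} B, {with_wa[i]} B with write-allocate"
    if k["flops"] > 0:
        line += f", {with_wa[i] / k['flops']:.1f} B/FLOP"
    print(line)
```

Under these assumptions, only the first and fourth kernels move more data with write-allocate (24 B instead of 16 B, and 40 B instead of 32 B), because they are the only ones that write a stream they do not also read.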