INSPECT

Intra Node Stencil Performance Evaluation Collection


General

model type Cavium Thunder X2 (ARMv8)
model name  
micro-architecture  
micro-architecture modeler  
cores per socket 32
cores per NUMA domain 32
cacheline size 64 B
clock 2.2 GHz
NUMA domains per socket 1

This machine file was generated for kerncraft version 0.8.6.dev0.

Compiler Flags

clang -O3 -target aarch64-unknown-linux-gnu -D_POSIX_C_SOURCE=200112L -fopenmp -ffreestanding
gcc -O3 -march=armv8.1-a -fopenmp -ffreestanding

Flops per Cycle

                  ADD                   MUL                   FMA                   total
Single Precision  INFORMATION_REQUIRED  INFORMATION_REQUIRED  INFORMATION_REQUIRED  INFORMATION_REQUIRED
Double Precision  INFORMATION_REQUIRED  INFORMATION_REQUIRED  INFORMATION_REQUIRED  INFORMATION_REQUIRED
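
Once the flops-per-cycle entries above are filled in, a theoretical peak can be derived from the core count and clock listed under General. The following is a minimal sketch: the function name `peak_gflops` is hypothetical, and the value of 1 FLOP per cycle is only a placeholder, not a property of this machine.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: float) -> float:
    """Theoretical peak in GFLOP/s: cores x clock [GHz] x FLOPs per cycle per core."""
    return cores * clock_ghz * flops_per_cycle


CORES_PER_SOCKET = 32  # "cores per socket" from the General section
CLOCK_GHZ = 2.2        # "clock" from the General section

# Placeholder input: 1 FLOP/cycle is NOT a measured or documented value.
# It only shows how much peak each FLOP per cycle per core contributes.
per_unit = peak_gflops(CORES_PER_SOCKET, CLOCK_GHZ, 1.0)
print(f"{per_unit:.1f} GFLOP/s per socket for each FLOP/cycle/core")
```

Multiplying the printed figure by the real ADD, MUL, FMA, or total entry gives the corresponding per-socket peak.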

Memory Hierarchy

L1

groups 64
cores per group 1
threads per group 4
transfers overlap false

Cache Per Group

sets 64
ways 8
cl_size 64 B
replacement_policy LRU
write_allocate true
write_back true
load_from L2
store_to L2

Performance Counter Metrics

accesses INFORMATION_REQUIRED (e.g., L1D_REPLACEMENT__PMC0)
misses INFORMATION_REQUIRED (e.g., L2_LINES_IN_ALL__PMC1)
evicts INFORMATION_REQUIRED (e.g., L2_LINES_OUT_DIRTY_ALL__PMC2)

L2

groups 64
cores per group 1
threads per group 4
transfers overlap false

Cache Per Group

sets 512
ways 8
cl_size 64 B
replacement_policy LRU
write_allocate true
write_back true
load_from None
store_to L3
victims_to L3

Performance Counter Metrics

accesses INFORMATION_REQUIRED (e.g., L1D_REPLACEMENT__PMC0)
misses INFORMATION_REQUIRED (e.g., L2_LINES_IN_ALL__PMC1)
evicts INFORMATION_REQUIRED (e.g., L2_LINES_OUT_DIRTY_ALL__PMC2)

L3

groups 2
cores per group 32
threads per group 128
transfers overlap false

Cache Per Group

sets 65536
ways 8
cl_size 64 B
replacement_policy LRU
write_allocate false
write_back true
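
The capacity of each cache level follows from the sets, ways, and cl_size entries listed per group. A minimal sketch of that arithmetic, using only the values given above (the dictionary layout and function name are illustrative, not the machine-file schema):

```python
# Per-group cache geometry as listed in the L1, L2, and L3 sections.
CACHES = {
    "L1": {"sets": 64, "ways": 8, "cl_size": 64},
    "L2": {"sets": 512, "ways": 8, "cl_size": 64},
    "L3": {"sets": 65536, "ways": 8, "cl_size": 64},
}


def cache_size_bytes(sets: int, ways: int, cl_size: int) -> int:
    """Capacity of a set-associative cache: sets x ways x cache-line size."""
    return sets * ways * cl_size


sizes_kib = {name: cache_size_bytes(**geom) // 1024 for name, geom in CACHES.items()}
for name, kib in sizes_kib.items():
    print(f"{name}: {kib} KiB per group")
```

This yields 32 KiB for L1 and 256 KiB for L2 per core, and 32 MiB for L3 per group of 32 cores.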

Performance Counter Metrics

accesses INFORMATION_REQUIRED (e.g., L1D_REPLACEMENT__PMC0)
misses INFORMATION_REQUIRED (e.g., L2_LINES_IN_ALL__PMC1)
evicts INFORMATION_REQUIRED (e.g., L2_LINES_OUT_DIRTY_ALL__PMC2)

MEM

cores per group 32
threads per group 128
transfers overlap false

Overlapping Model

Ports:

OSACA: 3, 4, 5

Performance Counter Metric

INFORMATION_REQUIRED (e.g., max(UOPS_DISPATCHED_PORT_PORT_0__PMC2, UOPS_DISPATCHED_PORT_PORT_1__PMC3, UOPS_DISPATCHED_PORT_PORT_4__PMC0, UOPS_DISPATCHED_PORT_PORT_5__PMC1))

Non-Overlapping Model

Ports:

OSACA: 0, 0DV, 1, 1DV, 2, 3, 4, 5

Performance Counter Metric

INFORMATION_REQUIRED (e.g., max(UOPS_DISPATCHED_PORT_PORT_0__PMC2, UOPS_DISPATCHED_PORT_PORT_1__PMC3, UOPS_DISPATCHED_PORT_PORT_4__PMC0, UOPS_DISPATCHED_PORT_PORT_5__PMC1))
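
The overlapping and non-overlapping port groups are the in-core inputs of an ECM-style runtime prediction. As a rough sketch, assuming the usual ECM combination rule and that no data transfers overlap (every memory level above lists "transfers overlap false"), the predicted cycles per unit of work could be computed as below. The function name and all cycle counts are hypothetical and chosen only to illustrate the formula.

```python
from typing import Sequence


def ecm_cycles(t_ol: float, t_nol: float, transfer_cycles: Sequence[float]) -> float:
    """ECM-style prediction with non-overlapping transfers.

    t_ol:            cycles on the overlapping ports
    t_nol:           cycles on the non-overlapping ports
    transfer_cycles: cycles for each inter-level transfer (L1-L2, L2-L3, L3-MEM)
    """
    # Overlapping work hides behind everything else; the non-overlapping
    # in-core time and all data transfers are assumed to serialize.
    return max(t_ol, t_nol + sum(transfer_cycles))


# Hypothetical inputs, not measurements of this machine.
data_bound = ecm_cycles(t_ol=8, t_nol=4, transfer_cycles=[2, 3, 5])
core_bound = ecm_cycles(t_ol=20, t_nol=4, transfer_cycles=[2, 3, 5])
print(data_bound, core_bound)
```

In the first call the serialized data path (4 + 2 + 3 + 5 cycles) dominates; in the second the overlapping ports do.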

Benchmarks

Kernels

copy

FLOPs per iteration 0
read streams 1 stream with 8.00 B
write streams 1 stream with 8.00 B
read+write streams 0 streams with 0.00 B

daxpy

FLOPs per iteration 2
read streams 2 streams with 16.00 B
write streams 1 stream with 8.00 B
read+write streams 1 stream with 8.00 B

load

FLOPs per iteration 0
read streams 1 stream with 8.00 B
write streams 0 streams with 0.00 B
read+write streams 0 streams with 0.00 B

triad

FLOPs per iteration 2
read streams 3 streams with 24.00 B
write streams 1 stream with 8.00 B
read+write streams 0 streams with 0.00 B

update

FLOPs per iteration 0
read streams 1 stream with 8.00 B
write streams 1 stream with 8.00 B
read+write streams 1 stream with 8.00 B
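
The stream counts above determine how many bytes each kernel moves per iteration. The sketch below totals that traffic under one assumption that is not stated in the listing: a read+write stream is counted in both the read and the write figures, so only write-only streams incur an additional write-allocate read. The function and table names are illustrative.

```python
# name: (FLOPs, read bytes, write bytes, read+write bytes) per iteration,
# copied from the kernel listings above.
KERNELS = {
    "copy":   (0, 8, 8, 0),
    "daxpy":  (2, 16, 8, 8),
    "load":   (0, 8, 0, 0),
    "triad":  (2, 24, 8, 0),
    "update": (0, 8, 8, 8),
}


def traffic_bytes(read_b: int, write_b: int, rw_b: int, write_allocate: bool = True) -> int:
    """Bytes moved per iteration, optionally including write-allocate reads."""
    # Assumption: lines of a read+write stream are already loaded by the read,
    # so only the remaining write-only bytes trigger a write-allocate transfer.
    write_only = write_b - rw_b
    extra = write_only if write_allocate else 0
    return read_b + write_b + extra


totals = {name: traffic_bytes(r, w, rw) for name, (_, r, w, rw) in KERNELS.items()}
for name, total in totals.items():
    flops = KERNELS[name][0]
    print(f"{name:6s} {total:3d} B/it  {flops / total:.4f} FLOP/B")
```

Dividing a measured bandwidth by the bytes per iteration gives an upper bound on iterations per second for the corresponding kernel.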