INSPECT

Julian Hornich says:

The level 2 (L2) cache bandwidth is optimistic: Intel states it may be up to 64 B/cy, but in practice this value is rarely reached.
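To put the 64 B/cy figure in perspective, the short sketch below converts it into GB/s using the 2.3 GHz nominal clock listed under General. It assumes the core actually runs at that nominal clock (turbo and AVX frequency changes are ignored). It also computes how many cycles one 64 B cache line would take at that rate. Both numbers are theoretical upper bounds, not measurements.

```python
# Theoretical L2 bandwidth per core, derived only from numbers on this page.
CLOCK_HZ = 2_300_000_000        # "clock 2.3 GHz" (assumed constant)
L2_BYTES_PER_CYCLE_PEAK = 64    # Intel's stated upper bound
CACHELINE_BYTES = 64            # "cacheline size 64 B"


def l2_bandwidth_gb_s(bytes_per_cycle: float, clock_hz: int = CLOCK_HZ) -> float:
    """Convert a per-cycle bandwidth into GB/s (1 GB = 1e9 B)."""
    return bytes_per_cycle * clock_hz / 1e9


def cycles_per_cacheline(bytes_per_cycle: float) -> float:
    """Cycles needed to move one cache line between L1 and L2."""
    return CACHELINE_BYTES / bytes_per_cycle


print(l2_bandwidth_gb_s(L2_BYTES_PER_CYCLE_PEAK))     # 147.2
print(cycles_per_cacheline(L2_BYTES_PER_CYCLE_PEAK))  # 1.0
```

Any lower sustained bandwidth observed in practice scales both numbers proportionally.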

General

model type	Intel Xeon Haswell EN/EP/EX processor
model name	Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz
micro-architecture
micro-architecture modeler
cores per socket	14
cores per NUMA domain	7
cacheline size	64 B
clock	2.3 GHz
NUMA domains per socket	2

This machine file was generated for kerncraft version 0.8.6.dev0.

Compiler Flags

icc	`-O3 -xCORE-AVX2 -fno-alias -qopenmp -ffreestanding -nolib-inline`
clang	`-O3 -mavx2 -D_POSIX_C_SOURCE=200809L -fopenmp -ffreestanding`
gcc	`-O3 -march=core-avx2 -D_POSIX_C_SOURCE=200809L -fopenmp -lm -ffreestanding`

Flops per Cycle

	ADD	MUL	FMA	total
Single Precision	8	8	16	32
Double Precision	4	4	8	16
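The "total" column translates directly into a theoretical peak performance. The sketch below multiplies it by the nominal clock and the core count; it assumes every core sustains the nominal 2.3 GHz and issues the full per-cycle FLOP count, which real AVX2 code usually does not.

```python
# Theoretical peak FLOP/s from the "Flops per Cycle" table (total column).
CLOCK_HZ = 2_300_000_000          # nominal clock, assumed constant
CORES_PER_SOCKET = 14
FLOPS_PER_CYCLE = {"SP": 32, "DP": 16}


def peak_gflops(precision: str, cores: int = 1) -> float:
    """Peak GFLOP/s for the given precision and number of cores."""
    return FLOPS_PER_CYCLE[precision] * CLOCK_HZ * cores / 1e9


print(peak_gflops("DP"))                    # 36.8  (one core)
print(peak_gflops("DP", CORES_PER_SOCKET))  # 515.2 (one socket)
```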

Memory Hierarchy

L1

groups	28
cores per group	1
threads per group	2
transfers overlap	false

Cache Per Group

sets	64
ways	8
cl_size	64
replacement_policy	LRU
write_allocate	true
write_back	true
load_from	L2
store_to	L2
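The cache geometry fields fully determine the capacity (sets × ways × cl_size) and, under a simple modulo indexing assumption, which set an address maps to. The sketch below assumes the set index is the line address modulo the number of sets; real hardware indexes on physical addresses and this is a simplification.

```python
# Capacity and set mapping for a set-associative cache (simplified model).
SETS, WAYS, CL_SIZE = 64, 8, 64   # L1 values from the table above


def capacity_bytes(sets: int, ways: int, cl_size: int) -> int:
    """Total cache capacity in bytes."""
    return sets * ways * cl_size


def set_index(addr: int, sets: int = SETS, cl_size: int = CL_SIZE) -> int:
    """Set an address falls into: (address // line size) mod number of sets."""
    return (addr // cl_size) % sets


print(capacity_bytes(SETS, WAYS, CL_SIZE))  # 32768   (L1: 32 KiB)
print(capacity_bytes(512, 8, 64))           # 262144  (L2: 256 KiB)
print(capacity_bytes(9216, 16, 64))         # 9437184 (L3 per group, as listed)
print(set_index(0), set_index(64), set_index(4096))  # 0 1 0
```

With 64 sets of 64 B lines, addresses 4096 B apart land in the same set, so more than 8 such lines in flight will evict each other under LRU.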

Performance Counter Metrics

accesses	`MEM_UOPS_RETIRED_LOADS:PMC[0-3] + MEM_UOPS_RETIRED_STORES:PMC[0-3]`
misses	`L1D_REPLACEMENT:PMC[0-3]`
evicts	`L1D_M_EVICT:PMC[0-3]`
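The counter formulas above can be turned into traffic volumes: each L1D replacement brings one 64 B line in from L2, and each modified eviction writes one line back. The sketch below assumes exactly one cache line per counted event; the counter values are hypothetical and only illustrate the arithmetic.

```python
# L1 <-> L2 data volume and L1 miss ratio from the counter formulas above.
CL_SIZE = 64  # "cl_size 64"


def l1_miss_ratio(loads: int, stores: int, l1d_replacement: int) -> float:
    """misses / accesses, with accesses = retired load uops + store uops."""
    return l1d_replacement / (loads + stores)


def l1_l2_traffic_bytes(l1d_replacement: int, l1d_m_evict: int) -> dict:
    """One 64 B line per replacement (load_from L2) or dirty eviction (store_to L2)."""
    loaded = l1d_replacement * CL_SIZE
    evicted = l1d_m_evict * CL_SIZE
    return {"load_from_L2": loaded, "store_to_L2": evicted, "total": loaded + evicted}


# Hypothetical counter readings, for illustration only
print(l1_l2_traffic_bytes(l1d_replacement=1_000_000, l1d_m_evict=500_000))
# {'load_from_L2': 64000000, 'store_to_L2': 32000000, 'total': 96000000}
print(l1_miss_ratio(loads=6_000_000, stores=2_000_000, l1d_replacement=1_000_000))  # 0.125
```

The "total" here corresponds to the L2 accesses formula further down (`L1D_REPLACEMENT + L1D_M_EVICT`), scaled to bytes.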

L2

groups	28
cores per group	1
threads per group	2
transfers overlap	false

Cache Per Group

sets	512
ways	8
cl_size	64
replacement_policy	LRU
write_allocate	true
write_back	true
load_from	L3
store_to	L3
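The flags `replacement_policy LRU`, `write_allocate true` and `write_back true` describe how a level reacts to each access. The minimal single-level model below is a sketch under those three rules only; it ignores the L1 in front of it, prefetchers and any inclusivity between levels, and the `SetAssocCache` class is a hypothetical illustration, not kerncraft's own simulator.

```python
from collections import OrderedDict


class SetAssocCache:
    """Minimal LRU, write-allocate, write-back model of a single cache level."""

    def __init__(self, sets: int, ways: int, cl_size: int):
        self.sets, self.ways, self.cl_size = sets, ways, cl_size
        # One OrderedDict per set: key = line address, value = dirty flag.
        # Order runs from least recently used (first) to most recently used (last).
        self.data = [OrderedDict() for _ in range(sets)]
        self.misses = 0  # lines loaded from the next level ("load_from")
        self.evicts = 0  # dirty lines written back ("store_to")

    def access(self, addr: int, write: bool = False) -> None:
        line = addr // self.cl_size
        s = self.data[line % self.sets]
        if line in s:
            s.move_to_end(line)          # hit: mark as most recently used
            s[line] = s[line] or write   # a store makes the line dirty
            return
        self.misses += 1                 # write-allocate: stores also fetch the line
        if len(s) >= self.ways:
            _, dirty = s.popitem(last=False)  # evict the least recently used line
            if dirty:
                self.evicts += 1         # write-back: only modified lines go out
        s[line] = write


# L2 geometry from the table; stream twice over 1 MiB of doubles, storing each element.
l2 = SetAssocCache(sets=512, ways=8, cl_size=64)
n = 1 << 17  # 131072 doubles = 1 MiB, four times the 256 KiB capacity
for _ in range(2):
    for i in range(n):
        l2.access(8 * i, write=True)
print(l2.misses, l2.evicts)  # 32768 28672
```

Because the working set is four times the capacity, LRU keeps evicting lines before they are reused, so the second pass misses on every line again.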

Performance Counter Metrics

accesses	`L1D_REPLACEMENT:PMC[0-3] + L1D_M_EVICT:PMC[0-3]`
misses	`L2_LINES_IN_ALL:PMC[0-3]`
evicts	`L2_TRANS_L2_WB:PMC[0-3]`

L3

groups	4
cores per group	7
threads per group	14
transfers overlap	false

Cache Per Group

sets	9216
ways	16
cl_size	64
replacement_policy	LRU
write_allocate	true
write_back	true

Performance Counter Metrics

accesses	`L2_LINES_IN_ALL:PMC[0-3] + L2_TRANS_L2_WB:PMC[0-3]`
misses	`(CAS_COUNT_RD:MBOX0C[01] + CAS_COUNT_RD:MBOX1C[01] + CAS_COUNT_RD:MBOX2C[01] + CAS_COUNT_RD:MBOX3C[01] + CAS_COUNT_RD:MBOX4C[01] + CAS_COUNT_RD:MBOX5C[01] + CAS_COUNT_RD:MBOX6C[01] + CAS_COUNT_RD:MBOX7C[01])`
evicts	`(CAS_COUNT_WR:MBOX0C[01] + CAS_COUNT_WR:MBOX1C[01] + CAS_COUNT_WR:MBOX2C[01] + CAS_COUNT_WR:MBOX3C[01] + CAS_COUNT_WR:MBOX4C[01] + CAS_COUNT_WR:MBOX5C[01] + CAS_COUNT_WR:MBOX6C[01] + CAS_COUNT_WR:MBOX7C[01])`
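The L3 miss and evict formulas sum the CAS counts over the eight memory channel boxes (MBOX0 to MBOX7). The sketch below assumes each CAS count corresponds to one 64 B cache line transferred to or from DRAM; the per-channel readings and the runtime are hypothetical.

```python
# Memory traffic and bandwidth from summed per-channel CAS counts.
CL_SIZE = 64


def mem_bytes(cas_rd: list, cas_wr: list) -> tuple:
    """Sum CAS_COUNT_RD / CAS_COUNT_WR over MBOX0..MBOX7 and convert to bytes."""
    return sum(cas_rd) * CL_SIZE, sum(cas_wr) * CL_SIZE


def mem_bandwidth_gb_s(total_bytes: int, runtime_s: float) -> float:
    """Average memory bandwidth in GB/s (1 GB = 1e9 B)."""
    return total_bytes / runtime_s / 1e9


# Hypothetical readings for the 8 memory channels, for illustration only
rd = [1_000_000] * 8
wr = [500_000] * 8
read_b, write_b = mem_bytes(rd, wr)
print(read_b, write_b)                                # 512000000 256000000
print(mem_bandwidth_gb_s(read_b + write_b, 0.5))      # 1.536
```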

MEM

cores per group	14
threads per group	28
transfers overlap	false

Overlapping Model

Ports:

IACA	0, 0DV, 1, 2, 3, 4, 5, 6, 7
OSACA	0, 0DV, 1, 2, 3, 4, 5, 6, 7
LLVM-MCA	HWDivider, HWFPDivider, HWPort0, HWPort1, HWPort2, HWPort3, HWPort4, HWPort5, HWPort6, HWPort7

Performance Counter Metric

Max(UOPS_EXECUTED_PORT_PORT_0:PMC[0-3], UOPS_EXECUTED_PORT_PORT_1:PMC[0-3], UOPS_EXECUTED_PORT_PORT_4:PMC[0-3], UOPS_EXECUTED_PORT_PORT_5:PMC[0-3], UOPS_EXECUTED_PORT_PORT_6:PMC[0-3], UOPS_EXECUTED_PORT_PORT_7:PMC[0-3])
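The metric above takes the busiest execution port as the overlapping time. The sketch below normalizes that maximum to one unit of work, assumed here to be 8 iterations (one 64 B line of doubles), and assumes each port retires at most one uop per cycle. Ports 2 and 3 are left out because they belong to the non-overlapping model. The counter totals are hypothetical.

```python
# Overlapping time T_OL: the busiest port bounds the runtime.
def t_ol_cycles(port_counts: dict, iterations: int, iterations_per_unit: int = 8) -> float:
    """Cycles per unit of work, assuming one uop per port per cycle."""
    units = iterations / iterations_per_unit
    return max(port_counts.values()) / units


# Hypothetical UOPS_EXECUTED_PORT_PORT_* totals over 1,000,000 iterations
counts = {"PORT_0": 500_000, "PORT_1": 500_000, "PORT_4": 125_000,
          "PORT_5": 250_000, "PORT_6": 125_000, "PORT_7": 125_000}
print(t_ol_cycles(counts, iterations=1_000_000))  # 4.0
```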

Non-Overlapping Model

Ports:

IACA	2D, 3D
OSACA	2D, 3D
LLVM-MCA	HWPort2, HWPort3

Performance Counter Metric

T_nOL + T_L1L2 + T_L2L3 + T_L3MEM
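Since every level is marked `transfers overlap false`, the data-related times add up, and the usual ECM combination takes the larger of that sum and the overlapping time. The sketch below applies this rule to produce one prediction per level where the data set resides (L1, L2, L3, MEM). It assumes the standard ECM form max(T_OL, T_nOL + transfers); the cycle values are hypothetical.

```python
from itertools import accumulate


def ecm_predictions(t_ol: float, t_nol: float, transfers: list) -> list:
    """Cycles per unit of work with the data set in L1, L2, L3, MEM.

    transfers = [T_L1L2, T_L2L3, T_L3MEM]; non-overlapping times are summed
    level by level and each partial sum is compared against T_OL.
    """
    cumulative = accumulate([t_nol, *transfers])
    return [max(t_ol, t) for t in cumulative]


# Hypothetical contributions in cycles per 8 iterations
print(ecm_predictions(4.0, 2.0, [3.0, 3.0, 5.0]))  # [4.0, 5.0, 8.0, 13.0]
```

In the L1 case the overlapping part dominates; from L2 outward the summed data transfers take over.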

Benchmarks

Kernels

copy

FLOPs per iteration	0
read streams	1 stream with 8.00 B
write streams	1 stream with 8.00 B
read+write streams	0 streams with 0.00 B

daxpy

FLOPs per iteration	2
read streams	2 streams with 16.00 B
write streams	1 stream with 8.00 B
read+write streams	1 stream with 8.00 B

load

FLOPs per iteration	0
read streams	1 stream with 8.00 B
write streams	0 streams with 0.00 B
read+write streams	0 streams with 0.00 B

triad

FLOPs per iteration	2
read streams	3 streams with 24.00 B
write streams	1 stream with 8.00 B
read+write streams	0 streams with 0.00 B

update

FLOPs per iteration	0
read streams	1 stream with 8.00 B
write streams	1 stream with 8.00 B
read+write streams	1 stream with 8.00 B
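The stream tables can be combined into bytes per iteration and a code balance (bytes per FLOP). The sketch below rests on two assumptions: the read+write streams are the subset of arrays that are both read and written, so they are already counted in the read and write columns; and, because the caches are write-allocate, every write stream that is not also read triggers an extra line read. Kernels without FLOPs get an infinite code balance.

```python
import math
from dataclasses import dataclass


@dataclass
class Kernel:
    flops: int      # FLOPs per iteration
    read_b: float   # bytes read per iteration
    write_b: float  # bytes written per iteration
    rw_b: float     # bytes both read and written (already in read_b and write_b)

    def traffic_bytes(self, write_allocate: bool = True) -> float:
        """Bytes moved per iteration; pure writes cost an extra read under write-allocate."""
        extra = (self.write_b - self.rw_b) if write_allocate else 0.0
        return self.read_b + self.write_b + extra

    def code_balance(self, write_allocate: bool = True) -> float:
        """Bytes per FLOP (infinite for kernels without floating-point work)."""
        if self.flops == 0:
            return math.inf
        return self.traffic_bytes(write_allocate) / self.flops


KERNELS = {
    "copy": Kernel(0, 8.0, 8.0, 0.0),
    "daxpy": Kernel(2, 16.0, 8.0, 8.0),
    "load": Kernel(0, 8.0, 0.0, 0.0),
    "triad": Kernel(2, 24.0, 8.0, 0.0),
    "update": Kernel(0, 8.0, 8.0, 8.0),
}

for name, k in KERNELS.items():
    print(f"{name:7s} {k.traffic_bytes():5.1f} B/it  {k.code_balance():.1f} B/flop")
```

Under these assumptions triad moves 40 B per iteration (20 B/FLOP) and copy moves 24 B, while daxpy and update pay no write-allocate penalty because their written array is also read.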
