INSPECT

General

model type	Intel Core IvyBridge EP processor
model name	Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz
micro-architecture	IVB
micro-architecture modeler
cores per socket	10
cores per NUMA domain	10
cacheline size	64 B
clock	3.0 GHz
NUMA domains per socket	1

This machine file was generated for kerncraft version 0.7.1.

Compiler Flags

icc	`-O3 -xAVX -fno-alias -qopenmp -ffreestanding -nolib-inline`
clang	`-O3 -mavx -D_POSIX_C_SOURCE=200112L -fopenmp -ffreestanding`
gcc	`-O3 -march=corei7-avx -D_POSIX_C_SOURCE=200112L -fopenmp -lm -ffreestanding`

Flops per Cycle

	ADD	MUL	FMA	total
Single Precision	8	8		16
Double Precision	4	4		8
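
As a rough illustration of how these figures combine, the sketch below multiplies cores per socket, clock, and total flops per cycle to get a theoretical per-socket peak. The constants are copied from the tables above; the function name and the assumption that all cores sustain the nominal 3.0 GHz clock are illustrative and not part of the machine file.

```python
# Values copied from the General and Flops per Cycle tables.
CORES_PER_SOCKET = 10
CLOCK_HZ = 3.0e9
FLOPS_PER_CYCLE = {"single": 16, "double": 8}  # "total" column


def peak_flops_per_socket(precision):
    """Theoretical peak in FLOP/s for one socket.

    Assumes every core runs at the nominal clock (no turbo, no throttling).
    """
    return CORES_PER_SOCKET * CLOCK_HZ * FLOPS_PER_CYCLE[precision]


for precision in FLOPS_PER_CYCLE:
    gflops = peak_flops_per_socket(precision) / 1e9
    print(f"{precision}: {gflops:.0f} GFLOP/s")
```

Under these assumptions the result is 480 GFLOP/s in single precision and 240 GFLOP/s in double precision per socket.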

Memory Hierarchy

L1

groups	16
cores per group	1
threads per group	2
transfers overlap

Cache Per Group

sets	64
ways	8
cl_size	64
replacement_policy	LRU
write_allocate	true
write_back	true
load_from	L2
store_to	L2

Performance Counter Metrics

accesses	`MEM_UOPS_RETIRED_LOADS:PMC[0-3]`
misses	`L1D_REPLACEMENT:PMC[0-3]`
evicts	`L1D_M_EVICT:PMC[0-3]`

L2

groups	16
cores per group	1
threads per group	2
transfers overlap
non-overlap upstream throughput	32 B/cy, half-duplex

Cache Per Group

sets	512
ways	8
cl_size	64
replacement_policy	LRU
write_allocate	true
write_back	true
load_from	L3
store_to	L3

Performance Counter Metrics

accesses	`L1D_REPLACEMENT:PMC[0-3]`
misses	`L2_LINES_IN_ALL:PMC[0-3]`
evicts	`L2_TRANS_L2_WB:PMC[0-3]`

L3

groups	2
cores per group	8
threads per group	16
transfers overlap
non-overlap upstream throughput	32 B/cy, half-duplex

Cache Per Group

sets	25600
ways	16
cl_size	64
replacement_policy	LRU
write_allocate	true
write_back	true
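
The Cache Per Group entries for L1, L2, and L3 determine each cache's capacity as sets × ways × cl_size. The following sketch recomputes those capacities from the values listed in this file; the dictionary layout and function name are illustrative only.

```python
# sets, ways and cl_size copied from the Cache Per Group tables.
CACHES = {
    "L1": {"sets": 64, "ways": 8, "cl_size": 64},
    "L2": {"sets": 512, "ways": 8, "cl_size": 64},
    "L3": {"sets": 25600, "ways": 16, "cl_size": 64},
}


def cache_size_bytes(level):
    """Capacity of one cache group in bytes: sets * ways * cache line size."""
    c = CACHES[level]
    return c["sets"] * c["ways"] * c["cl_size"]


for level in CACHES:
    print(f"{level}: {cache_size_bytes(level) // 1024} KiB")
```

This gives 32 KiB for L1, 256 KiB for L2, and 25600 KiB (25 MiB) for L3 per group.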

Performance Counter Metrics

accesses	`L2_LINES_IN_ALL:PMC[0-3]`
misses	`(CAS_COUNT_RD:MBOX0C[01] + CAS_COUNT_RD:MBOX1C[01] + CAS_COUNT_RD:MBOX2C[01] + CAS_COUNT_RD:MBOX3C[01] + CAS_COUNT_RD:MBOX4C[01] + CAS_COUNT_RD:MBOX5C[01] + CAS_COUNT_RD:MBOX6C[01] + CAS_COUNT_RD:MBOX7C[01])`
evicts	`(CAS_COUNT_WR:MBOX0C[01] + CAS_COUNT_WR:MBOX1C[01] + CAS_COUNT_WR:MBOX2C[01] + CAS_COUNT_WR:MBOX3C[01] + CAS_COUNT_WR:MBOX4C[01] + CAS_COUNT_WR:MBOX5C[01] + CAS_COUNT_WR:MBOX6C[01] + CAS_COUNT_WR:MBOX7C[01])`

MEM

cores per group	8
threads per group	16
transfers overlap
non-overlap upstream throughput	full socket memory bandwidth, half-duplex

Overlapping Model

Ports:

0, 0DV, 1, 2, 3, 4, 5

Performance Counter Metric

`Max(UOPS_DISPATCHED_PORT_PORT_0:PMC[0-3], UOPS_DISPATCHED_PORT_PORT_1:PMC[0-3], UOPS_DISPATCHED_PORT_PORT_4:PMC[0-3], UOPS_DISPATCHED_PORT_PORT_5:PMC[0-3])`

Non-Overlapping Model

Ports:

2D, 3D

Performance Counter Metric

T_OL + T_L1L2 + T_L2L3 + T_L3MEM
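
To make the terms in this sum concrete, the sketch below estimates the transfer contributions from the 64 B cache line size and the 32 B/cy throughput listed for L2 and L3, then adds them up as in the metric above. Everything beyond those two constants is an assumption: the function names, the count of three cache lines per unit of work (one load, one write-allocate, one eviction, as might apply to a copy kernel), and the placeholder values for T_OL and T_L3MEM are hypothetical.

```python
CL_SIZE = 64          # bytes, from the "cacheline size" entry
BYTES_PER_CYCLE = 32  # throughput listed for L2 and L3, half-duplex


def transfer_cycles(cache_lines, bytes_per_cycle=BYTES_PER_CYCLE):
    """Cycles needed to move whole cache lines over a half-duplex link."""
    return cache_lines * CL_SIZE / bytes_per_cycle


def non_overlapping_sum(t_ol, t_l1l2, t_l2l3, t_l3mem):
    """Plain sum of the four terms named in the metric above."""
    return t_ol + t_l1l2 + t_l2l3 + t_l3mem


# Assumed traffic per cache line of work: load + write-allocate + evict.
t_l1l2 = transfer_cycles(3)
t_l2l3 = transfer_cycles(3)

# T_OL and T_L3MEM are hypothetical placeholders, not measured values.
total = non_overlapping_sum(t_ol=2.0, t_l1l2=t_l1l2, t_l2l3=t_l2l3, t_l3mem=10.0)
print(f"T_L1L2 = {t_l1l2} cy, T_L2L3 = {t_l2l3} cy, sum = {total} cy")
```

The memory term is left as an input because this file only describes it as the full socket memory bandwidth without giving a number.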

Benchmarks

Kernels

copy

FLOPs per iteration	0
read streams	1 stream with 8.00 B
write streams	1 stream with 8.00 B
read+write streams	0 streams with 0.00 B

daxpy

FLOPs per iteration	2
read streams	2 streams with 16.00 B
write streams	1 stream with 8.00 B
read+write streams	1 stream with 8.00 B

load

FLOPs per iteration	0
read streams	1 stream with 8.00 B
write streams	0 streams with 0.00 B
read+write streams	0 streams with 0.00 B

triad

FLOPs per iteration	2
read streams	3 streams with 24.00 B
write streams	1 stream with 8.00 B
read+write streams	0 streams with 0.00 B

update

FLOPs per iteration	0
read streams	1 stream with 8.00 B
write streams	1 stream with 8.00 B
read+write streams	1 stream with 8.00 B
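
The stream counts above can be turned into an estimate of data traffic per loop iteration. The sketch below is one possible accounting, not kerncraft's own: it assumes that a write stream which is not also read incurs an extra write-allocate transfer of the same size (consistent with `write_allocate true` in the cache tables). The dictionary and function names are illustrative.

```python
# FLOPs and stream sizes in bytes per iteration, copied from the kernel tables.
KERNELS = {
    "copy":   {"flops": 0, "read": 8.0,  "write": 8.0, "read_write": 0.0},
    "daxpy":  {"flops": 2, "read": 16.0, "write": 8.0, "read_write": 8.0},
    "load":   {"flops": 0, "read": 8.0,  "write": 0.0, "read_write": 0.0},
    "triad":  {"flops": 2, "read": 24.0, "write": 8.0, "read_write": 0.0},
    "update": {"flops": 0, "read": 8.0,  "write": 8.0, "read_write": 8.0},
}


def traffic_bytes(name):
    """Estimated bytes moved per iteration, including write-allocate.

    Assumption: only write streams that are not also read need a
    write-allocate load before the store.
    """
    k = KERNELS[name]
    write_allocate = k["write"] - k["read_write"]
    return k["read"] + k["write"] + write_allocate


def flops_per_byte(name):
    """Arithmetic intensity under the traffic estimate above."""
    return KERNELS[name]["flops"] / traffic_bytes(name)


for name in KERNELS:
    print(f"{name}: {traffic_bytes(name):.0f} B/it, "
          f"{flops_per_byte(name):.3f} FLOP/B")
```

With this accounting, triad moves 40 B per iteration for 2 FLOPs, while daxpy moves 24 B for the same 2 FLOPs because its written stream is also one of its read streams.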
