LIKWID
Intel® Skylake SP

This page is valid for Skylake SP. The Skylake SP microarchitecture supports the UBOX and the CBOX Uncore devices.

Available performance monitors for the Intel® Skylake SP microarchitecture

Counters available for each hardware thread

Fixed-purpose counters

Since the Core2 microarchitecture, Intel® provides a set of fixed-purpose counters. Each can measure only one specific event.

Counter and events

Counter name Event name
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
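
Each fixed-purpose event is tied to exactly one counter, so an event set for these counters is fully determined by the table above. The following is a minimal sketch, assuming the comma-separated `EVENT:COUNTER` notation used for custom event sets on the likwid-perfctr command line; the helper name `event_set_string` is hypothetical and not part of LIKWID.

```python
# Event/counter pairs copied from the table above.
FIXED_EVENTS = [
    ("INSTR_RETIRED_ANY", "FIXC0"),
    ("CPU_CLK_UNHALTED_CORE", "FIXC1"),
    ("CPU_CLK_UNHALTED_REF", "FIXC2"),
]


def event_set_string(pairs):
    """Join (event, counter) pairs into an EVENT:COUNTER,... string (hypothetical helper)."""
    return ",".join(f"{event}:{counter}" for event, counter in pairs)


print(event_set_string(FIXED_EVENTS))
```

The resulting string is the kind of value that could be passed as a custom event set; check the likwid-perfctr help of your installation for the exact option.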

Available Options

Option Argument Description Comment
anythread N Set bit 2+(index*4) in config register
kernel N Set bit (index*4) in config register
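
The bit formulas in the table mean that each fixed counter owns a 4-bit field in the shared config register. A minimal sketch of that arithmetic, assuming each option simply ORs its bit into a mask; the function name `fixed_counter_option_mask` is hypothetical and not part of LIKWID.

```python
def fixed_counter_option_mask(index, kernel=False, anythread=False):
    """Return the config-register bits the table lists for FIXC<index>."""
    if index not in (0, 1, 2):
        raise ValueError("this page lists FIXC0 to FIXC2 only")
    mask = 0
    if kernel:
        mask |= 1 << (index * 4)        # kernel: bit (index*4)
    if anythread:
        mask |= 1 << (2 + index * 4)    # anythread: bit 2+(index*4)
    return mask


# FIXC1 with both options: bit 4 and bit 6
print(hex(fixed_counter_option_mask(1, kernel=True, anythread=True)))
```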

General-purpose counters

The Intel® Skylake SP microarchitecture commonly provides 4 general-purpose counters per hardware thread, each consisting of a config and a counter register.

Counter and events

Counter name Event name
PMC0 *
PMC1 *
PMC2 *
PMC3 *

If HyperThreading is disabled, you can additionally use the PMC registers of the disabled SMT thread, giving 8 PMC registers in total.

Available Options

Option | Argument | Description | Comment
edgedetect | N | Set bit 18 in config register |
kernel | N | Set bit 17 in config register |
anythread | N | Set bit 21 in config register |
threshold | 8 bit hex value | Set bits 24-31 in config register |
invert | N | Set bit 23 in config register |
in_transaction | N | Set bit 32 in config register | Only available if Intel® Transactional Synchronization Extensions are available
in_transaction_aborted | N | Set bit 33 in config register | Only counter PMC2 and only if Intel® Transactional Synchronization Extensions are available
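
To illustrate how these options combine, the following sketch ORs the listed bit positions into a single value. It only mirrors the table; the event and umask bits of the config register are left out, and the names `PMC_FLAG_BITS` and `pmc_option_bits` are hypothetical, not LIKWID functions.

```python
# Bit positions taken from the options table above.
PMC_FLAG_BITS = {
    "kernel": 17,
    "edgedetect": 18,
    "anythread": 21,
    "invert": 23,
    "in_transaction": 32,
    "in_transaction_aborted": 33,
}


def pmc_option_bits(counter, threshold=0, **flags):
    """Combine PMC options into the config-register bits they set (sketch)."""
    if not 0 <= threshold <= 0xFF:
        raise ValueError("threshold is an 8 bit value")
    if flags.get("in_transaction_aborted") and counter != 2:
        raise ValueError("in_transaction_aborted is only listed for PMC2")
    value = threshold << 24                  # threshold: bits 24-31
    for name, enabled in flags.items():
        if enabled:
            value |= 1 << PMC_FLAG_BITS[name]
    return value


print(hex(pmc_option_bits(0, threshold=0x2, edgedetect=True)))
```

The TSX-related options additionally require hardware support, which this sketch does not check.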

Special handling for events

The Intel® Skylake SP microarchitecture can measure offcore events in the PMC counters. To do so, the stream of offcore events must be filtered using the OFFCORE_RESPONSE registers, of which the Intel® Skylake SP microarchitecture has two. LIKWID defines some events that perform the filtering according to the event name. Although many bitmasks are possible, LIKWID natively provides only the ones with response type ANY. Custom filtering can be applied with the OFFCORE_RESPONSE_0_OPTIONS and OFFCORE_RESPONSE_1_OPTIONS events. Two more counter options are available for those events only:

Option | Argument | Description | Comment
match0 | 16 bit hex value | Input value masked with 0x8FFF and written to bits 0-15 in the OFFCORE_RESPONSE register | Check the Intel® Software Developer System Programming Manual, Vol. 3, Chapter Performance Monitoring and https://download.01.org/perfmon/SKX.
match1 | 22 bit hex value | Input value is written to bits 16-37 in the OFFCORE_RESPONSE register | Check the Intel® Software Developer System Programming Manual, Vol. 3, Chapter Performance Monitoring and https://download.01.org/perfmon/SKX.
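
The two options map onto one register value as sketched below. The 0x8FFF mask and the bit ranges come from the table; clipping match1 to 22 bits is an assumption made here for safety, and the function name `offcore_response_value` is hypothetical.

```python
def offcore_response_value(match0, match1):
    """Compose the OFFCORE_RESPONSE register value from match0 and match1 (sketch)."""
    low = match0 & 0x8FFF                 # match0: masked, bits 0-15
    high = (match1 & 0x3FFFFF) << 16      # match1: 22 bit value, bits 16-37
    return low | high


print(hex(offcore_response_value(0x0001, 0x1)))
```

Note that the mask drops bits 12-14 of match0, so a value such as 0xFFFF ends up as 0x8FFF in the register.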

The event MEM_TRANS_RETIRED_LOAD_LATENCY is not available because it needs programming of PEBS registers. PEBS is a kernel-level measurement facility for performance monitoring. Although we can program it from user-space, the results are always 0.

Thermal counter

The Intel® Skylake SP microarchitecture provides one register for the current core temperature.

Counter and events

Counter name Event name
TMP0 TEMP_CORE

Counters available for one hardware thread per socket

Power counter

The Intel® Skylake SP microarchitecture provides measurements of the current power consumption through the RAPL interface.

Counter and events

Counter name Event name
PWR0 PWR_PKG_ENERGY
PWR1 PWR_PP0_ENERGY
PWR2 PWR_PP1_ENERGY
PWR3 PWR_DRAM_ENERGY
PWR4 PWR_SYS_ENERGY
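
The events above are energy counters, so an average power value has to be derived from two readings and the elapsed time. A minimal sketch, assuming both readings are already available in joules and that the counter did not wrap around between them; the function name `average_power_watts` is hypothetical.

```python
def average_power_watts(energy_start_j, energy_end_j, seconds):
    """Average power in watts between two energy readings given in joules."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return (energy_end_j - energy_start_j) / seconds


# Two hypothetical PWR_PKG_ENERGY readings taken 2 seconds apart.
print(average_power_watts(100.0, 250.0, 2.0))
```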

Uncore global counters

The Intel® Skylake SP microarchitecture provides measurements for the global uncore.

Counter and events

Counter name Event name
UBOX0 *
UBOX1 *
UBOXFIX UNCORE_CLOCK

Available Options

Option Argument Description Comment
edgedetect N Set bit 18 in config register
threshold 8 bit hex value Set bits 24-31 in config register
invert N Set bit 23 in config register

Last level cache counters

The Intel® Skylake SP microarchitecture provides measurements for the last level cache segments.

Counter and events

Counter name Event name
CBOX<0-27>C0 *
CBOX<0-27>C1 *
CBOX<0-27>C2 *
CBOX<0-27>C3 *

Available Options

Option | Argument | Description | Comment
edgedetect | N | Set bit 18 in config register |
threshold | 8 bit hex value | Set bits 24-28 in config register |
invert | N | Set bit 23 in config register |
tid | 8 bit hex value | Set bits 0-7 in MSR_UNC_C<0-27>_PMON_BOX_FILTER register and bit 19 in config register |
state | 10 bit hex value | Set bits 17-27 in MSR_UNC_C<0-27>_PMON_BOX_FILTER register | LLC F: 0x80, LLC M: 0x40, LLC E: 0x20, LLC S: 0x10, SF H: 0x08, SF E: 0x04, SF S: 0x02, LLC I: 0x01
opcode | 20 bit hex value | Set bits 9-28 and set bits 17,18,27,28 in MSR_UNC_C<0-27>_PMON_BOX_FILTER1 register | A list of valid opcodes can be found in the Intel® Xeon SP (v6) Uncore Manual.
match0 | 2 bit hex address | Set bits 30-31 in MSR_UNC_C<0-27>_PMON_BOX_FILTER1 register | See the Intel® Xeon SP (v6) Uncore Manual for more information.
match1 | 6 bit hex address | Set bits 0,1,4,5 in MSR_UNC_C<0-27>_PMON_BOX_FILTER1 register | See the Intel® Xeon SP (v6) Uncore Manual for more information.
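
The state option takes a bitmask built from the values in the Comment column. The sketch below combines several states into one option value. Placing that value at bit 17 of the filter register is an assumption derived from the "bits 17-27" entry, and the names `CBOX_STATES`, `cbox_state_value` and `cbox_state_filter_bits` are hypothetical.

```python
# State values copied from the Comment column of the state option.
CBOX_STATES = {
    "LLC_F": 0x80, "LLC_M": 0x40, "LLC_E": 0x20, "LLC_S": 0x10,
    "SF_H": 0x08, "SF_E": 0x04, "SF_S": 0x02, "LLC_I": 0x01,
}


def cbox_state_value(states):
    """OR the selected state flags into one value for the state option."""
    value = 0
    for name in states:
        value |= CBOX_STATES[name]
    return value


def cbox_state_filter_bits(states):
    """Shift the state value to bit 17 of the filter register (assumed layout)."""
    return cbox_state_value(states) << 17


print(hex(cbox_state_value(["LLC_M", "LLC_E"])))
```

For example, counting only lines in the LLC M and LLC E states corresponds to a state value of 0x60.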

Memory controller counters

The Intel® Skylake SP microarchitecture provides measurements of the integrated Memory Controllers (iMC) in the Uncore. The description from Intel®:
Intel® Xeon® Processor Scalable Memory Family integrated Memory Controller provides the interface to DRAM and communicates to the rest of the Uncore through the Mesh2Mem block.
The memory controller also provides a variety of RAS features, such as ECC, lockstep, memory access retry, memory scrubbing, thermal throttling, mirroring, and rank sparing.

The performance counters of the integrated Memory Controllers are exposed to the operating system through PCI interfaces. There may be two memory controllers in the system. There are 3 different PCI devices per memory controller, each handling one memory channel. Each channel has 4 general-purpose counters and one fixed counter for the DRAM clock. The channels of the first memory controller are named MBOX0-2, the three channels of the second memory controller (if available) are named MBOX3-5. The name MBOX originates from the Nehalem EX Uncore monitoring where those functional units are called MBOX.

Counter and events

Counter name Event name
MBOX<0-5>C0 *
MBOX<0-5>C1 *
MBOX<0-5>C2 *
MBOX<0-5>C3 *
MBOX<0-5>FIX DRAM_CLOCKTICKS
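
The unit numbering described above (three channels per memory controller, numbered consecutively) can be sketched as a small mapping from controller and channel to counter names. The function name `mbox_counters` is hypothetical and only reproduces the naming scheme of this page.

```python
CHANNELS_PER_CONTROLLER = 3  # three PCI devices (channels) per memory controller


def mbox_counters(controller, channel):
    """Return the counter names for one memory channel (naming sketch)."""
    if controller not in (0, 1):
        raise ValueError("at most two memory controllers are described")
    if channel not in range(CHANNELS_PER_CONTROLLER):
        raise ValueError("each controller has channels 0 to 2")
    unit = controller * CHANNELS_PER_CONTROLLER + channel
    general = [f"MBOX{unit}C{i}" for i in range(4)]   # 4 general-purpose counters
    return general + [f"MBOX{unit}FIX"]               # plus the DRAM clock counter


# First channel of the second memory controller
print(mbox_counters(1, 0))
```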

Available Options (Only for counters MBOX<0-5>C<0-3>)

Option Argument Operation Comment
edgedetect N Set bit 18 in config register
invert N Set bit 23 in config register
threshold 8 bit hex value Set bits 24-31 in config register

Power control unit counters

The Intel® Skylake SP microarchitecture provides measurements of the power control unit (PCU) in the Uncore. The description from Intel®:
The PCU is the primary Power Controller for the Intel® Xeon® Processor Scalable Memory Family die, responsible for distributing power to core/uncore components and thermal management. It runs in firmware on an internal micro-controller and coordinates the socket’s power states.
The PCU performance counters are exposed to the operating system through the MSR interface. The name WBOX originates from the Nehalem EX Uncore monitoring where those functional units are called WBOX.

Counter and events

Counter name Event name
WBOX0 *
WBOX1 *
WBOX2 *
WBOX3 *
WBOX0FIX CORES_IN_C3
WBOX1FIX CORES_IN_C6
WBOX2FIX CORES_IN_P3
WBOX3FIX CORES_IN_P6

Available Options (Only for WBOX<0-3> counters)

Option | Argument | Operation | Comment
edgedetect | N | Set bit 18 in config register |
invert | N | Set bit 23 in config register |
threshold | 5 bit hex value | Set bits 24-28 in config register |
occupancy | 2 bit hex value | Set bits 14-15 in config register | Cores in C0: 0x1, in C3: 0x2, in C6: 0x3
occupancy_filter | 32 bit hex value | Set bits 0-31 in MSR_UNC_PCU_PMON_BOX_FILTER register | Band0: bits 0-7, Band1: bits 8-15, Band2: bits 16-23, Band3: bits 24-31
occupancy_edgedetect | N | Set bit 31 in config register |
occupancy_invert | N | Set bit 30 in config register |
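
The occupancy_filter value packs four 8 bit bands into one 32 bit word, as listed in the Comment column. A minimal sketch of that packing; the function name `pack_occupancy_filter` is hypothetical, and the meaning of the individual band values is not covered here (see the Intel® Uncore manual).

```python
def pack_occupancy_filter(band0, band1, band2, band3):
    """Pack four 8 bit band values into the 32 bit occupancy_filter argument."""
    value = 0
    for position, band in enumerate((band0, band1, band2, band3)):
        if not 0 <= band <= 0xFF:
            raise ValueError("each band is an 8 bit value")
        value |= band << (8 * position)   # Band0: bits 0-7, Band1: bits 8-15, ...
    return value


print(hex(pack_occupancy_filter(0x0C, 0x10, 0x14, 0x18)))
```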

UPI interface counters

The Intel® Skylake SP microarchitecture provides measurements of the Ultra Path Interconnect Link layer (UPI) in the Uncore. The description from Intel®:
Intel® Xeon® Processor Scalable Memory Family uses a new coherent interconnect for scaling to multiple sockets known as Intel® Ultra Path Interconnect (Intel UPI). Intel® UPI technology provides a cache coherent socket to socket external communication interface between processors.
The UPI hardware performance counters are exposed to the operating system through PCI interfaces. There are three of those interfaces for the UPI. The actual number of SBOX counters depends on the CPU core count of one socket. If your system does not have all interfaces and interface 0 does not work, try the other ones. The SBOX was introduced for the Nehalem EX microarchitecture.

Counter and events

Counter name Event name
SBOX<0-2>C0 *
SBOX<0-2>C1 *
SBOX<0-2>C2 *
SBOX<0-2>C3 *

Available Options

Option Argument Description Comment
edgedetect N Set bit 18 in config register
threshold 8 bit hex value Set bits 24-31 in config register
invert N Set bit 23 in config register
nid 8 bit hex value Set bits 40-43 and bit 45 in config register
match0 8 bit hex value Set bits 32-39 in config register
nid 10 bit hex value Set bits 46-55 in config register

Mesh-to-UPI counters

The Intel® Skylake SP microarchitecture provides measurements of the Mesh-to-UPI (M3UPI) interface in the Uncore. The description from Intel®:
M3UPI is the interface between the mesh and the Intel® UPI Link Layer. It is responsible for translating between mesh protocol packets and flits that are used for transmitting data across the Intel® UPI interface. It performs credit checking between the local Intel® UPI LL, the remote Intel® UPI LL and other agents on the local mesh.
The Mesh-to-UPI performance counters are exposed to the operating system through PCI interfaces. Since the RBOXes manage the traffic between the LLC-connecting mesh interface on the socket and the UPI interfaces (SBOXes), the number of RBOXes matches the number of SBOXes. See the SBOX section for how many are available for which system configuration. The name RBOX originates from the Nehalem EX Uncore monitoring where those functional units are called RBOX.

Counter and events

Counter name Event name
RBOX<0,1,2>C0 *
RBOX<0,1,2>C1 *
RBOX<0,1,2>C2 *

Available Options

Option Argument Operation Comment
edgedetect N Set bit 18 in config register
invert N Set bit 23 in config register
threshold 8 bit hex value Set bits 24-31 in config register

Mesh2Memory counters

The Intel® Skylake SP microarchitecture provides measurements of the mesh (M2M) which connects the cores with the Uncore devices. The description from Intel®:
M2M blocks manage the interface between the Mesh (operating on both Mesh and the SMI3 protocol) and the Memory Controllers. M2M acts as intermediary between the local CHA issuing memory transactions to its attached Memory Controller. Commands from M2M to the MC are serialized by a scheduler and only one can cross the interface at a time.
The M2M devices were first introduced with the Intel® Skylake SP microarchitecture. There was no suitable existing unit name for them, so LIKWID simply calls them M2M.

Counter and events

Counter name Event name
M2M<0,1>C0 *
M2M<0,1>C1 *
M2M<0,1>C2 *
M2M<0,1>C3 *

Available Options

Option Argument Operation Comment
edgedetect N Set bit 18 in config register
invert N Set bit 23 in config register
threshold 8 bit hex value Set bits 24-31 in config register

IIO box counters (general-purpose)

The Intel® Skylake SP microarchitecture provides measurements of the IIO box in the Uncore. The description from Intel®:
IIO stacks are responsible for managing traffic between the PCIe domain and the Mesh domain. The IIO PMON block is situated near the IIO stack’s traffic controller capturing traffic controller as well as PCIe root port information. The traffic controller is responsible for translating traffic coming in from the Mesh (through M2PCIe) and processed by IRP into the PCIe domain to IO agents such as CBDMA, PCIe and MCP.
The IIO box counters are exposed to the operating system through the MSR interface. The IBOX was introduced with the Intel® IvyBridge EP/EN/EX microarchitecture.

Box description

Unit number Unit description
0 CBDMA
1 PCIe0
2 PCIe1
3 PCIe2
4 MCP0
5 MCP1

Counter and events

Counter name Event name
IBOX<0-5>C0 *
IBOX<0-5>C1 *
IBOX<0-5>C2 *
IBOX<0-5>C3 *
IBOX<0-5>CLK *

Available Options (only for counters IBOX<0-5>C<0-3>)

Option Argument Operation Comment
edgedetect N Set bit 18 in config register
invert N Set bit 23 in config register
threshold 12 bit hex value Set bits 24-35 in config register
mask0 8 bit hex value Set bits 36-43 in config register
mask1 3 bit hex value Set bits 44-46 in config register
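
Unlike the other Uncore units on this page, the IIO options use wider fields that extend beyond bit 31. The sketch below places each option at the bit range given in the table and rejects values that exceed the stated width; the function name `ibox_option_bits` is hypothetical and the event selection bits are left out.

```python
def ibox_option_bits(threshold=0, mask0=0, mask1=0, edgedetect=False, invert=False):
    """Combine IBOX options into the config-register bits they set (sketch)."""
    # (value, width in bits, lowest bit position), taken from the table above
    fields = (
        (threshold, 12, 24),   # threshold: bits 24-35
        (mask0, 8, 36),        # mask0: bits 36-43
        (mask1, 3, 44),        # mask1: bits 44-46
    )
    value = 0
    for field_value, width, shift in fields:
        if not 0 <= field_value < (1 << width):
            raise ValueError(f"value {field_value:#x} does not fit in {width} bits")
        value |= field_value << shift
    if edgedetect:
        value |= 1 << 18
    if invert:
        value |= 1 << 23
    return value


print(hex(ibox_option_bits(threshold=0x1, mask0=0x1, mask1=0x1)))
```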

IIO box counters (fixed-purpose)

The Intel® Skylake SP microarchitecture provides measurements of the IIO box in the Uncore. Besides the general-purpose IIO box counters described in the previous section, there are fixed-purpose counters.

Box description

Unit number Unit description
0 CBDMA
1 PCIe0
2 PCIe1
3 PCIe2
4 MCP0
5 MCP1

Counter and events

Counter name Event name
IBAND<0-5>PI0 BANDWIDTH_PORT0_IN
IBAND<0-5>PI1 BANDWIDTH_PORT1_IN
IBAND<0-5>PI2 BANDWIDTH_PORT2_IN
IBAND<0-5>PI3 BANDWIDTH_PORT3_IN
IBAND<0-5>PO0 BANDWIDTH_PORT0_OUT
IBAND<0-5>PO1 BANDWIDTH_PORT1_OUT
IBAND<0-5>PO2 BANDWIDTH_PORT2_OUT
IBAND<0-5>PO3 BANDWIDTH_PORT3_OUT
IUTIL<0-5>PI0 UTLILIZATION_PORT0_IN
IUTIL<0-5>PI1 UTLILIZATION_PORT1_IN
IUTIL<0-5>PI2 UTLILIZATION_PORT2_IN
IUTIL<0-5>PI3 UTLILIZATION_PORT3_IN
IUTIL<0-5>PO0 UTLILIZATION_PORT0_OUT
IUTIL<0-5>PO1 UTLILIZATION_PORT1_OUT
IUTIL<0-5>PO2 UTLILIZATION_PORT2_OUT
IUTIL<0-5>PO3 UTLILIZATION_PORT3_OUT

IRP box counters

The Intel® Skylake SP microarchitecture provides measurements of the IRP box in the Uncore. The description from Intel®:
IRP is responsible for maintaining coherency for IIO traffic targeting coherent memory.
The IRP box counters are exposed to the operating system through the MSR interface. The IRP was introduced with the Intel® IvyBridge EP/EN/EX microarchitecture.

Box description

Unit number Unit description
0 CBDMA
1 PCIe0
2 PCIe1
3 PCIe2
4 MCP0
5 MCP1

Counter and events

Counter name Event name
IRP<0-5>C0 *
IRP<0-5>C1 *

Available Options

Option Argument Operation Comment
edgedetect N Set bit 18 in config register
invert N Set bit 23 in config register
threshold 12 bit hex value Set bits 24-35 in config register
