LIKWID
likwid-pin

Information

likwid-pin is a command line application to pin a sequential or multithreaded application to dedicated processors. It can be used as a replacement for taskset. In contrast to taskset, no affinity mask but individual processors are specified. For multithreaded applications based on the pthreads library, the pthread_create library call is overloaded through LD_PRELOAD, and each created thread is pinned to a dedicated processor as specified in the pinning list. By default, every generated thread is pinned to a core in the order of the calls to pthread_create. It is possible to skip single threads.
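
A minimal usage sketch (./a.out stands for any threaded application; adjust the CPU list to your system):

likwid-pin -c 0,1,2,3 ./a.out    # pin the application and its threads to the physical CPUs 0 to 3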

For OpenMP applications, the GCC and ICC compilers are explicitly supported. Clang's OpenMP runtime should also work, as it is built on top of Intel's OpenMP runtime library. Other implementations may work as well.

likwid-pin sets the environment variable OMP_NUM_THREADS for you if it is not already set. It sets as many threads as there are CPUs in the pin expression. Be aware that with pthreads the parent thread is always pinned. If you, for example, create 4 threads with pthread_create and do not use the parent thread as a worker, you still have to provide num_threads + 1 processor IDs.
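
As an illustration (the binary name ./pthread_app is only a placeholder): a program that creates 4 worker threads with pthread_create while the parent thread does no work itself is still started with 5 processor IDs:

likwid-pin -c 0-4 ./pthread_app    # 1 CPU for the parent thread plus 4 CPUs for the worker threads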

likwid-pin supports different numbering schemes for pinning. By default, the physical numbering of the cores is used; this is the numbering that likwid-topology also reports. Alternatively, logical numbering inside the node or the sockets can be used. For details, see the section CPU expressions below.

For applications where the first-touch policy on NUMA systems cannot be employed, likwid-pin can be used to turn on interleaved memory placement. This can significantly speed up memory-bound multi-threaded codes. All NUMA nodes the user pinned threads to are used for interleaving.
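
A sketch of enabling interleaved memory placement (the CPU list is only an example):

likwid-pin -i -c 0-7 ./a.out    # memory pages are interleaved across the NUMA domains that contain CPUs 0-7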

Options

Option Description
-h, --help Print help message
-v, --version Print version information
-V, --verbose <level> Verbose output during execution for debugging. Possible values for <level>:
0 Output only errors
1 Output some information
2 Output detailed information
3 Output developer information
-c <arg> Define the CPUs that the application should be pinned to. LIKWID provides an intuitive and feature-rich syntax for CPU expressions.
See section CPU expressions for details.
-S, --sweep Sweep memory and clean the LLC of the NUMA domains used by the given CPU expression
-i Activate the interleaved memory policy for the NUMA domains used by the given CPU expression
-p Print the thread affinity domains. If -c is given on the command line, the affinity domains are printed containing only the given CPUs.
-q, --quiet Do not print information about the pinning process
-s, --skip <arg> <arg> must be a bitmask in hex. Threads whose ID corresponds to a set bit in the bitmask are skipped during pinning.
Example: 0x1 = Thread 0 is skipped (a usage sketch follows below this table).
-d <delim> Set the delimiter for the output of -p. Default is ','
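
A short sketch of some of these options in use (the CPU list and bitmask are only examples):

likwid-pin -S -c 0-3 ./a.out       # sweep memory and LLC of the NUMA domains holding CPUs 0-3 before the run
likwid-pin -q -c 0-3 ./a.out       # same pinning, but without informational output
likwid-pin -s 0x1 -c 0-3 ./a.out   # skip thread 0 during pinning, as selected by the bitmask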

Affinity Domains

While gathering the system topology, LIKWID groups the CPUs into so-called thread affinity domains. A thread affinity domain is a group of CPU IDs that are related to some central entity of the system. The most common domain is the node domain (N), which contains all CPUs available in the system. Other domains group the CPUs according to socket, LLC or NUMA node relation. likwid-pin prints out all available affinity domains with the command-line option -p. The following list introduces all affinity domains and the domain names used:

Domain name Description
N Includes all CPUs in the system
S<number> Includes all CPUs that reside on CPU socket <number>
C<number> Includes all CPUs that share the LLC with ID <number>.
This domain often contains the same CPUs as the S<number> domain because many CPU sockets have an LLC shared by all CPUs of the socket
M<number> Includes all CPUs that are attached to the same NUMA memory domain
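
The domain names are used by the -p output and inside the CPU expressions described below. A small sketch (the actual domain contents depend on the machine):

likwid-pin -p                      # list the thread affinity domains of the current system
likwid-pin -c L:S0:0,1 ./a.out     # pin two threads to the first two CPUs of socket domain S0 (logical numbering, see below)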

CPU expressions

One outstanding feature of LIKWID is the CPU expression syntax, which is resolved to the CPUs of the actual system. There are multiple formats to choose from, each offering a convenient way to select the desired CPUs for execution or measurement. CPU expressions are used by likwid-pin as well as likwid-perfctr. This section introduces the four formats and gives examples.

Physical numbering:

The first and probably most natural way of defining a list of CPUs is to use the physical numbering, i.e. the numbering used by the operating system and the IDs printed by likwid-topology. The desired CPU IDs can be given as a comma-separated list, as a range, or as a combination of both.
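
Assuming the given CPU IDs exist on the system, a physical list could look like this:

likwid-pin -c 0,2,4-6 ./a.out    # pins 5 threads to the physical CPUs 0, 2, 4, 5 and 6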

Logical numbering:

Besides the enumeration of physical CPU IDs, LIKWID supports logical numbering inside an affinity domain. For logical selection, the indices inside the desired affinity domain have to be given on the command line. The logical numbering is selected by prefixing the CPU expression with L:. The format is L:<indices>, which assumes the affinity domain N, or L:<affinity domain>:<indices>. Moreover, logical numbering is automatically activated when working inside a CPU set (e.g. cgroups). For the examples we assume that the node affinity domain contains the CPUs 0,4,1,5,2,6,3,7. For the logical numbering, the list is sorted so that the physical cores are listed first; hence the logical indices refer to 0,1,2,3,4,5,6,7:
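
For this assumed node domain, two sketches of logical selection (the resulting physical CPU IDs are given in the comments):

likwid-pin -c L:0-3 ./a.out      # logical indices 0-3 in domain N, i.e. the physical cores 0,1,2,3
likwid-pin -c L:N:4-7 ./a.out    # logical indices 4-7 in domain N, i.e. the SMT threads 4,5,6,7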

Numbering by expression:

The most powerful format is probably the expression format. It combines the input values for a selection function in a convenient way. To activate the expression format, the CPU string must be prefixed with E:. The basic format is E:<affinity domain>:<numberOfThreads>, which simply selects the given <numberOfThreads> in the supplied <affinity domain>. The extended format is E:<affinity domain>:<numberOfThreads>:<chunksize>:<stride>; it also selects the given <numberOfThreads> in the supplied <affinity domain>, but takes <chunksize> threads in a row with a distance of <stride>. For the examples we assume that the node affinity domain looks like this: 0,4,1,5,2,6,3,7:
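
Two sketches of the expression format for this assumed domain order (the resulting CPU lists can be checked with the -p option):

likwid-pin -c E:N:4 ./a.out        # the first 4 CPUs in domain order: 0,4,1,5
likwid-pin -c E:N:4:1:2 ./a.out    # 4 CPUs in chunks of 1 with a stride of 2: 0,1,2,3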

Scatter expression:

The scatter expression distributes the threads evenly over the desired affinity domains. In contrast to the previous selection methods, the scatter expression schedules threads over multiple affinity domains. Although you can also select N as scatter domain, the intended domains are S, C and M. The scattering selects physical cores first. For the examples we assume that the socket affinity domains look like this: S0 = 0,4,1,5 and S1 = 2,6,3,7, i.e. 8 hardware threads on a system with 2 SMT threads per CPU core.
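
Assuming the scatter expression is written as <affinity domain name>:scatter, a sketch for this system could look like this (the exact resulting order should be verified with -p):

likwid-pin -c S:scatter ./a.out    # distribute threads round-robin over S0 and S1, physical cores first, e.g. 0,2,1,3,4,6,5,7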
