LIKWID
likwid-bench
`likwid-bench` is a benchmark suite of low-level (assembly) benchmarks that measure bandwidths and instruction throughput for specific instruction code on x86 systems. The included benchmark codes cover common data access patterns such as load and store as well as calculations such as vector triad and sum. `likwid-bench` includes architecture-specific benchmarks for x86, x86_64 and x86 for Intel Xeon Phi coprocessors. The performance values can either be calculated by `likwid-bench` itself or measured with hardware performance counters by using `likwid-perfctr` as a wrapper around `likwid-bench`. The latter requires building `likwid-bench` with instrumentation enabled in `config.mk` (`INSTRUMENT_BENCH`).
Option | Description
---|---
`-h` | Print help message
`-a` | List all available benchmarks
`-p` | List all available thread affinity domains
`-i <iters>` | Use `<iters>` iterations of the benchmark kernel
`-d <delim>` | Use `<delim>` instead of ',' for the output of `-p`
`-l <test>` | List characteristics of `<test>` like number of streams, data used per loop iteration, ...
`-t <test>` | Perform assembly benchmark `<test>`
`-s <min_time>` | Minimal time in seconds to run the benchmark. Using this time, the iteration count is determined automatically to provide reliable results. Default is 1. If the determined iteration count is below 10, it is normalized to 10.
`-w <workgroup>` | Set a workgroup for the benchmark. A workgroup can take several formats, as the examples that follow show.
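The rule stated for `-s` can be illustrated with a minimal Python sketch. It is not the code used by `likwid-bench`: the helper name `estimate_iterations` and the idea of deriving the count from a measured time per iteration are assumptions made for illustration. Only the lower bound of 10 iterations comes from the option description.

```python
import math

# Lower bound taken from the -s description: counts below 10 become 10.
MIN_ITERATIONS = 10


def estimate_iterations(min_time_s: float, time_per_iteration_s: float) -> int:
    """Hypothetical helper: how many iterations fill at least min_time_s.

    time_per_iteration_s is assumed to come from a short calibration run;
    how likwid-bench obtains it internally is not described in this page.
    """
    if min_time_s <= 0 or time_per_iteration_s <= 0:
        raise ValueError("both times must be positive")
    needed = math.ceil(min_time_s / time_per_iteration_s)
    # Normalize small counts to the documented minimum.
    return max(needed, MIN_ITERATIONS)


if __name__ == "__main__":
    # A fast kernel: 1/64 s per iteration, 1 s minimal runtime.
    print(estimate_iterations(1.0, 1 / 64))
    # A slow kernel: 0.5 s per iteration would need only 2 iterations,
    # so the count is raised to the minimum.
    print(estimate_iterations(1.0, 0.5))
```

The second call shows the normalization: two iterations would already fill one second, but the count is raised to 10.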
`likwid-bench -t copy -w S0:100kB`
Runs the `copy` benchmark using all threads in affinity domain `S0`. The input and output streams of the `copy` benchmark sum up to `100kB` and are placed in affinity domain `S0`. The iteration count is calculated automatically.
`likwid-bench -t triad -i 100 -w S0:1GB:2:1:2`
Runs the `triad` benchmark using 2 threads in affinity domain `S0`. Assuming `S0 = 0,4,1,5`, the threads are pinned to CPUs 0 and 1, so one thread is skipped during selection. The streams of the `triad` benchmark sum up to `1GB` and are placed in affinity domain `S0`. The number of iterations is explicitly set to 100.
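The thread selection in the `triad` example can be traced with a short Python sketch. It assumes that the last two fields of `S0:1GB:2:1:2` act as a chunk size and a stride over the CPU list of the affinity domain. The function `select_cpus` and its parameter names are hypothetical and simplified, not the `likwid-bench` implementation.

```python
from typing import List


def select_cpus(domain: List[int], num_threads: int,
                chunk: int = 1, stride: int = 1) -> List[int]:
    """Pick num_threads CPUs from domain: take `chunk` consecutive entries,
    then move the start position forward by `stride` entries."""
    if chunk < 1 or stride < chunk:
        raise ValueError("need chunk >= 1 and stride >= chunk")
    selected: List[int] = []
    start = 0
    while len(selected) < num_threads and start < len(domain):
        for cpu in domain[start:start + chunk]:
            if len(selected) == num_threads:
                break
            selected.append(cpu)
        start += stride
    if len(selected) < num_threads:
        raise ValueError("affinity domain too small for this selection")
    return selected


if __name__ == "__main__":
    s0 = [0, 4, 1, 5]  # CPU list assumed for S0 in the example above
    print(select_cpus(s0, num_threads=2, chunk=1, stride=2))  # [0, 1]
```

With the defaults `chunk=1` and `stride=1` the sketch returns the CPUs in list order, which corresponds to the case where no thread is skipped.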
`likwid-bench -t update -w S0:100kB -w S1:100kB`
Runs the `update` benchmark using all threads in affinity domains `S0` and `S1`. The threads scheduled on `S0` use streams that sum up to `100kB`. Likewise, the threads scheduled on `S1` are placed there and work only on their socket-local streams. The results of both workgroups are combined.
`likwid-perfctr -c E:S0:4 -g MEM -m likwid-bench -t update -w S0:100kB:4`
Runs the `update` benchmark using 4 threads in affinity domain `S0`. The data used by the `update` benchmark sums up to `100kB` and is placed in affinity domain `S0`. The benchmark execution is measured using the Marker API. It measures the `MEM` performance group on the first four CPUs of the `S0` affinity domain. For further information about hardware performance counters, see `likwid-perfctr`. The pinning is done by `likwid-bench`.
`likwid-bench -t copy -w S0:1GB:2:1:2-0:S1,1:S1`
Runs the `copy` benchmark using 2 threads in affinity domain `S0`, skipping one thread during selection. The two streams used in the `copy` benchmark have the IDs 0 and 1 and a combined size of `1GB`. Both streams are placed in affinity domain `S1`.
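To make the structure of the workgroup strings in these examples easier to see, the following Python sketch splits one into its parts. It accepts only the forms that appear in the examples. The class `Workgroup`, the function `parse_workgroup` and the field names `chunk` and `stride` are hypothetical names chosen for illustration and are not part of LIKWID.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Workgroup:
    domain: str                      # affinity domain for the threads, e.g. "S0"
    size: str                        # total size of all streams, kept as text
    threads: Optional[int] = None    # None means: all threads in the domain
    chunk: int = 1                   # assumed meaning of the 4th field
    stride: int = 1                  # assumed meaning of the 5th field
    stream_domains: Dict[int, str] = field(default_factory=dict)


def parse_workgroup(spec: str) -> Workgroup:
    """Parse strings such as 'S0:100kB', 'S0:100kB:4' or
    'S0:1GB:2:1:2-0:S1,1:S1' (the forms used in the examples)."""
    head, _, placement = spec.partition("-")
    parts = head.split(":")
    if len(parts) not in (2, 3, 5):
        raise ValueError(f"unsupported workgroup format: {spec!r}")
    wg = Workgroup(domain=parts[0], size=parts[1])
    if len(parts) >= 3:
        wg.threads = int(parts[2])
    if len(parts) == 5:
        wg.chunk, wg.stride = int(parts[3]), int(parts[4])
    if placement:
        # Each entry maps a stream ID to the domain holding that stream.
        for item in placement.split(","):
            stream_id, _, stream_domain = item.partition(":")
            wg.stream_domains[int(stream_id)] = stream_domain
    return wg


if __name__ == "__main__":
    print(parse_workgroup("S0:1GB:2:1:2-0:S1,1:S1"))
```

For the last example the sketch yields domain `S0`, size `1GB`, 2 threads and both stream IDs mapped to `S1`, matching the description given for that command.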