# Intra Node Stencil Performance Evaluation Collection

| parameter | value |
|---|---|
| model type | Cavium Thunder X2 (ARMv8) |
| model name | |
| micro-architecture | |
| micro-architecture modeler | |
| cores per socket | 32 |
| cores per NUMA domain | 32 |
| cacheline size | 64 B |
| clock | 2.2 GHz |
| NUMA domains per socket | 1 |

This machine file was generated for kerncraft version 0.8.6.dev0.

| compiler | flags |
|---|---|
| clang | -O3 -target aarch64-unknown-linux-gnu -D_POSIX_C_SOURCE=200112L -fopenmp -ffreestanding |
| gcc | -O3 -march=armv8.1-a -fopenmp -ffreestanding |

| | ADD | MUL | FMA | total |
|---|---|---|---|---|
| Single Precision | INFORMATION_REQUIRED | INFORMATION_REQUIRED | INFORMATION_REQUIRED | INFORMATION_REQUIRED |
| Double Precision | INFORMATION_REQUIRED | INFORMATION_REQUIRED | INFORMATION_REQUIRED | INFORMATION_REQUIRED |
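
Once the table above is filled in, its entries combine with the clock and core count into a theoretical peak. A minimal sketch, assuming the entries are FLOPs per cycle per core (the way kerncraft machine files use them); the helper peak_flops and the value 8 below are hypothetical placeholders, not ThunderX2 data.

```python
# Sketch: theoretical peak FLOP/s of one socket from machine-file fields.
# Assumption: table entries are FLOPs per cycle per core. The value used in
# the demo is a hypothetical placeholder, since the table still reads
# INFORMATION_REQUIRED.

CLOCK_HZ = 2.2e9        # "clock" row
CORES_PER_SOCKET = 32   # "cores per socket" row


def peak_flops(flops_per_cycle: float,
               cores: int = CORES_PER_SOCKET,
               clock_hz: float = CLOCK_HZ) -> float:
    """Return peak FLOP/s = clock * cores * FLOPs per cycle per core."""
    return clock_hz * cores * flops_per_cycle


if __name__ == "__main__":
    hypothetical_dp_total = 8  # placeholder, NOT a measured value
    print(f"{peak_flops(hypothetical_dp_total) / 1e9:.1f} GFLOP/s")
```

Keeping cores and clock as parameters lets the same formula give a single-core or a full-socket bound.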

| level | L1 |
|---|---|
| groups | 64 |
| cores per group | 1 |
| threads per group | 4 |
| transfers overlap | false |
| sets | 64 |
| ways | 8 |
| cl_size | 64 B |
| replacement_policy | LRU |
| write_allocate | true |
| write_back | true |
| load_from | L2 |
| store_to | L2 |
| accesses | INFORMATION_REQUIRED (e.g., L1D_REPLACEMENT__PMC0) |
| misses | INFORMATION_REQUIRED (e.g., L2_LINES_IN_ALL__PMC1) |
| evicts | INFORMATION_REQUIRED (e.g., L2_LINES_OUT_DIRTY_ALL__PMC2) |
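
The sets, ways and cl_size rows fix both the capacity (64 × 8 × 64 B = 32 KiB) and how addresses map onto sets. A toy model, assuming the usual modulo set indexing (line address mod number of sets), which the machine file does not state; it tracks only hits and misses under LRU and ignores write_allocate and write_back. The class name SetAssociativeLRU is hypothetical.

```python
from collections import OrderedDict

SETS = 64      # "sets" row
WAYS = 8       # "ways" row
CL_SIZE = 64   # "cl_size" row, in bytes


class SetAssociativeLRU:
    """Toy cache level: address -> set by modulo indexing, LRU within a set.

    Models read hits/misses only; write-allocate and write-back are ignored.
    """

    def __init__(self, sets: int = SETS, ways: int = WAYS, cl_size: int = CL_SIZE):
        self.sets, self.ways, self.cl_size = sets, ways, cl_size
        # One OrderedDict per set; keys are tags, order is recency (oldest first).
        self.lines = [OrderedDict() for _ in range(sets)]

    def index_and_tag(self, addr: int) -> tuple[int, int]:
        line = addr // self.cl_size
        return line % self.sets, line // self.sets

    def access(self, addr: int) -> bool:
        """Return True on a hit; on a miss, insert the line, evicting LRU if full."""
        idx, tag = self.index_and_tag(addr)
        s = self.lines[idx]
        if tag in s:
            s.move_to_end(tag)       # mark as most recently used
            return True
        if len(s) == self.ways:
            s.popitem(last=False)    # evict the least recently used line
        s[tag] = None
        return False

    @property
    def capacity(self) -> int:
        return self.sets * self.ways * self.cl_size


if __name__ == "__main__":
    l1 = SetAssociativeLRU()
    print(l1.capacity)                      # 32768 bytes = 32 KiB
    stride = SETS * CL_SIZE                 # 4096 B: every address maps to set 0
    addrs = [i * stride for i in range(9)]  # 9 distinct lines > 8 ways
    for a in addrs:
        l1.access(a)
    print(l1.access(addrs[0]))              # False: line 0 was evicted
```

With a 4096 B stride (sets × cl_size) every access lands in set 0, so nine distinct lines are enough to push the first one out of the 8-way set.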

| level | L2 |
|---|---|
| groups | 64 |
| cores per group | 1 |
| threads per group | 4 |
| transfers overlap | false |
| sets | 512 |
| ways | 8 |
| cl_size | 64 B |
| replacement_policy | LRU |
| write_allocate | true |
| write_back | true |
| load_from | None |
| store_to | L3 |
| victims_to | L3 |
| accesses | INFORMATION_REQUIRED (e.g., L1D_REPLACEMENT__PMC0) |
| misses | INFORMATION_REQUIRED (e.g., L2_LINES_IN_ALL__PMC1) |
| evicts | INFORMATION_REQUIRED (e.g., L2_LINES_OUT_DIRTY_ALL__PMC2) |

| level | L3 |
|---|---|
| groups | 2 |
| cores per group | 32 |
| threads per group | 128 |
| transfers overlap | false |
| sets | 65536 |
| ways | 8 |
| cl_size | 64 B |
| replacement_policy | LRU |
| write_allocate | false |
| write_back | true |
| accesses | INFORMATION_REQUIRED (e.g., L1D_REPLACEMENT__PMC0) |
| misses | INFORMATION_REQUIRED (e.g., L2_LINES_IN_ALL__PMC1) |
| evicts | INFORMATION_REQUIRED (e.g., L2_LINES_OUT_DIRTY_ALL__PMC2) |
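
Applying the sets × ways × cl_size product to all three cache levels gives 32 KiB, 256 KiB and 32 MiB per group. The per-core figure in this sketch simply divides by the cores per group row; it is an even-split assumption for estimates, not a hardware partitioning.

```python
# (sets, ways, cl_size in bytes, cores per group) copied from the cache tables
LEVELS = {
    "L1": (64, 8, 64, 1),
    "L2": (512, 8, 64, 1),
    "L3": (65536, 8, 64, 32),
}


def capacity_bytes(sets: int, ways: int, cl_size: int) -> int:
    """Capacity of one cache instance: sets * ways * line size."""
    return sets * ways * cl_size


if __name__ == "__main__":
    for name, (sets, ways, cl, cores) in LEVELS.items():
        size = capacity_bytes(sets, ways, cl)
        # Even split across the cores sharing this level (estimate only).
        print(f"{name}: {size // 1024} KiB per group, "
              f"{size // cores // 1024} KiB per core")
```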

| level | main memory |
|---|---|
| cores per group | 32 |
| threads per group | 128 |
| transfers overlap | false |

OSACA ports: 3, 4, 5

INFORMATION_REQUIRED (e.g., max(UOPS_DISPATCHED_PORT_PORT_0__PMC2, UOPS_DISPATCHED_PORT_PORT_1__PMC3, UOPS_DISPATCHED_PORT_PORT_4__PMC0, UOPS_DISPATCHED_PORT_PORT_5__PMC1))

OSACA ports: 0, 0DV, 1, 1DV, 2, 3, 4, 5

INFORMATION_REQUIRED (e.g., max(UOPS_DISPATCHED_PORT_PORT_0__PMC2, UOPS_DISPATCHED_PORT_PORT_1__PMC3, UOPS_DISPATCHED_PORT_PORT_4__PMC0, UOPS_DISPATCHED_PORT_PORT_5__PMC1))
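
Both placeholder metrics take the maximum over per-port counters: the busiest port bounds the in-core throughput. A sketch of the same idea applied to a predicted port-pressure profile keyed by the OSACA port names 0, 0DV, 1, 1DV, 2, 3, 4, 5; the cycle values are invented for illustration and throughput_bound is a hypothetical helper, not part of OSACA or kerncraft.

```python
def throughput_bound(port_cycles: dict[str, float]) -> tuple[str, float]:
    """Return (port, cycles) for the most heavily loaded port.

    On ties, the port listed first in the dict wins.
    """
    if not port_cycles:
        raise ValueError("need at least one port")
    port = max(port_cycles, key=lambda p: port_cycles[p])
    return port, port_cycles[port]


if __name__ == "__main__":
    # Hypothetical cycles per iteration; NOT OSACA output for this machine.
    example = {"0": 1.0, "0DV": 0.0, "1": 1.0, "1DV": 0.0,
               "2": 0.5, "3": 1.5, "4": 1.5, "5": 1.0}
    print(throughput_bound(example))
```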

| benchmark kernel | copy |
|---|---|
| FLOPs per iteration | 0 |
| read streams | 1 stream with 8.00 B |
| write streams | 1 stream with 8.00 B |
| read+write streams | 0 streams with 0.00 B |

| benchmark kernel | daxpy |
|---|---|
| FLOPs per iteration | 2 |
| read streams | 2 streams with 16.00 B |
| write streams | 1 stream with 8.00 B |
| read+write streams | 1 stream with 8.00 B |

| benchmark kernel | load |
|---|---|
| FLOPs per iteration | 0 |
| read streams | 1 stream with 8.00 B |
| write streams | 0 streams with 0.00 B |
| read+write streams | 0 streams with 0.00 B |

| benchmark kernel | triad |
|---|---|
| FLOPs per iteration | 2 |
| read streams | 3 streams with 24.00 B |
| write streams | 1 stream with 8.00 B |
| read+write streams | 0 streams with 0.00 B |

| benchmark kernel | update |
|---|---|
| FLOPs per iteration | 0 |
| read streams | 1 stream with 8.00 B |
| write streams | 1 stream with 8.00 B |
| read+write streams | 1 stream with 8.00 B |
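
The stream counts above translate into bytes per iteration and arithmetic intensity. A sketch, assuming the five tables are, in order, kerncraft's copy, daxpy, load, triad and update benchmarks (their stream signatures match), that a read+write stream is already counted in both the read and write totals, and that with write-allocate (true for L1 and L2 here) each store stream not also read costs one extra load. The names Kernel, bytes_per_iteration and intensity are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kernel:
    name: str
    flops: int      # FLOPs per iteration
    read_b: float   # bytes from "read streams"
    write_b: float  # bytes from "write streams"
    rw_b: float     # bytes from "read+write streams" (subset of read and write)


# Values copied from the five tables above, in table order.
KERNELS = [
    Kernel("copy", 0, 8.0, 8.0, 0.0),
    Kernel("daxpy", 2, 16.0, 8.0, 8.0),
    Kernel("load", 0, 8.0, 0.0, 0.0),
    Kernel("triad", 2, 24.0, 8.0, 0.0),
    Kernel("update", 0, 8.0, 8.0, 8.0),
]


def bytes_per_iteration(k: Kernel, write_allocate: bool = True) -> float:
    """Memory traffic per iteration under the assumptions stated above."""
    traffic = k.read_b + k.write_b
    if write_allocate:
        # Stores to lines that are not read anyway trigger an extra line load.
        traffic += k.write_b - k.rw_b
    return traffic


def intensity(k: Kernel, write_allocate: bool = True) -> float:
    """Arithmetic intensity in FLOP/byte."""
    return k.flops / bytes_per_iteration(k, write_allocate)


if __name__ == "__main__":
    for k in KERNELS:
        print(f"{k.name:7s} {bytes_per_iteration(k):5.1f} B/it  "
              f"{intensity(k):.4f} FLOP/B")
```

Under these assumptions, triad moves 40 B per iteration with write-allocate and 32 B without, which is why the write_allocate flag is kept explicit.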