INSPECT

Intra Node Stencil Performance Evaluation Collection

Review Overview

Machine Files

HaswellEP_E5-2695v3

Data has been reviewed

Julian Hornich says:

The L2 cache bandwidth is optimistic: it may be up to 64 B/cy, as stated by Intel, but in practice this value is rarely reached.



HaswellEP_E5-2695v3_CoD

Data has been reviewed

Julian Hornich says:

The L2 cache bandwidth is optimistic: it may be up to 64 B/cy, as stated by Intel, but in practice this value is rarely reached.



HaswellEX_E5-2695v3

Data has been reviewed

Julian Hornich says:

The L2 cache bandwidth is optimistic: it may be up to 64 B/cy, as stated by Intel, but in practice this value is rarely reached.



SkylakeSP_Gold-5122

might be okay...

Julian Hammer says:

The correct way to measure and predict L2-Memory and L3-Memory traffic is unknown.



SkylakeSP_Gold-6148

might be okay...

Julian Hammer says:

The correct way to measure and predict L2-Memory and L3-Memory traffic is unknown.



SkylakeSP_Gold-6148_MCA

might be okay...

Julian Hammer says:

The correct way to measure and predict L2-Memory and L3-Memory traffic is unknown.



SkylakeSP_Gold-6148_OSACA

might be okay...

Julian Hammer says:

The correct way to measure and predict L2-Memory and L3-Memory traffic is unknown.



SkylakeSP_Gold-6148_SNC

might be okay...

Julian Hammer says:

The correct way to measure and predict L2-Memory and L3-Memory traffic is unknown.



SkylakeSP_Gold-6148_avx512

might be okay...

Julian Hammer says:

The correct way to measure and predict L2-Memory and L3-Memory traffic is unknown.



SkylakeSP_Platinum-8147_2_7GHz

might be okay...

Julian Hammer says:

The correct way to measure and predict L2-Memory and L3-Memory traffic is unknown.



Stencils

Dimension Radius Weighting Kind Coefficients Datatype Machine Comments
3D r1 heterogeneous box constant double IvyBridgeEP_E5-2660v2 data_transfers: general: Ivy Bridge and Sandy Bridge have problems with split loads and small stencils containing a central point. This issue was raised with Intel, but no solution was offered. As of Haswell, the problem is resolved. Including `#pragma vector aligned` reduces the problem but does not solve it completely. Removing the central point solves this issue. grid_scaling: iaca: spatial_blocking: stencil: system_info: thread_scaling: uptodate:
3D r1 heterogeneous box variable double IvyBridgeEP_E5-2660v2 data_transfers: general: Ivy Bridge and Sandy Bridge have problems with split loads and small stencils containing a central point. This issue was raised with Intel, but no solution was offered. As of Haswell, the problem is resolved. Including `#pragma vector aligned` reduces the problem but does not solve it completely. Removing the central point solves this issue. grid_scaling: iaca: spatial_blocking: stencil: system_info: thread_scaling: uptodate:
3D r1 heterogeneous box variable double SkylakeSP_Gold-6148_512 general: On Skylake, L2 and L3 caches cannot be measured and modeled correctly due to missing documentation. This may explain the overprediction with ECM. uptodate:
3D r1 heterogeneous star constant double BroadwellEP_E5-2697_CoD data_transfers: general: grid_scaling: iaca: spatial_blocking: stencil: system_info: thread_scaling: uptodate:
3D r1 heterogeneous star constant double IvyBridgeEP_E5-2660v2 data_transfers: general: Ivy Bridge and Sandy Bridge have problems with split loads and small stencils containing a central point. This issue was raised with Intel, but no solution was offered. As of Haswell, the problem is resolved. Including `#pragma vector aligned` reduces the problem but does not solve it completely. Removing the central point solves this issue. grid_scaling: iaca: spatial_blocking: stencil: system_info: thread_scaling: uptodate:
3D r1 heterogeneous star constant double SkylakeSP_Gold-6148 general: On Skylake, L2 and L3 caches cannot be measured and modeled correctly due to missing documentation. This may explain the overprediction with ECM. uptodate:
3D r1 heterogeneous star constant double SkylakeSP_Gold-6148_variant_avx512 general: On Skylake, L2 and L3 caches cannot be measured and modeled correctly due to missing documentation. This may explain the overprediction with ECM. uptodate:
3D r1 heterogeneous star variable double IvyBridgeEP_E5-2660v2 data_transfers: general: Ivy Bridge and Sandy Bridge have problems with split loads and small stencils containing a central point. This issue was raised with Intel, but no solution was offered. As of Haswell, the problem is resolved. Including `#pragma vector aligned` reduces the problem but does not solve it completely. Removing the central point solves this issue. grid_scaling: iaca: spatial_blocking: stencil: system_info: thread_scaling: uptodate:
3D r1 heterogeneous star variable double SkylakeSP_Gold-6148 general: On Skylake, L2 and L3 caches cannot be measured and modeled correctly due to missing documentation. This may explain the overprediction with ECM. uptodate:
3D r1 homogeneous box constant double SkylakeSP_Gold-6148_variant_avx512 general: On Skylake, L2 and L3 caches cannot be measured and modeled correctly due to missing documentation. This may explain the overprediction with ECM. uptodate:
3D r1 homogeneous box variable double BroadwellEP_E5-2697_CoD data_transfers: grid_scaling: Measurement follows the general trend, but the prediction beyond is a bit pessimistic with increased L2 traffic. Roofline is off because of the large T_nOL contribution. spatial_blocking: Missing blocked measurement. thread_scaling: Measurement is faulty. uptodate:
3D r1 homogeneous box variable double HaswellEP_E5-2695v3_CoD data_transfers: grid_scaling: Increase of runtime due to memory traffic shows very late. spatial_blocking: Missing blocked measurements. thread_scaling: Faulty measurement. uptodate:
3D r1 homogeneous box variable double IvyBridgeEP_E5-2660v2 data_transfers: general: Ivy Bridge and Sandy Bridge have problems with split loads and small stencils containing a central point. This issue was raised with Intel, but no solution was offered. As of Haswell, the problem is resolved. Including `#pragma vector aligned` reduces the problem but does not solve it completely. Removing the central point solves this issue. grid_scaling: iaca: spatial_blocking: stencil: system_info: thread_scaling: uptodate:
3D r1 homogeneous box variable double SkylakeSP_Gold-6148_512 data_transfers: The Skylake cache architecture is not well documented, which makes L2 and L3 cache prediction and measurement inaccurate. grid_scaling: The Skylake cache architecture is not well documented, which makes cache measurements inaccurate. Nonetheless, the general trend is followed by the measurements. spatial_blocking: Missing blocking numbers. thread_scaling: Faulty measurements. uptodate:
3D r1 homogeneous star constant double BroadwellEP_E5-2697_CoD general: grid_scaling: Follows the trend, but the measurement is ~10% slower than the prediction in the memory-bound regime. spatial_blocking: Missing the relevant measurement data with blocking. uptodate:
3D r1 homogeneous star constant double HaswellEP_E5-2695v3_CoD general: spatial_blocking: Needs to be rerun. Blocking should show roughly 400 MLUP/s all the way to N^3=1000. uptodate:
3D r1 homogeneous star constant double IvyBridgeEP_E5-2660v2 data_transfers: general: Sandy Bridge and Ivy Bridge have problems with split loads and small stencils containing a central point. This issue was raised with Intel, but no solution was offered. As of Haswell, the problem is resolved. Including `#pragma vector aligned` reduces the problem but does not solve it completely. Removing the central point solves this issue. grid_scaling: iaca: spatial_blocking: stencil: system_info: thread_scaling: uptodate:
3D r1 homogeneous star constant double SandyBridgeEP_E5-2680 data_transfers: general: Sandy Bridge and Ivy Bridge have problems with split loads and small stencils containing a central point. This issue was raised with Intel, but no solution was offered. As of Haswell, the problem is resolved. Including `#pragma vector aligned` reduces the problem but does not solve it completely. Removing the central point solves this issue. grid_scaling: iaca: spatial_blocking: stencil: system_info: thread_scaling: uptodate:
3D r1 homogeneous star constant double SkylakeSP_Gold-6148 general: The Skylake cache architecture is not well documented, which makes L2 and L3 cache prediction and measurement inaccurate. uptodate:
3D r1 homogeneous star constant double SkylakeSP_Gold-6148_variant_avx512 data_transfers: The Skylake cache architecture is not well documented, which makes L2 and L3 cache prediction and measurement inaccurate. grid_scaling: The Skylake cache architecture is not well documented, which makes L2 and L3 cache prediction and measurement inaccurate. Slight overprediction (faster), probably due to Skylake's new L3 cache architecture. uptodate:
3D r1 homogeneous star variable double IvyBridgeEP_E5-2660v2 data_transfers: general: Ivy Bridge and Sandy Bridge have problems with split loads and small stencils containing a central point. This issue was raised with Intel, but no solution was offered. As of Haswell, the problem is resolved. Including `#pragma vector aligned` reduces the problem but does not solve it completely. Removing the central point solves this issue. grid_scaling: iaca: spatial_blocking: stencil: system_info: thread_scaling: uptodate:
3D r1 isotropic box constant double SkylakeSP_Gold-6148 general: uptodate:
3D r1 isotropic box variable double IvyBridgeEP_E5-2660v2 data_transfers: general: Ivy Bridge and Sandy Bridge have problems with split loads and small stencils containing a central point. This issue was raised with Intel, but no solution was offered. As of Haswell, the problem is resolved. Including `#pragma vector aligned` reduces the problem but does not solve it completely. Removing the central point solves this issue. grid_scaling: iaca: spatial_blocking: stencil: system_info: thread_scaling: uptodate:
3D r1 isotropic star constant double IvyBridgeEP_E5-2660v2 data_transfers: general: Ivy Bridge and Sandy Bridge have problems with split loads and small stencils containing a central point. This issue was raised with Intel, but no solution was offered. As of Haswell, the problem is resolved. Including `#pragma vector aligned` reduces the problem but does not solve it completely. Removing the central point solves this issue. grid_scaling: iaca: spatial_blocking: stencil: system_info: thread_scaling: uptodate:
3D r1 isotropic star constant double SkylakeSP_Gold-6148 general: uptodate:
3D r1 isotropic star constant double SkylakeSP_Gold-6148_variant_avx512 general: uptodate:
3D r1 point-symmetric box constant double BroadwellEP_E5-2697_CoD general: uptodate:
3D r1 point-symmetric box constant double HaswellEP_E5-2695v3_CoD general: uptodate:
3D r1 point-symmetric box constant double SkylakeSP_Gold-6148 general: spatial_blocking: thread_scaling: Needs to consider updated machine file. Roofline data is missing. uptodate:
3D r1 point-symmetric star constant double BroadwellEP_E5-2697_CoD general: uptodate:
3D r1 point-symmetric star constant double HaswellEP_E5-2695v3_CoD general: uptodate:
3D r1 point-symmetric star constant double SkylakeSP_Gold-6148 general: On Skylake, L2 and L3 caches cannot be measured and modeled correctly due to missing documentation. This may explain the overprediction with ECM. uptodate:
3D r2 heterogeneous star constant double IvyBridgeEP_E5-2660v2 data_transfers: general: Ivy Bridge and Sandy Bridge have problems with split loads and small stencils containing a central point. This issue was raised with Intel, but no solution was offered. As of Haswell, the problem is resolved. Including `#pragma vector aligned` reduces the problem but does not solve it completely. Removing the central point solves this issue. grid_scaling: iaca: spatial_blocking: stencil: system_info: thread_scaling: uptodate:
3D r2 heterogeneous star constant double SkylakeSP_Gold-6148 general: On Skylake, L2 and L3 caches cannot be measured and modeled correctly due to missing documentation. This may explain the overprediction with ECM. uptodate:
3D r2 heterogeneous star constant double SkylakeSP_Gold-6148_variant_avx512 general: On Skylake, L2 and L3 caches cannot be measured and modeled correctly due to missing documentation. This may explain the overprediction with ECM. uptodate:
3D r2 heterogeneous star constant float SkylakeSP_Gold-6148 general: On Skylake, L2 and L3 caches cannot be measured and modeled correctly due to missing documentation. This may explain the overprediction with ECM. uptodate:
3D r2 heterogeneous star variable double SkylakeSP_Gold-6148 general: On Skylake, L2 and L3 caches cannot be measured and modeled correctly due to missing documentation. This may explain the overprediction with ECM. uptodate:
3D r2 homogeneous star constant double BroadwellEP_E5-2697_CoD general: uptodate:
3D r2 homogeneous star constant double HaswellEP_E5-2695v3_CoD general: uptodate:
3D r2 homogeneous star constant double SkylakeSP_Gold-6148 general: thread_scaling: This is based on a faulty machine file. Needs to be rerun. uptodate:
3D r2 homogeneous star constant double SkylakeSP_Gold-6148_variant_avx512 general: uptodate:
3D r2 homogeneous star constant float HaswellEP_E5-2695v3_CoD general: uptodate:
3D r2 homogeneous star constant float SkylakeSP_Gold-6148 general: uptodate:
3D r2 homogeneous star variable double HaswellEP_E5-2695v3_CoD general: spatial_blocking: Missing blocked measurements. uptodate:
3D r2 homogeneous star variable double SkylakeSP_Gold-6148 general: spatial_blocking: Missing blocking measurements. thread_scaling: Needs to be rerun with correct machine file. uptodate:
3D r2 homogeneous star variable float HaswellEP_E5-2695v3_CoD general: spatial_blocking: Missing blocked measurements. uptodate:
3D r2 isotropic box constant double SkylakeSP_Gold-6148 general: On Skylake, L2 and L3 caches cannot be measured and modeled correctly due to missing documentation. This may explain the overprediction with ECM. uptodate:
3D r2 isotropic box constant double SkylakeSP_Gold-6148_variant_avx512 general: On Skylake, L2 and L3 caches cannot be measured and modeled correctly due to missing documentation. This may explain the overprediction with ECM. uptodate:
3D r2 isotropic star constant double SkylakeSP_Gold-6148 general: On Skylake, L2 and L3 caches cannot be measured and modeled correctly due to missing documentation. This may explain the overprediction with ECM. uptodate:
3D r2 isotropic star constant double SkylakeSP_Gold-6148_variant_avx512 general: On Skylake, L2 and L3 caches cannot be measured and modeled correctly due to missing documentation. This may explain the overprediction with ECM. uptodate:
3D r2 isotropic star constant float SkylakeSP_Gold-6148 general: On Skylake, L2 and L3 caches cannot be measured and modeled correctly due to missing documentation. This may explain the overprediction with ECM. uptodate:
3D r2 isotropic star variable double SkylakeSP_Gold-6148 general: On Skylake, L2 and L3 caches cannot be measured and modeled correctly due to missing documentation. This may explain the overprediction with ECM. uptodate:
3D r2 point-symmetric star constant double BroadwellEP_E5-2697_CoD general: uptodate:
3D r2 point-symmetric star constant double HaswellEP_E5-2695v3_CoD general: uptodate:
3D r2 point-symmetric star constant double SkylakeSP_Gold-6148 general: On Skylake, L2 and L3 caches cannot be measured and modeled correctly due to missing documentation. This may explain the overprediction with ECM. spatial_blocking: thread_scaling: Needs to be rerun with fixed machine description. uptodate:
3D r2 point-symmetric star constant float HaswellEP_E5-2695v3_CoD general: grid_scaling: Cache thrashing does show in measurements, but only slightly. uptodate:
3D r2 point-symmetric star constant float SkylakeSP_Gold-6148 general: On Skylake, L2 and L3 caches cannot be measured and modeled correctly due to missing documentation. This may explain the overprediction with ECM. spatial_blocking: thread_scaling: Needs to be rerun with fixed machine description. uptodate:
3D r2 point-symmetric star variable double BroadwellEP_E5-2697_CoD data_transfers: Cache simulator predicts thrashing effects, which only seldom show in measurements. grid_scaling: Cache thrashing is happening, but hardly shows in the measurement. Measurement is > 10% off from prediction. With increasing memory traffic per iteration, the discrepancy grows. spatial_blocking: Performance is a little low, but still distinguishable from unblocked code. thread_scaling: uptodate:
3D r2 point-symmetric star variable double HaswellEP_E5-2695v3_CoD general: uptodate:
3D r2 point-symmetric star variable double SkylakeSP_Gold-6148 data_transfers: On Skylake, L2 and L3 caches cannot be measured and modeled correctly due to missing documentation. This may explain the overprediction with ECM. grid_scaling: spatial_blocking: thread_scaling: Needs to be rerun with fixed machine description. uptodate:
3D r2 point-symmetric star variable float HaswellEP_E5-2695v3_CoD data_transfers: Cache thrashing does show in measurements, but only slightly. general: grid_scaling: Cache thrashing does show in measurements, but only slightly. uptodate:
3D r3 heterogeneous star constant double BroadwellEP_E5-2697_CoD general: uptodate:
3D r3 heterogeneous star constant double HaswellEP_E5-2695v3_CoD general: uptodate:
3D r3 heterogeneous star constant double IvyBridgeEP_E5-2660v2 data_transfers: grid_scaling: A dependency chain in the assembly leads to stall cycles, which are not considered by IACA and T_OL. uptodate:
3D r3 heterogeneous star constant double SkylakeSP_Gold-6148_variant_avx512 general: uptodate:
3D r3 isotropic star constant double BroadwellEP_E5-2697_CoD general: uptodate:
3D r3 isotropic star constant double HaswellEP_E5-2695v3_CoD general: uptodate:
3D r3 isotropic star constant double IvyBridgeEP_E5-2660v2 data_transfers: uptodate:
3D r3 isotropic star constant double IvyBridgeEP_E5-2660v2_variant_vector_aligned data_transfers: uptodate:
3D r3 isotropic star constant double SkylakeSP_Gold-6148 data_transfers: On Skylake, L2 and L3 caches cannot be measured and modeled correctly due to missing documentation. grid_scaling: CSim: On Skylake, L2 and L3 caches cannot be measured and modeled correctly due to missing documentation. uptodate:
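
The Ivy Bridge and Sandy Bridge comments in the table note that removing the central point from a small stencil avoids the split-load problem. As a minimal sketch of what that means for a 3D r1 homogeneous star stencil with a constant coefficient, the C function below computes one sweep either with or without the central point. The function name `star_r1_sweep`, the helper `idx`, the `with_center` flag, and the flat row-major layout are assumptions made for illustration; they are not taken from the INSPECT kernels. The `#pragma vector aligned` hint mentioned in the comments is specific to the Intel compiler and is left out here.

```c
#include <assert.h>
#include <stddef.h>

/* Linear index into an n x n x n grid stored row-major (i runs fastest).
 * Hypothetical helper, not part of INSPECT. */
static size_t idx(size_t n, size_t k, size_t j, size_t i)
{
    return (k * n + j) * n + i;
}

/*
 * One sweep of a 3D radius-1 star stencil with a single constant
 * (homogeneous) coefficient c.
 * with_center != 0: the central point a[k][j][i] is included (7 points).
 * with_center == 0: only the six neighbours are summed (6 points).
 * Boundary cells of b are left untouched.
 */
void star_r1_sweep(size_t n, const double *a, double *b, double c,
                   int with_center)
{
    assert(n >= 3 && a != NULL && b != NULL);
    for (size_t k = 1; k < n - 1; ++k) {
        for (size_t j = 1; j < n - 1; ++j) {
            for (size_t i = 1; i < n - 1; ++i) {
                double sum = a[idx(n, k, j, i - 1)] + a[idx(n, k, j, i + 1)]
                           + a[idx(n, k, j - 1, i)] + a[idx(n, k, j + 1, i)]
                           + a[idx(n, k - 1, j, i)] + a[idx(n, k + 1, j, i)];
                if (with_center)
                    sum += a[idx(n, k, j, i)];
                b[idx(n, k, j, i)] = c * sum;
            }
        }
    }
}
```

A benchmark would more likely use two separate kernels instead of a runtime flag, so that the inner loop stays free of branches; the flag is used here only to keep the two variants side by side.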