Instruction set selection
From cppreference.com
During compilation libsimdpp needs to be explicitly told which instruction set to use. This is done by defining one of the macros listed in the table below before the first inclusion of the simdpp/simd.h
header. Multiple options may be specified: for example, if user wants to select AVX and FMA4, then he needs to define SIMDPP_ARCH_X86_AVX
and SIMDPP_ARCH_X86_FMA4
macros.
The following instruction sets are supported by the library.
Instruction set | Macro to enable it | Remarks |
---|---|---|
Non-SIMD | (none) | Uses plain C. May be slower than equivalent C implementation as it makes compiler harder to reason about code. The compiler may still vectorize the code if it knows certain SIMD instruction set is available. |
x86 SSE2 | SIMDPP_ARCH_X86_SSE2
|
(none) |
x86 SSE3 | SIMDPP_ARCH_X86_SSE3
|
Implies SSE2 |
x86 SSSE3 | SIMDPP_ARCH_X86_SSSE3
|
Implies SSE3 |
x86 SSE4.1 | SIMDPP_ARCH_X86_SSE4_1
|
Implies SSSE3 |
x86 AVX | SIMDPP_ARCH_X86_AVX
|
256-bit vectors for floating-point values, 128-bit vectors for integers. Implies SSE4.1 |
x86 FMA3 (Intel flavor) | SIMDPP_ARCH_X86_FMA3
|
Implies SSE3. |
x86 FMA4 (AMD flavor) | SIMDPP_ARCH_X86_FMA4
|
Implies SSE3. |
x86 XOP | SIMDPP_ARCH_X86_XOR
|
Implies SSE3. |
x86 AVX2 | SIMDPP_ARCH_X86_AVX
|
256-bit vectors. Implies AVX |
x86 AVX512F | SIMDPP_ARCH_X86_AVX512F
|
512-bit vectors for floating-point and 32, 64-bit integer types. 8-bit and 16-bit integer vectors have 256-bit length. Implies AVX2 |
ARM NEON without floating-point support | SIMDPP_ARCH_ARM_NEON
|
Does not use SIMD instructions for floating-point computations. The rationale for this mode is that certain NEON implementations have imprecise single-precision floating-point units. |
ARM NEON with floating-point support | SIMDPP_ARCH_ARM_NEON_FLT_SP
|
Uses SIMD instructions for single-precision floating-point computations, non-SIMD instructions for double-precision floating-point computations. |
ARM NEONv2 | SIMDPP_ARCH_ARM_NEON_FLT_SP or SIMDPP_ARCH_ARM_NEON
|
Automatically enabled when compiling for ARM64. All floating-point computations are done on the NEON unit. |
PowerPC Altivec | SIMDPP_ARCH_POWER_ALTIVEC
|
Does not use SIMD for double-precision and 64-bit integer computations |