Instruction set selection
During compilation, libsimdpp must be told explicitly which instruction set to use. This is done by defining one or more of the macros listed in the table below before the first inclusion of the simdpp/simd.h header. Multiple options may be specified: for example, to select AVX and FMA4, define both the SIMDPP_ARCH_X86_AVX and SIMDPP_ARCH_X86_FMA4 macros.
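The ordering requirement above can be sketched as follows. This is a minimal illustration, not the library's own sample: the `selected_instruction_sets` helper is hypothetical and only reports which selection macros are visible in the translation unit, and the `__has_include` guard is there solely so the sketch compiles on machines where libsimdpp is not installed. With GCC or Clang, matching compiler flags (for example `-mavx -mfma4`) are typically also needed so that the compiler accepts the corresponding instructions.

```cpp
// Select the instruction sets BEFORE the first inclusion of simdpp/simd.h.
#define SIMDPP_ARCH_X86_AVX
#define SIMDPP_ARCH_X86_FMA4

// Guarded include: lets this sketch compile even without libsimdpp installed.
#if defined(__has_include)
#  if __has_include(<simdpp/simd.h>)
#    include <simdpp/simd.h>
#  endif
#endif

#include <string>
#include <vector>

// Hypothetical helper (not part of libsimdpp): lists which of the
// checked selection macros are defined in this translation unit.
inline std::vector<std::string> selected_instruction_sets() {
    std::vector<std::string> sets;
#ifdef SIMDPP_ARCH_X86_AVX
    sets.push_back("AVX");
#endif
#ifdef SIMDPP_ARCH_X86_FMA4
    sets.push_back("FMA4");
#endif
#ifdef SIMDPP_ARCH_X86_AVX2
    sets.push_back("AVX2");  // not selected above, so never added here
#endif
    return sets;
}
```

Defining the macros on the compiler command line (for example `-DSIMDPP_ARCH_X86_AVX`) has the same effect and guarantees that they precede every inclusion of the header.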
Each row of the table lists the data formats supported by the SIMD unit of a particular instruction set, given as the native vector width in bits; N/A marks a format that the SIMD unit does not support. The library can still be used with unsupported data formats, but the operations are then implemented without SIMD hardware and may be slower than equivalent scalar C++ code.
Instruction set | Macro to enable it | 8-bit integers | 16-bit integers | 32-bit integers | 64-bit integers | 32-bit floats | 64-bit floats | Remarks |
---|---|---|---|---|---|---|---|---|
Non-SIMD | (none) | N/A | N/A | N/A | N/A | N/A | N/A | Uses plain C++. May be slower than an equivalent C/C++ implementation because it makes it harder for the compiler to reason about the code. The compiler may still vectorize the code if it knows that a certain SIMD instruction set is available. |
x86 SSE2 | SIMDPP_ARCH_X86_SSE2 | 128 | 128 | 128 | 128 | 128 | 128 | (none) |
x86 SSE3 | SIMDPP_ARCH_X86_SSE3 | 128 | 128 | 128 | 128 | 128 | 128 | Implies SSE2 |
x86 SSSE3 | SIMDPP_ARCH_X86_SSSE3 | 128 | 128 | 128 | 128 | 128 | 128 | Implies SSE3 |
x86 SSE4.1 | SIMDPP_ARCH_X86_SSE4_1 | 128 | 128 | 128 | 128 | 128 | 128 | Implies SSSE3 |
x86 popcnt instruction | SIMDPP_ARCH_X86_POPCNT_INSN | 128 | 128 | 128 | 128 | 128 | 128 | Implies SSSE3. This does not directly correspond to the ABM instruction set as Intel provides the instruction in SSE 4.2 already. |
x86 AVX | SIMDPP_ARCH_X86_AVX | 128 | 128 | 128 | 128 | 256 | 256 | Implies SSE4.1 |
x86 FMA3 (Intel flavor) | SIMDPP_ARCH_X86_FMA3 | 128 | 128 | 128 | 128 | 128 | 128 | Implies SSE3. |
x86 FMA4 (AMD flavor) | SIMDPP_ARCH_X86_FMA4 | 128 | 128 | 128 | 128 | 128 | 128 | Implies SSE3. |
x86 XOP | SIMDPP_ARCH_X86_XOP | 128 | 128 | 128 | 128 | 128 | 128 | Implies SSE3. |
x86 AVX2 | SIMDPP_ARCH_X86_AVX2 | 256 | 256 | 256 | 256 | 256 | 256 | Implies AVX |
x86 AVX512F | SIMDPP_ARCH_X86_AVX512F | 256 | 256 | 512 | 512 | 512 | 512 | Implies AVX2 |
x86 AVX512BW | SIMDPP_ARCH_X86_AVX512BW | 512 | 512 | 512 | 512 | 512 | 512 | Implies AVX512F |
x86 AVX512DQ | SIMDPP_ARCH_X86_AVX512DQ | 256 | 256 | 512 | 512 | 512 | 512 | Implies AVX512F |
x86 AVX512VL | SIMDPP_ARCH_X86_AVX512VL | 256 | 256 | 512 | 512 | 512 | 512 | Implies AVX512F |
ARM NEON without floating-point support | SIMDPP_ARCH_ARM_NEON | 128 | 128 | 128 | 128 | N/A | N/A | The rationale for this mode is that certain NEON implementations have imprecise single-precision floating-point units. Not all 64-bit integer operations are provided in hardware. |
ARM NEON with floating-point support | SIMDPP_ARCH_ARM_NEON_FLT_SP | 128 | 128 | 128 | 128 | 128 | N/A | Not all 64-bit integer operations are provided in hardware. |
ARM NEONv2 | SIMDPP_ARCH_ARM_NEON_FLT_SP or SIMDPP_ARCH_ARM_NEON | 128 | 128 | 128 | 128 | 128 | 128 | Automatically enabled when compiling for ARM64. All floating-point computations are done on the NEON unit. |
PowerPC Altivec | SIMDPP_ARCH_POWER_ALTIVEC | 128 | 128 | 128 | N/A | 128 | N/A | (none) |
PowerPC 2.06 VSX | SIMDPP_ARCH_POWER_VSX_206 | 128 | 128 | 128 | N/A | 128 | 128 | Implies Altivec |
PowerPC 2.07 VSX | SIMDPP_ARCH_POWER_VSX_207 | 128 | 128 | 128 | 128 | 128 | 128 | Implies PowerPC 2.06 VSX |
MIPS MSA | SIMDPP_ARCH_MIPS_MSA | 128 | 128 | 128 | 128 | 128 | 128 | (none) |
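
One way to read the width columns, assuming the numbers are native vector widths in bits, is to divide them by the element size to get the number of elements processed per native vector. The `lanes` helper below is hypothetical and not part of libsimdpp; it only works through that arithmetic for a few table entries.

```cpp
#include <cstddef>

// Hypothetical helper (not part of libsimdpp):
// elements per native vector = vector width in bits / element width in bits.
constexpr std::size_t lanes(std::size_t native_bits, std::size_t element_bits) {
    return native_bits / element_bits;
}

// SSE2: 8-bit integers are listed as 128, so sixteen per vector.
static_assert(lanes(128, 8) == 16, "SSE2, 8-bit integers");
// AVX: 32-bit integers stay at 128 (four per vector) ...
static_assert(lanes(128, 32) == 4, "AVX, 32-bit integers");
// ... while 32-bit floats are listed as 256 (eight per vector).
static_assert(lanes(256, 32) == 8, "AVX, 32-bit floats");
// AVX512F: 64-bit floats are listed as 512, so eight per vector.
static_assert(lanes(512, 64) == 8, "AVX512F, 64-bit floats");
```

The AVX row shows why the per-format columns matter: enabling AVX doubles the native width for floating-point data only, while integer data remains 128 bits wide until AVX2 is selected.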