Instruction set selection


During compilation, libsimdpp must be told explicitly which instruction set to use. This is done by defining one or more of the macros listed in the table below before the first inclusion of the simdpp/simd.h header. Multiple options may be specified: for example, to select both AVX and FMA4, define the SIMDPP_ARCH_X86_AVX and SIMDPP_ARCH_X86_FMA4 macros.
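The ordering requirement above can be sketched as follows. This is a minimal illustration, not the library's prescribed setup: it assumes a GCC- or Clang-style compiler that predefines `__AVX__` and `__FMA4__` when those instruction sets are enabled, and the helper `avx_and_fma4_selected` is a hypothetical name introduced only for this example. The `__has_include` guard exists solely so that the sketch still compiles where libsimdpp is not installed.

```cpp
// Translate the compiler's predefined feature macros into libsimdpp
// selection macros. These definitions must appear before the first
// inclusion of simdpp/simd.h.
#if defined(__AVX__)
#  define SIMDPP_ARCH_X86_AVX
#endif
#if defined(__FMA4__)
#  define SIMDPP_ARCH_X86_FMA4
#endif

// Guard used by this sketch only: include the library header when it is
// actually available on the include path.
#if defined(__has_include)
#  if __has_include(<simdpp/simd.h>)
#    include <simdpp/simd.h>
#  endif
#endif

// Hypothetical helper (not part of libsimdpp): reports whether both macros
// from the AVX + FMA4 example ended up defined in this translation unit.
constexpr bool avx_and_fma4_selected() {
#if defined(SIMDPP_ARCH_X86_AVX) && defined(SIMDPP_ARCH_X86_FMA4)
    return true;
#else
    return false;
#endif
}
```

With GCC or Clang, building with `-mavx -mfma4` would make both selection macros defined here. When the target CPU is already known, plain unconditional `#define` lines placed before the include achieve the same selection.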

Each row of the table lists, for a particular instruction set, the vector width in bits that its SIMD unit supports for each data format. The library can still be used with unsupported data formats (marked N/A), but those operations are implemented without SIMD hardware and may be slower than equivalent scalar C++ code.
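As a worked reading of the table below, the following sketch computes how many elements fit in one native vector. It assumes the table entries are vector widths in bits and encodes N/A as 0. The type `NativeWidths`, the constants, and `native_lanes` are hypothetical names for illustration and are not part of libsimdpp.

```cpp
// Native SIMD vector width in bits per data format.
// 0 stands for "N/A": the format has no SIMD support and is emulated.
struct NativeWidths {
    unsigned i8, i16, i32, i64, f32, f64;
};

// Values copied from three rows of the table.
constexpr NativeWidths kSse2    {128, 128, 128, 128, 128, 128};
constexpr NativeWidths kAvx     {128, 128, 128, 128, 256, 256};
constexpr NativeWidths kAltivec {128, 128, 128,   0, 128,   0};

// Number of elements that fit in one native vector; 0 when the format
// is not supported by the SIMD unit.
constexpr unsigned native_lanes(unsigned vector_bits, unsigned element_bits) {
    return vector_bits / element_bits;
}

// Under AVX, floats use 256-bit vectors but integers stay at 128 bits.
static_assert(native_lanes(kAvx.f32, 32) == 8, "8 floats per AVX vector");
static_assert(native_lanes(kAvx.i32, 32) == 4, "4 32-bit ints per AVX vector");
```

Under this reading, a 64-bit float vector on Altivec has zero native lanes, which corresponds to the emulated, potentially slow path described above.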

| Instruction set | Macro to enable it | 8-bit integers | 16-bit integers | 32-bit integers | 64-bit integers | 32-bit floats | 64-bit floats | Remarks |
|---|---|---|---|---|---|---|---|---|
| Non-SIMD | (none) | N/A | N/A | N/A | N/A | N/A | N/A | Uses plain C++. May be slower than an equivalent C/C++ implementation because it makes the code harder for the compiler to reason about. The compiler may still vectorize the code if it knows that a certain SIMD instruction set is available. |
| x86 SSE2 | SIMDPP_ARCH_X86_SSE2 | 128 | 128 | 128 | 128 | 128 | 128 | (none) |
| x86 SSE3 | SIMDPP_ARCH_X86_SSE3 | 128 | 128 | 128 | 128 | 128 | 128 | Implies SSE2 |
| x86 SSSE3 | SIMDPP_ARCH_X86_SSSE3 | 128 | 128 | 128 | 128 | 128 | 128 | Implies SSE3 |
| x86 SSE4.1 | SIMDPP_ARCH_X86_SSE4_1 | 128 | 128 | 128 | 128 | 128 | 128 | Implies SSSE3 |
| x86 popcnt instruction | SIMDPP_ARCH_X86_POPCNT_INSN | 128 | 128 | 128 | 128 | 128 | 128 | Implies SSSE3. This does not directly correspond to the ABM instruction set, as Intel already provides the instruction in SSE 4.2. |
| x86 AVX | SIMDPP_ARCH_X86_AVX | 128 | 128 | 128 | 128 | 256 | 256 | Implies SSE4.1 |
| x86 FMA3 (Intel flavor) | SIMDPP_ARCH_X86_FMA3 | 128 | 128 | 128 | 128 | 128 | 128 | Implies SSE3 |
| x86 FMA4 (AMD flavor) | SIMDPP_ARCH_X86_FMA4 | 128 | 128 | 128 | 128 | 128 | 128 | Implies SSE3 |
| x86 XOP | SIMDPP_ARCH_X86_XOP | 128 | 128 | 128 | 128 | 128 | 128 | Implies SSE3 |
| x86 AVX2 | SIMDPP_ARCH_X86_AVX2 | 256 | 256 | 256 | 256 | 256 | 256 | Implies AVX |
| x86 AVX512F | SIMDPP_ARCH_X86_AVX512F | 256 | 256 | 512 | 512 | 512 | 512 | Implies AVX2 |
| x86 AVX512BW | SIMDPP_ARCH_X86_AVX512BW | 512 | 512 | 512 | 512 | 512 | 512 | Implies AVX512F |
| x86 AVX512DQ | SIMDPP_ARCH_X86_AVX512DQ | 256 | 256 | 512 | 512 | 512 | 512 | Implies AVX512F |
| x86 AVX512VL | SIMDPP_ARCH_X86_AVX512VL | 256 | 256 | 512 | 512 | 512 | 512 | Implies AVX512F |
| ARM NEON without floating-point support | SIMDPP_ARCH_ARM_NEON | 128 | 128 | 128 | 128 | N/A | N/A | The rationale for this mode is that certain NEON implementations have imprecise single-precision floating-point units. Not all 64-bit integer operations are provided in hardware. |
| ARM NEON with floating-point support | SIMDPP_ARCH_ARM_NEON_FLT_SP | 128 | 128 | 128 | 128 | 128 | N/A | Not all 64-bit integer operations are provided in hardware. |
| ARM NEONv2 | SIMDPP_ARCH_ARM_NEON_FLT_SP or SIMDPP_ARCH_ARM_NEON | 128 | 128 | 128 | 128 | 128 | 128 | Automatically enabled when compiling for ARM64. All floating-point computations are done on the NEON unit. |
| PowerPC Altivec | SIMDPP_ARCH_POWER_ALTIVEC | 128 | 128 | 128 | N/A | 128 | N/A | (none) |
| PowerPC 2.06 VSX | SIMDPP_ARCH_POWER_VSX_206 | 128 | 128 | 128 | N/A | 128 | 128 | Implies Altivec |
| PowerPC 2.07 VSX | SIMDPP_ARCH_POWER_VSX_207 | 128 | 128 | 128 | 128 | 128 | 128 | Implies PowerPC 2.06 VSX |
| MIPS MSA | SIMDPP_ARCH_MIPS_MSA | 128 | 128 | 128 | 128 | 128 | 128 | (none) |
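
The "Implies" remarks in the x86 rows form chains: selecting one instruction set also brings in the sets it implies. The sketch below models those remarks as data and follows a chain to its end. It assumes the implication is transitive, which the table suggests but does not state outright. The short labels (macro names without the SIMDPP_ARCH_X86_ prefix) and the functions `direct_implications` and `effective_sets` are hypothetical and not part of libsimdpp.

```cpp
#include <map>
#include <set>
#include <string>

// Direct "Implies" relations copied from the Remarks column (x86 rows).
// Keys and values are macro names without the SIMDPP_ARCH_X86_ prefix.
inline const std::map<std::string, std::string>& direct_implications() {
    static const std::map<std::string, std::string> table = {
        {"SSE3", "SSE2"},        {"SSSE3", "SSE3"},
        {"SSE4_1", "SSSE3"},     {"POPCNT_INSN", "SSSE3"},
        {"AVX", "SSE4_1"},       {"FMA3", "SSE3"},
        {"FMA4", "SSE3"},        {"XOP", "SSE3"},
        {"AVX2", "AVX"},         {"AVX512F", "AVX2"},
        {"AVX512BW", "AVX512F"}, {"AVX512DQ", "AVX512F"},
        {"AVX512VL", "AVX512F"},
    };
    return table;
}

// Returns the selected set plus everything reachable by following the
// implication chain (assumed transitive for this sketch).
inline std::set<std::string> effective_sets(const std::string& selected) {
    const auto& table = direct_implications();
    std::set<std::string> result;
    std::string current = selected;
    // Stop when the chain ends or a label repeats.
    while (result.insert(current).second) {
        auto next = table.find(current);
        if (next == table.end()) break;
        current = next->second;
    }
    return result;
}
```

For example, `effective_sets("AVX2")` yields AVX2, AVX, SSE4_1, SSSE3, SSE3 and SSE2, while `effective_sets("FMA4")` yields only FMA4, SSE3 and SSE2. The second result shows why AVX and FMA4 have to be selected together when both are wanted.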