Instruction set selection

During compilation, libsimdpp needs to be told explicitly which instruction set to use. This is done by defining one or more of the macros listed in the table below before the first inclusion of the simdpp/simd.h header. Multiple options may be specified: for example, to select AVX and FMA4, the user needs to define both the SIMDPP_ARCH_X86_AVX and SIMDPP_ARCH_X86_FMA4 macros.
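The following is a minimal sketch of the AVX + FMA4 selection described above, assuming an x86 target. Two things in it are choices made for the sketch and are not stated by this page. First, each macro is defined to 1 so that it works with both `#if` and `#ifdef` checks. Second, the include is guarded with `__has_include` so the snippet also compiles where libsimdpp is not installed. The helper `selected_arch_macros` is hypothetical and only reports which selection macros are visible. The compiler itself must also be allowed to emit the selected instructions (for GCC or Clang, for example, `-mavx -mfma4`).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Select AVX and FMA4 *before* the first inclusion of simdpp/simd.h.
// Defining them to 1 is an assumption of this sketch; passing
// -DSIMDPP_ARCH_X86_AVX on the command line has the same effect.
#define SIMDPP_ARCH_X86_AVX 1
#define SIMDPP_ARCH_X86_FMA4 1

// Include libsimdpp only if it is available, so the sketch also builds
// without it (x86 targets only, since x86 instruction sets are selected).
#if defined(__has_include)
#  if __has_include(<simdpp/simd.h>)
#    include <simdpp/simd.h>
#  endif
#endif

// Hypothetical helper: lists which of the selection macros are defined.
std::vector<std::string> selected_arch_macros() {
    std::vector<std::string> out;
#ifdef SIMDPP_ARCH_X86_AVX
    out.push_back("SIMDPP_ARCH_X86_AVX");
#endif
#ifdef SIMDPP_ARCH_X86_FMA4
    out.push_back("SIMDPP_ARCH_X86_FMA4");
#endif
    return out;
}
```

Defining the macros in a source file works only if nothing has included simdpp/simd.h earlier in the same translation unit. Passing them as compiler flags avoids that ordering pitfall.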

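Each row of the table lists the data formats supported by the SIMD unit of a particular instruction set. The numbers give the native vector width in bits for each data format; N/A means that the format has no hardware support. The library can still be used with unsupported data formats, but the corresponding operations are then implemented without SIMD hardware and may be slower than equivalent scalar C++ code.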
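The sketch below illustrates what "implemented without using SIMD hardware" means. It is a hypothetical example and not libsimdpp's actual code. It models a 128-bit vector of two 64-bit floats on a target where that format has no native support, such as PowerPC Altivec, whose 64-bit float column in the table is N/A. The type `emulated_float64x2` and the function `add` are invented for illustration. Each lane is handled by an ordinary scalar addition, so there is no SIMD speedup over plain scalar code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical emulated vector: two 64-bit float lanes stored in memory,
// with no corresponding SIMD register type on the target.
struct emulated_float64x2 {
    std::array<double, 2> lanes;
};

// Lane-wise addition done with one scalar add per lane. This is the kind of
// fallback an emulated format implies; it is not faster than a scalar loop.
inline emulated_float64x2 add(const emulated_float64x2& a,
                              const emulated_float64x2& b) {
    emulated_float64x2 r{};
    for (std::size_t i = 0; i < r.lanes.size(); ++i) {
        r.lanes[i] = a.lanes[i] + b.lanes[i];
    }
    return r;
}
```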

Instruction set	Macro to enable it	8-bit integers	16-bit integers	32-bit integers	64-bit integers	32-bit floats	64-bit floats	Remarks
Non-SIMD	(none)	N/A	N/A	N/A	N/A	N/A	N/A	Uses plain C++. May be slower than an equivalent C/C++ implementation, as the library code makes it harder for the compiler to reason about the program. The compiler may still vectorize the code if it knows that a certain SIMD instruction set is available.
x86 SSE2	`SIMDPP_ARCH_X86_SSE2`	128	128	128	128	128	128	(none)
x86 SSE3	`SIMDPP_ARCH_X86_SSE3`	128	128	128	128	128	128	Implies SSE2
x86 SSSE3	`SIMDPP_ARCH_X86_SSSE3`	128	128	128	128	128	128	Implies SSE3
x86 SSE4.1	`SIMDPP_ARCH_X86_SSE4_1`	128	128	128	128	128	128	Implies SSSE3
x86 `popcnt` instruction	`SIMDPP_ARCH_X86_POPCNT_INSN`	128	128	128	128	128	128	Implies SSSE3. This does not directly correspond to the ABM instruction set, as Intel already provides the instruction in SSE4.2.
x86 AVX	`SIMDPP_ARCH_X86_AVX`	128	128	128	128	256	256	Implies SSE4.1
x86 FMA3 (Intel flavor)	`SIMDPP_ARCH_X86_FMA3`	128	128	128	128	128	128	Implies SSE3
x86 FMA4 (AMD flavor)	`SIMDPP_ARCH_X86_FMA4`	128	128	128	128	128	128	Implies SSE3
x86 XOP	`SIMDPP_ARCH_X86_XOP`	128	128	128	128	128	128	Implies SSE3
x86 AVX2	`SIMDPP_ARCH_X86_AVX2`	256	256	256	256	256	256	Implies AVX
x86 AVX512F	`SIMDPP_ARCH_X86_AVX512F`	256	256	512	512	512	512	Implies AVX2
x86 AVX512BW	`SIMDPP_ARCH_X86_AVX512BW`	512	512	512	512	512	512	Implies AVX512F
x86 AVX512DQ	`SIMDPP_ARCH_X86_AVX512DQ`	256	256	512	512	512	512	Implies AVX512F
x86 AVX512VL	`SIMDPP_ARCH_X86_AVX512VL`	256	256	512	512	512	512	Implies AVX512F
ARM NEON without floating-point support	`SIMDPP_ARCH_ARM_NEON`	128	128	128	128	N/A	N/A	The rationale for this mode is that certain NEON implementations have imprecise single-precision floating-point units. Not all 64-bit integer operations are provided in hardware.
ARM NEON with floating-point support	`SIMDPP_ARCH_ARM_NEON_FLT_SP`	128	128	128	128	128	N/A	Not all 64-bit integer operations are provided in hardware.
ARM NEONv2	`SIMDPP_ARCH_ARM_NEON_FLT_SP` or `SIMDPP_ARCH_ARM_NEON`	128	128	128	128	128	128	Automatically enabled when compiling for ARM64. All floating-point computations are done on the NEON unit.
PowerPC Altivec	`SIMDPP_ARCH_POWER_ALTIVEC`	128	128	128	N/A	128	N/A	(none)
PowerPC 2.06 VSX	`SIMDPP_ARCH_POWER_VSX_206`	128	128	128	N/A	128	128	Implies Altivec
PowerPC 2.07 VSX	`SIMDPP_ARCH_POWER_VSX_207`	128	128	128	128	128	128	Implies PowerPC 2.06 VSX
MIPS MSA	`SIMDPP_ARCH_MIPS_MSA`	128	128	128	128	128	128	(none)
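The Remarks column and the width columns can be read together. Selecting one instruction set also enables everything it transitively implies. Under the assumption that implied sets simply add their capabilities, the widest native vector for a data format is the maximum over all enabled rows. The sketch below models only the x86 rows of the table. The names `x86_implies`, `enabled_sets`, `data_format` and `native_width` are hypothetical and are not part of libsimdpp's API.

```cpp
#include <array>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Direct "Implies ..." dependencies from the Remarks column (x86 rows only).
const std::map<std::string, std::vector<std::string>>& x86_implies() {
    static const std::map<std::string, std::vector<std::string>> m = {
        {"SSE2", {}},              {"SSE3", {"SSE2"}},
        {"SSSE3", {"SSE3"}},       {"SSE4.1", {"SSSE3"}},
        {"POPCNT", {"SSSE3"}},     {"AVX", {"SSE4.1"}},
        {"FMA3", {"SSE3"}},        {"FMA4", {"SSE3"}},
        {"XOP", {"SSE3"}},         {"AVX2", {"AVX"}},
        {"AVX512F", {"AVX2"}},     {"AVX512BW", {"AVX512F"}},
        {"AVX512DQ", {"AVX512F"}}, {"AVX512VL", {"AVX512F"}},
    };
    return m;
}

// The selected set plus everything it transitively implies (depth-first walk).
std::set<std::string> enabled_sets(const std::string& selected) {
    std::set<std::string> out;
    std::vector<std::string> todo{selected};
    while (!todo.empty()) {
        std::string s = todo.back();
        todo.pop_back();
        if (!out.insert(s).second) continue;  // already visited
        auto it = x86_implies().find(s);
        if (it == x86_implies().end()) continue;  // unknown name: no deps
        for (const auto& dep : it->second) todo.push_back(dep);
    }
    return out;
}

// Data formats in the column order of the table.
enum data_format { int8, int16, int32, int64, float32, float64 };

// Widest native vector width in bits for format f over the given sets;
// 0 stands for "N/A" (no hardware support, the format is emulated).
int native_width(const std::set<std::string>& sets, data_format f) {
    using row = std::array<int, 6>;
    static const std::map<std::string, row> widths = {
        {"SSE2", {128, 128, 128, 128, 128, 128}},
        {"SSE3", {128, 128, 128, 128, 128, 128}},
        {"SSSE3", {128, 128, 128, 128, 128, 128}},
        {"SSE4.1", {128, 128, 128, 128, 128, 128}},
        {"POPCNT", {128, 128, 128, 128, 128, 128}},
        {"AVX", {128, 128, 128, 128, 256, 256}},
        {"FMA3", {128, 128, 128, 128, 128, 128}},
        {"FMA4", {128, 128, 128, 128, 128, 128}},
        {"XOP", {128, 128, 128, 128, 128, 128}},
        {"AVX2", {256, 256, 256, 256, 256, 256}},
        {"AVX512F", {256, 256, 512, 512, 512, 512}},
        {"AVX512BW", {512, 512, 512, 512, 512, 512}},
        {"AVX512DQ", {256, 256, 512, 512, 512, 512}},
        {"AVX512VL", {256, 256, 512, 512, 512, 512}},
    };
    int best = 0;
    for (const auto& s : sets) {
        auto it = widths.find(s);
        if (it != widths.end() && it->second[f] > best) best = it->second[f];
    }
    return best;
}
```

With this model, selecting `SIMDPP_ARCH_X86_AVX` gives 256-bit vectors only for floating-point formats. Selecting `SIMDPP_ARCH_X86_AVX512F` still leaves 8- and 16-bit integers at 256 bits, and AVX512BW is what widens them to 512 bits, matching the table.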
