Overview
The library provides a number of types that correspond to the various kinds of data that may be stored within a SIMD register. The types may be categorized along two dimensions: the type of data stored within a single element (lane) of the wrapped SIMD register, and the number of such elements wrapped by the type.
The following element types are supported:

 signed integers: 8, 16, 32 and 64-bit wide
 unsigned integers: 8, 16, 32 and 64-bit wide
 floating-point numbers: 32 and 64-bit wide
 integer masks: with elements 8, 16, 32 and 64 bits wide
 floating-point masks: with elements 32 and 64 bits wide
Masks are special vector types that store one bit of information per element. They are described below.
The number of elements contained within a vector type may be any power of two that is larger than a certain minimum bound, which depends on the element type. Currently the minimum size of a vector is 128 bits, which means that vectors containing 8, 16, 32 and 64-bit elements must have at least 16, 8, 4 and 2 elements respectively.
The actual physical layout of a vector type is undefined. In particular, this means that the user must use the library functions to load and store vectors from and to memory, and must not depend on the sizeof operator.
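The consequence of the opaque layout is that all data transfer must go through explicit load/store entry points. The following plain C++ sketch illustrates the pattern; the Vec4i class and its members are hypothetical stand-ins, not libsimdpp's actual implementation:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical wrapper: the internal layout is private, so callers must go
// through load/store rather than memcpy-ing the object or using sizeof on it.
class Vec4i {
public:
    static Vec4i load(const std::int32_t* p) {
        Vec4i v;
        std::memcpy(v.data_, p, sizeof v.data_);  // stand-in for a SIMD load
        return v;
    }
    void store(std::int32_t* p) const {
        std::memcpy(p, data_, sizeof data_);      // stand-in for a SIMD store
    }
private:
    std::int32_t data_[4];  // a real library may hold __m128i, two registers, etc.
};
```

Because the representation is private, the library remains free to change it per instruction set without breaking user code.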
The following class templates are provided for non-mask types:
template<unsigned N, class Expr = void> class int8;
template<unsigned N, class Expr = void> class int16;
template<unsigned N, class Expr = void> class int32;
template<unsigned N, class Expr = void> class int64;
template<unsigned N, class Expr = void> class uint8;
template<unsigned N, class Expr = void> class uint16;
template<unsigned N, class Expr = void> class uint32;
template<unsigned N, class Expr = void> class uint64;
template<unsigned N, class Expr = void> class float32;
template<unsigned N, class Expr = void> class float64;
Here N is the number of elements within the vector. The Expr template parameter is used to support expression templates. Most user code will use the default value void.
Masks
Masks are special vector types that, from the user's perspective, are similar to their regular counterparts. The difference is that masks store one bit of information per element: within each element, either all bits are set or all bits are zero. As with the regular vector types, the physical layout is undefined. On certain instruction sets such as AVX-512 each mask element occupies a single physical bit; on others a mask is effectively a regular vector whose elements store either all ones or all zeros.
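On instruction sets without dedicated mask registers, the all-ones/all-zeros convention is what vector comparisons naturally produce, and it allows masks to be applied with plain bitwise operations. A small self-contained sketch of that convention (illustrative helper names, not libsimdpp internals):

```cpp
#include <cassert>
#include <cstdint>

// A mask element is either all-ones (0xFFFFFFFF) or all-zeros, mirroring
// what vector comparison instructions produce on SSE/NEON-style ISAs.
std::uint32_t mask_elem(bool b) {
    return b ? 0xFFFFFFFFu : 0u;
}

// Blend: pick from a where the mask is set, from b elsewhere. With this
// representation, selection needs only bitwise AND/OR/NOT.
std::uint32_t blend(std::uint32_t m, std::uint32_t a, std::uint32_t b) {
    return (a & m) | (b & ~m);
}
```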
The following class templates are provided for mask types:
template<unsigned N, class Expr = void> class mask_int8;
template<unsigned N, class Expr = void> class mask_int16;
template<unsigned N, class Expr = void> class mask_int32;
template<unsigned N, class Expr = void> class mask_int64;
template<unsigned N, class Expr = void> class mask_float32;
template<unsigned N, class Expr = void> class mask_float64;
Here N is the number of elements within the mask vector and Expr is used to implement expression templates.
Different types are used for floating-point and integer masks for a reason: on certain architectures, integer and floating-point operations are implemented in different processor "domains", with extra latency incurred when data passes between them. Separate mask types make it possible to select instructions that operate in the correct domain, avoiding that extra latency.
Vector width
As described above, the number of elements in a vector type can be any power of two such that the vector size is not less than 128 bits. Vector types map to as many native SIMD registers as are needed to support the specified number of elements. For example, an instance of the int32x8 type maps to two instances of the __m128i type on SSE2, but to a single instance of __m256i on AVX2.
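The mapping follows directly from the vector's total bit width and the native register width. A small constexpr sketch of that arithmetic (the function name is illustrative, not part of the library):

```cpp
#include <cassert>

// Number of native registers needed to hold n elements of elem_bits each,
// given native SIMD registers of native_bits (rounding up).
constexpr unsigned regs_needed(unsigned n, unsigned elem_bits,
                               unsigned native_bits) {
    return (n * elem_bits + native_bits - 1) / native_bits;
}

static_assert(regs_needed(8, 32, 128) == 2, "int32x8 on SSE2: two __m128i");
static_assert(regs_needed(8, 32, 256) == 1, "int32x8 on AVX2: one __m256i");
```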
This flexibility makes it possible to use the widest native vector widths even when mixed floating-point and integer algorithms are implemented on non-uniform SIMD architectures. Consider a simple example:
int* src;
float* dst;
...
for (unsigned i = 0; i < size; ++i) {
    *dst++ = *src++ * 3.14f;
}
The vectorized version of this code should use the int32<4> and float32<4> types on SSE2, and int32<8> and float32<8> on AVX2. The former instruction set supports 128-bit SIMD instructions for both integer and floating-point operations, whereas the latter supports 256-bit SIMD instructions for both. Effectively utilizing a non-uniform instruction set such as AVX is more complex: 256-bit instructions are available for floating-point operations, whereas integer SIMD instructions still support only 128 bits. With libsimdpp the user can simply use the int32<8> and float32<8> types: floating-point operations are then performed using 256-bit SIMD instructions, whereas integer operations employ twice as many 128-bit instructions.
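The shape of the width-8 version can be sketched in plain standard C++ as a chunked loop; a real libsimdpp version would replace the inner loop with int32<8>/float32<8> operations, but the chunking structure and the scalar tail are the same (the function name is illustrative):

```cpp
#include <cassert>
#include <cstddef>

// Scalar stand-in for the int32<8> / float32<8> version: process the data
// in 8-element chunks, the way a vector type of width 8 would, and finish
// the remainder with a scalar tail loop.
void scale(const int* src, float* dst, std::size_t size) {
    std::size_t i = 0;
    for (; i + 8 <= size; i += 8)        // one "vector" iteration
        for (std::size_t j = 0; j < 8; ++j)
            dst[i + j] = src[i + j] * 3.14f;
    for (; i < size; ++i)                // scalar tail
        dst[i] = src[i] * 3.14f;
}
```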
Using vector types wider than the available SIMD instructions increases register pressure. Users should query the most efficient vector widths from the library via the vector size macros and use them to size the vectors for their algorithms.
Type hierarchy
The vector types form a type hierarchy by inheriting from an empty class template using the curiously recurring template pattern (CRTP). This makes it possible to write function templates that accept a certain category of vector types as parameters without creating an excessive number of overloads. For example, it is possible to write a single function template that accepts any two integer vectors with 32-bit elements.
The type hierarchy is shown below:
any_vec
┣━ any_vec8
┃  ┗━ any_int8
┃     ┣━ int8
┃     ┣━ uint8
┃     ┗━ mask_int8
┣━ any_vec16
┃  ┗━ any_int16
┃     ┣━ int16
┃     ┣━ uint16
┃     ┗━ mask_int16
┣━ any_vec32
┃  ┣━ any_int32
┃  ┃  ┣━ int32
┃  ┃  ┣━ uint32
┃  ┃  ┗━ mask_int32
┃  ┗━ any_float32
┃     ┣━ float32
┃     ┗━ mask_float32
┗━ any_vec64
   ┣━ any_int64
   ┃  ┣━ int64
   ┃  ┣━ uint64
   ┃  ┗━ mask_int64
   ┗━ any_float64
      ┣━ float64
      ┗━ mask_float64
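The CRTP mechanism behind this hierarchy can be sketched in a few lines of standard C++. The names below mirror the categories in the tree, but the members and the wrapped() accessor are simplified illustrations, not libsimdpp's actual interfaces:

```cpp
#include <cassert>

// CRTP base: carries no data, only marks the category and exposes the
// concrete derived vector type via wrapped().
template<class V> struct any_int32 {
    V& wrapped() { return static_cast<V&>(*this); }
};

// Two concrete 32-bit integer vector stand-ins deriving from one category.
struct int32x4  : any_int32<int32x4>  { int lane0 = 0; };
struct uint32x4 : any_int32<uint32x4> { int lane0 = 0; };

// A single function template accepts any two vectors in the any_int32
// category, instead of one overload per pair of concrete types.
template<class U, class V>
int add_lane0(any_int32<U>& a, any_int32<V>& b) {
    return a.wrapped().lane0 + b.wrapped().lane0;
}
```

Because the base class is empty, this categorization costs nothing at runtime; it exists purely to constrain overload resolution.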
The categorization types are only useful as parameters in functions that accept a certain vector category. They are never used in other contexts.