libsimdpp
0.9.3
Functions | |
basic_int8x16 | simdpp::zip_lo (basic_int8x16 a, basic_int8x16 b) |
Interleaves the lower halves of two vectors. More... | |
basic_int8x32 | simdpp::zip_lo (basic_int8x32 a, basic_int8x32 b) |
basic_int16x8 | simdpp::zip_lo (basic_int16x8 a, basic_int16x8 b) |
basic_int16x16 | simdpp::zip_lo (basic_int16x16 a, basic_int16x16 b) |
basic_int32x4 | simdpp::zip_lo (basic_int32x4 a, basic_int32x4 b) |
basic_int32x8 | simdpp::zip_lo (basic_int32x8 a, basic_int32x8 b) |
basic_int64x2 | simdpp::zip_lo (basic_int64x2 a, basic_int64x2 b) |
basic_int64x4 | simdpp::zip_lo (basic_int64x4 a, basic_int64x4 b) |
template<int s0, int s1> | |
basic_int8x16 | simdpp::make_shuffle_bytes16_mask (basic_int8x16 &mask) |
Makes a mask to shuffle an int8x16 vector using the permute_bytes16, shuffle_bytes16, permute_zbytes16 or shuffle_zbytes16 functions. More... | |
template<int s0, int s1> | |
basic_int8x32 | simdpp::make_shuffle_bytes16_mask (basic_int8x32 &mask) |
Makes a mask to shuffle an int8x16 vector using the permute_bytes16, shuffle_bytes16, permute_zbytes16 or shuffle_zbytes16 functions. More... | |
float32x4 | simdpp::zip_lo (float32x4 a, float32x4 b) |
Interleaves the lower halves of two vectors. More... | |
float32x8 | simdpp::zip_lo (float32x8 a, float32x8 b) |
Interleaves the lower halves of two vectors. More... | |
float64x2 | simdpp::zip_lo (float64x2 a, float64x2 b) |
Interleaves the lower halves of two vectors. More... | |
float64x4 | simdpp::zip_lo (float64x4 a, float64x4 b) |
Interleaves the lower halves of two vectors. More... | |
basic_int8x16 | simdpp::zip_hi (basic_int8x16 a, basic_int8x16 b) |
Interleaves the higher halves of two vectors. More... | |
basic_int8x32 | simdpp::zip_hi (basic_int8x32 a, basic_int8x32 b) |
Interleaves the higher halves of two vectors. More... | |
basic_int16x8 | simdpp::zip_hi (basic_int16x8 a, basic_int16x8 b) |
Interleaves the higher halves of two vectors. More... | |
basic_int16x16 | simdpp::zip_hi (basic_int16x16 a, basic_int16x16 b) |
Interleaves the higher halves of two vectors. More... | |
basic_int32x4 | simdpp::zip_hi (basic_int32x4 a, basic_int32x4 b) |
Interleaves the higher halves of two vectors. More... | |
basic_int32x8 | simdpp::zip_hi (basic_int32x8 a, basic_int32x8 b) |
Interleaves the higher halves of two vectors. More... | |
basic_int64x2 | simdpp::zip_hi (basic_int64x2 a, basic_int64x2 b) |
Interleaves the higher halves of two vectors. More... | |
basic_int64x4 | simdpp::zip_hi (basic_int64x4 a, basic_int64x4 b) |
Interleaves the higher halves of two vectors. More... | |
float32x4 | simdpp::zip_hi (float32x4 a, float32x4 b) |
Interleaves the higher halves of two vectors. More... | |
float32x8 | simdpp::zip_hi (float32x8 a, float32x8 b) |
Interleaves the higher halves of two vectors. More... | |
float64x2 | simdpp::zip_hi (float64x2 a, float64x2 b) |
Interleaves the higher halves of two vectors. More... | |
float64x4 | simdpp::zip_hi (float64x4 a, float64x4 b) |
Interleaves the higher halves of two vectors. More... | |
template<unsigned shift> | |
basic_int8x16 | simdpp::move_l (basic_int8x16 a) |
Moves the elements in an int8x16 vector to the left by shift positions. More... | |
template<unsigned shift> | |
basic_int8x32 | simdpp::move_l (basic_int8x32 a) |
Moves the elements in an int8x16 vector to the left by shift positions. More... | |
template<unsigned shift> | |
basic_int16x8 | simdpp::move_l (basic_int16x8 a) |
Moves the 16-bit elements in a vector to the left by shift positions. More... | |
template<unsigned shift> | |
basic_int16x16 | simdpp::move_l (basic_int16x16 a) |
Moves the 16-bit elements in a vector to the left by shift positions. More... | |
template<unsigned shift> | |
basic_int32x4 | simdpp::move_l (basic_int32x4 a) |
Moves the 32-bit elements in a vector to the left by shift positions. More... | |
template<unsigned shift> | |
basic_int32x8 | simdpp::move_l (basic_int32x8 a) |
Moves the 32-bit elements in a vector to the left by shift positions. More... | |
template<unsigned shift> | |
basic_int64x2 | simdpp::move_l (basic_int64x2 a) |
Moves the 64-bit elements in a vector to the left by shift positions. More... | |
template<unsigned shift> | |
basic_int64x4 | simdpp::move_l (basic_int64x4 a) |
Moves the 64-bit elements in a vector to the left by shift positions. More... | |
template<unsigned shift> | |
float32x4 | simdpp::move_l (float32x4 a) |
Moves the 32-bit elements in a vector to the left by shift positions. More... | |
template<unsigned shift> | |
float32x8 | simdpp::move_l (float32x8 a) |
Moves the 32-bit elements in a vector to the left by shift positions. More... | |
template<unsigned shift> | |
float64x2 | simdpp::move_l (float64x2 a) |
Moves the 64-bit elements in a vector to the left by shift positions. More... | |
template<unsigned shift> | |
float64x4 | simdpp::move_l (float64x4 a) |
Moves the 64-bit elements in a vector to the left by shift positions. More... | |
template<unsigned shift> | |
basic_int8x16 | simdpp::move_r (basic_int8x16 a) |
Moves the 8-bit elements in a vector to the right by shift positions. More... | |
template<unsigned shift> | |
basic_int8x32 | simdpp::move_r (basic_int8x32 a) |
Moves the 8-bit elements in a vector to the right by shift positions. More... | |
template<unsigned shift> | |
basic_int16x8 | simdpp::move_r (basic_int16x8 a) |
Moves the 16-bit elements in a vector to the right by shift positions. More... | |
template<unsigned shift> | |
basic_int16x16 | simdpp::move_r (basic_int16x16 a) |
Moves the 16-bit elements in a vector to the right by shift positions. More... | |
template<unsigned shift> | |
basic_int32x4 | simdpp::move_r (basic_int32x4 a) |
Moves the 32-bit elements in a vector to the right by shift positions. More... | |
template<unsigned shift> | |
basic_int32x8 | simdpp::move_r (basic_int32x8 a) |
Moves the 32-bit elements in a vector to the right by shift positions. More... | |
template<unsigned shift> | |
basic_int64x2 | simdpp::move_r (basic_int64x2 a) |
Moves the 64-bit elements in a vector to the right by shift positions. More... | |
template<unsigned shift> | |
basic_int64x4 | simdpp::move_r (basic_int64x4 a) |
Moves the 64-bit elements in a vector to the right by shift positions. More... | |
template<unsigned shift> | |
float32x4 | simdpp::move_r (float32x4 a) |
Moves the 32-bit elements in a vector to the right by shift positions. More... | |
template<unsigned shift> | |
float32x8 | simdpp::move_r (float32x8 a) |
Moves the 32-bit elements in a vector to the right by shift positions. More... | |
template<unsigned shift> | |
float64x2 | simdpp::move_r (float64x2 a) |
Moves the 64-bit elements in a vector to the right by shift positions. More... | |
template<unsigned shift> | |
float64x4 | simdpp::move_r (float64x4 a) |
Moves the 64-bit elements in a vector to the right by shift positions. More... | |
template<unsigned s> | |
basic_int8x16 | simdpp::broadcast (basic_int8x16 a) |
Broadcasts the specified 8-bit value to all elements within 128-bit lanes. More... | |
template<unsigned s> | |
basic_int8x32 | simdpp::broadcast (basic_int8x32 a) |
Broadcasts the specified 8-bit value to all elements within 128-bit lanes. More... | |
template<unsigned s> | |
basic_int16x8 | simdpp::broadcast (basic_int16x8 a) |
Broadcasts the specified 16-bit value to all elements within 128-bit lanes. More... | |
template<unsigned s> | |
basic_int16x16 | simdpp::broadcast (basic_int16x16 a) |
Broadcasts the specified 16-bit value to all elements within 128-bit lanes. More... | |
template<unsigned s> | |
basic_int32x4 | simdpp::broadcast (basic_int32x4 a) |
Broadcasts the specified 32-bit value to all elements within 128-bit lanes. More... | |
template<unsigned s> | |
basic_int32x8 | simdpp::broadcast (basic_int32x8 a) |
Broadcasts the specified 32-bit value to all elements within 128-bit lanes. More... | |
template<unsigned s> | |
basic_int64x2 | simdpp::broadcast (basic_int64x2 a) |
Broadcasts the specified 64-bit value to all elements within 128-bit lanes. More... | |
template<unsigned s> | |
basic_int64x4 | simdpp::broadcast (basic_int64x4 a) |
Broadcasts the specified 64-bit value to all elements within 128-bit lanes. More... | |
template<unsigned s> | |
float32x4 | simdpp::broadcast (float32x4 a) |
Broadcasts the specified 32-bit value to all elements within 128-bit lanes. More... | |
template<unsigned s> | |
float32x8 | simdpp::broadcast (float32x8 a) |
Broadcasts the specified 32-bit value to all elements within 128-bit lanes. More... | |
template<unsigned s> | |
float64x2 | simdpp::broadcast (float64x2 a) |
Broadcasts the specified 64-bit value to all elements within 128-bit lanes. More... | |
template<unsigned s> | |
float64x4 | simdpp::broadcast (float64x4 a) |
Broadcasts the specified 64-bit value to all elements within 128-bit lanes. More... | |
template<unsigned s> | |
basic_int8x16 | simdpp::broadcast_w (basic_int8x16 a) |
Broadcasts the specified 8-bit value to all elements within a 128-bit lane. More... | |
template<unsigned s> | |
basic_int8x32 | simdpp::broadcast_w (basic_int8x32 a) |
Broadcasts the specified 8-bit value to all elements within a 128-bit lane. More... | |
template<unsigned s> | |
basic_int16x8 | simdpp::broadcast_w (basic_int16x8 a) |
Broadcasts the specified 16-bit value to all elements within an int16x8 vector. More... | |
template<unsigned s> | |
basic_int16x16 | simdpp::broadcast_w (basic_int16x16 a) |
Broadcasts the specified 16-bit value to all elements within an int16x8 vector. More... | |
template<unsigned s> | |
basic_int32x4 | simdpp::broadcast_w (basic_int32x4 a) |
Broadcasts the specified 32-bit value to all elements within an int32x4 vector. More... | |
template<unsigned s> | |
basic_int32x8 | simdpp::broadcast_w (basic_int32x8 a) |
Broadcasts the specified 32-bit value to all elements within an int32x4 vector. More... | |
template<unsigned s> | |
basic_int64x2 | simdpp::broadcast_w (basic_int64x2 a) |
Broadcasts the specified 64-bit value to all elements within an int64x2 vector. More... | |
template<unsigned s> | |
basic_int64x4 | simdpp::broadcast_w (basic_int64x4 a) |
Broadcasts the specified 64-bit value to all elements within an int64x2 vector. More... | |
template<unsigned s> | |
float32x4 | simdpp::broadcast_w (float32x4 a) |
Broadcasts the specified 32-bit value to all elements within a float32x4 vector. More... | |
template<unsigned s> | |
float32x8 | simdpp::broadcast_w (float32x8 a) |
Broadcasts the specified 32-bit value to all elements within a float32x4 vector. More... | |
template<unsigned s> | |
float64x2 | simdpp::broadcast_w (float64x2 a) |
Broadcasts the specified 64-bit value to all elements within a float64x2 vector. More... | |
template<unsigned s> | |
float64x4 | simdpp::broadcast_w (float64x4 a) |
Broadcasts the specified 64-bit value to all elements within a float64x2 vector. More... | |
template<unsigned shift> | |
basic_int8x16 | simdpp::align (basic_int8x16 lower, basic_int8x16 upper) |
Extracts an int8x16 vector from two concatenated int8x16 vectors. More... | |
template<unsigned shift> | |
basic_int8x32 | simdpp::align (basic_int8x32 lower, basic_int8x32 upper) |
Extracts an int8x16 vector from two concatenated int8x16 vectors. More... | |
template<unsigned shift> | |
basic_int16x8 | simdpp::align (basic_int16x8 lower, basic_int16x8 upper) |
Extracts an int16x8 vector from two concatenated int16x8 vectors. More... | |
template<unsigned shift> | |
basic_int16x16 | simdpp::align (basic_int16x16 lower, basic_int16x16 upper) |
Extracts an int16x8 vector from two concatenated int16x8 vectors. More... | |
template<unsigned shift> | |
basic_int32x4 | simdpp::align (basic_int32x4 lower, basic_int32x4 upper) |
Extracts an int32x4 vector from two concatenated int32x4 vectors. More... | |
template<unsigned shift> | |
basic_int32x8 | simdpp::align (basic_int32x8 lower, basic_int32x8 upper) |
Extracts an int32x4 vector from two concatenated int32x4 vectors. More... | |
template<unsigned shift> | |
basic_int64x2 | simdpp::align (basic_int64x2 lower, basic_int64x2 upper) |
Extracts an int64x2 vector from two concatenated int64x2 vectors. More... | |
template<unsigned shift> | |
basic_int64x4 | simdpp::align (basic_int64x4 lower, basic_int64x4 upper) |
Extracts an int64x2 vector from two concatenated int64x2 vectors. More... | |
template<unsigned shift> | |
float32x4 | simdpp::align (float32x4 lower, float32x4 upper) |
Extracts a float32x4 vector from two concatenated float32x4 vectors. More... | |
template<unsigned shift> | |
float32x8 | simdpp::align (float32x8 lower, float32x8 upper) |
Extracts a float32x4 vector from two concatenated float32x4 vectors. More... | |
template<unsigned shift> | |
float64x2 | simdpp::align (float64x2 lower, float64x2 upper) |
Extracts a float64x2 vector from two concatenated float64x2 vectors. More... | |
template<unsigned shift> | |
float64x4 | simdpp::align (float64x4 lower, float64x4 upper) |
Extracts a float64x2 vector from two concatenated float64x2 vectors. More... | |
basic_int8x16 | simdpp::blend (basic_int8x16 on, basic_int8x16 off, basic_int8x16 mask) |
Composes a vector from two sources according to a mask. More... | |
basic_int8x16 | simdpp::blend (basic_int8x16 on, basic_int8x16 off, mask_int8x16 mask) |
Composes a vector from two sources according to a mask. More... | |
basic_int8x32 | simdpp::blend (basic_int8x32 on, basic_int8x32 off, basic_int8x32 mask) |
Composes a vector from two sources according to a mask. More... | |
basic_int8x32 | simdpp::blend (basic_int8x32 on, basic_int8x32 off, mask_int8x32 mask) |
Composes a vector from two sources according to a mask. More... | |
basic_int16x8 | simdpp::blend (basic_int16x8 on, basic_int16x8 off, basic_int16x8 mask) |
Composes a vector from two sources according to a mask. More... | |
basic_int16x16 | simdpp::blend (basic_int16x16 on, basic_int16x16 off, basic_int16x16 mask) |
Composes a vector from two sources according to a mask. More... | |
basic_int16x8 | simdpp::blend (basic_int16x8 on, basic_int16x8 off, mask_int16x8 mask) |
Composes a vector from two sources according to a mask. More... | |
basic_int16x16 | simdpp::blend (basic_int16x16 on, basic_int16x16 off, mask_int16x16 mask) |
Composes a vector from two sources according to a mask. More... | |
basic_int32x4 | simdpp::blend (basic_int32x4 on, basic_int32x4 off, basic_int32x4 mask) |
Composes a vector from two sources according to a mask. More... | |
basic_int32x8 | simdpp::blend (basic_int32x8 on, basic_int32x8 off, basic_int32x8 mask) |
Composes a vector from two sources according to a mask. More... | |
basic_int32x4 | simdpp::blend (basic_int32x4 on, basic_int32x4 off, mask_int32x4 mask) |
Composes a vector from two sources according to a mask. More... | |
basic_int32x8 | simdpp::blend (basic_int32x8 on, basic_int32x8 off, mask_int32x8 mask) |
Composes a vector from two sources according to a mask. More... | |
basic_int64x2 | simdpp::blend (basic_int64x2 on, basic_int64x2 off, basic_int64x2 mask) |
Composes a vector from two sources according to a mask. More... | |
basic_int64x4 | simdpp::blend (basic_int64x4 on, basic_int64x4 off, basic_int64x4 mask) |
Composes a vector from two sources according to a mask. More... | |
basic_int64x2 | simdpp::blend (basic_int64x2 on, basic_int64x2 off, mask_int64x2 mask) |
Composes a vector from two sources according to a mask. More... | |
basic_int64x4 | simdpp::blend (basic_int64x4 on, basic_int64x4 off, mask_int64x4 mask) |
Composes a vector from two sources according to a mask. More... | |
float32x4 | simdpp::blend (float32x4 on, float32x4 off, float32x4 mask) |
Composes a vector from two sources according to a mask. More... | |
float32x4 | simdpp::blend (float32x4 on, float32x4 off, int128 mask) |
Composes a vector from two sources according to a mask. More... | |
float32x8 | simdpp::blend (float32x8 on, float32x8 off, float32x8 mask) |
Composes a vector from two sources according to a mask. More... | |
float32x8 | simdpp::blend (float32x8 on, float32x8 off, int256 mask) |
Composes a vector from two sources according to a mask. More... | |
float32x4 | simdpp::blend (float32x4 on, float32x4 off, mask_float32x4 mask) |
Composes a vector from two sources according to a mask. More... | |
float32x8 | simdpp::blend (float32x8 on, float32x8 off, mask_float32x8 mask) |
Composes a vector from two sources according to a mask. More... | |
float64x2 | simdpp::blend (float64x2 on, float64x2 off, float64x2 mask) |
Composes a vector from two sources according to a mask. More... | |
float64x2 | simdpp::blend (float64x2 on, float64x2 off, int128 mask) |
Composes a vector from two sources according to a mask. More... | |
float64x4 | simdpp::blend (float64x4 on, float64x4 off, float64x4 mask) |
Composes a vector from two sources according to a mask. More... | |
float64x4 | simdpp::blend (float64x4 on, float64x4 off, int256 mask) |
Composes a vector from two sources according to a mask. More... | |
float64x2 | simdpp::blend (float64x2 on, float64x2 off, mask_float64x2 mask) |
Composes a vector from two sources according to a mask. More... | |
float64x4 | simdpp::blend (float64x4 on, float64x4 off, mask_float64x4 mask) |
Composes a vector from two sources according to a mask. More... | |
basic_int8x16 | simdpp::unzip_lo (basic_int8x16 a, basic_int8x16 b) |
De-interleaves the odd (lower) elements of two int8x16 vectors. More... | |
basic_int8x32 | simdpp::unzip_lo (basic_int8x32 a, basic_int8x32 b) |
De-interleaves the odd (lower) elements of two int8x16 vectors. More... | |
basic_int16x8 | simdpp::unzip_lo (basic_int16x8 a, basic_int16x8 b) |
De-interleaves the odd (lower) elements of two int16x8 vectors. More... | |
basic_int16x16 | simdpp::unzip_lo (basic_int16x16 a, basic_int16x16 b) |
De-interleaves the odd (lower) elements of two int16x8 vectors. More... | |
basic_int32x4 | simdpp::unzip_lo (basic_int32x4 a, basic_int32x4 b) |
De-interleaves the odd (lower) elements of two int32x4 vectors. More... | |
basic_int32x8 | simdpp::unzip_lo (basic_int32x8 a, basic_int32x8 b) |
De-interleaves the odd (lower) elements of two int32x4 vectors. More... | |
basic_int64x2 | simdpp::unzip_lo (basic_int64x2 a, basic_int64x2 b) |
De-interleaves the odd (lower) elements of two int64x2 vectors. More... | |
basic_int64x4 | simdpp::unzip_lo (basic_int64x4 a, basic_int64x4 b) |
De-interleaves the odd (lower) elements of two int64x2 vectors. More... | |
float32x4 | simdpp::unzip_lo (float32x4 a, float32x4 b) |
De-interleaves the odd (lower) elements of two float32x4 vectors. More... | |
float32x8 | simdpp::unzip_lo (float32x8 a, float32x8 b) |
De-interleaves the odd (lower) elements of two float32x4 vectors. More... | |
float64x2 | simdpp::unzip_lo (float64x2 a, float64x2 b) |
De-interleaves the odd (lower) elements of two float64x2 vectors. More... | |
float64x4 | simdpp::unzip_lo (float64x4 a, float64x4 b) |
De-interleaves the odd (lower) elements of two float64x2 vectors. More... | |
basic_int8x16 | simdpp::unzip_hi (basic_int8x16 a, basic_int8x16 b) |
De-interleaves the even (higher) elements of two int8x16 vectors. More... | |
basic_int8x32 | simdpp::unzip_hi (basic_int8x32 a, basic_int8x32 b) |
De-interleaves the even (higher) elements of two int8x16 vectors. More... | |
basic_int16x8 | simdpp::unzip_hi (basic_int16x8 a, basic_int16x8 b) |
De-interleaves the even (higher) elements of two int16x8 vectors. More... | |
basic_int16x16 | simdpp::unzip_hi (basic_int16x16 a, basic_int16x16 b) |
De-interleaves the even (higher) elements of two int16x8 vectors. More... | |
basic_int32x4 | simdpp::unzip_hi (basic_int32x4 a, basic_int32x4 b) |
De-interleaves the even (higher) elements of two int32x4 vectors. More... | |
basic_int32x8 | simdpp::unzip_hi (basic_int32x8 a, basic_int32x8 b) |
De-interleaves the even (higher) elements of two int32x4 vectors. More... | |
basic_int64x2 | simdpp::unzip_hi (basic_int64x2 a, basic_int64x2 b) |
De-interleaves the even (higher) elements of two int64x2 vectors. More... | |
basic_int64x4 | simdpp::unzip_hi (basic_int64x4 a, basic_int64x4 b) |
De-interleaves the even (higher) elements of two int64x2 vectors. More... | |
float32x4 | simdpp::unzip_hi (float32x4 a, float32x4 b) |
De-interleaves the even (higher) elements of two float32x4 vectors. More... | |
float32x8 | simdpp::unzip_hi (float32x8 a, float32x8 b) |
De-interleaves the even (higher) elements of two float32x4 vectors. More... | |
float64x2 | simdpp::unzip_hi (float64x2 a, float64x2 b) |
De-interleaves the even (higher) elements of two float64x2 vectors. More... | |
float64x4 | simdpp::unzip_hi (float64x4 a, float64x4 b) |
De-interleaves the even (higher) elements of two float64x2 vectors. More... | |
int128 | simdpp::permute_bytes16 (int128 a, int128 mask) |
Selects bytes from a vector according to a mask. More... | |
float32x4 | simdpp::permute_bytes16 (float32x4 a, int128 mask) |
Selects bytes from a vector according to a mask. More... | |
float64x2 | simdpp::permute_bytes16 (float64x2 a, int128 mask) |
Selects bytes from a vector according to a mask. More... | |
int256 | simdpp::permute_bytes16 (int256 a, int256 mask) |
Selects bytes from a vector according to a mask. More... | |
float32x8 | simdpp::permute_bytes16 (float32x8 a, int256 mask) |
Selects bytes from a vector according to a mask. More... | |
float64x4 | simdpp::permute_bytes16 (float64x4 a, int256 mask) |
Selects bytes from a vector according to a mask. More... | |
template<unsigned s0, unsigned s1, unsigned s2, unsigned s3> | |
int128 | simdpp::permute (basic_int16x8 a) |
Permutes the 16-bit values within each 4 consecutive values of the vector. More... | |
template<unsigned s0, unsigned s1, unsigned s2, unsigned s3> | |
basic_int16x16 | simdpp::permute (basic_int16x16 a) |
Permutes the 16-bit values within each 4 consecutive values of the vector. More... | |
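As a rough illustration of the permute entries above, the following scalar C++ sketch models the selection on a plain array. It assumes that selector s0..s3 picks the element with that index inside each group of four consecutive values and that the same pattern is applied to every group. permute4_model is a hypothetical helper written for this illustration, not a libsimdpp function.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar reference model (hypothetical, not libsimdpp code): applies the same
// 4-element selection to every group of 4 consecutive 16-bit values.
template<unsigned s0, unsigned s1, unsigned s2, unsigned s3, std::size_t N>
std::array<std::uint16_t, N> permute4_model(const std::array<std::uint16_t, N>& a)
{
    static_assert(N % 4 == 0, "vector length must be a multiple of 4");
    static_assert(s0 < 4 && s1 < 4 && s2 < 4 && s3 < 4,
                  "selectors must be in the range [0, 3]");
    const unsigned sel[4] = {s0, s1, s2, s3};
    std::array<std::uint16_t, N> r{};
    for (std::size_t i = 0; i < N; ++i) {
        const std::size_t group = i - i % 4;   // first index of this group of 4
        r[i] = a[group + sel[i % 4]];          // pick the selected element of the group
    }
    return r;
}
```

Under these assumptions, permute4_model<3, 2, 1, 0> applied to {10, 11, 12, 13, 20, 21, 22, 23} reverses each group of four.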
Detailed Description
Function Documentation
basic_int8x16 simdpp::align | ( | basic_int8x16 | lower, |
basic_int8x16 | upper | ||
) |
Extracts an int8x16 vector from two concatenated int8x16 vectors.
- 128-bit version:
- In SSE2-SSE3 this intrinsic results in at least 3 instructions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-SSE3 this intrinsic results in at least 6 instructions.
- In SSSE3-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
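The extraction described above can be modelled in scalar C++ as follows. This is a minimal sketch that assumes the concatenation places lower first and upper second, and that the result starts at element position shift. align_model is a hypothetical helper for illustration, not the library's implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar reference model (hypothetical, not libsimdpp code): conceptually
// concatenates `lower` and `upper` (assumed order) and returns the 16
// consecutive bytes that start at position `shift`.
template<unsigned shift>
std::array<std::uint8_t, 16> align_model(const std::array<std::uint8_t, 16>& lower,
                                         const std::array<std::uint8_t, 16>& upper)
{
    static_assert(shift <= 16, "shift must be in the range [0, 16]");
    std::array<std::uint8_t, 16> r{};
    for (std::size_t i = 0; i < 16; ++i) {
        const std::size_t pos = i + shift;   // index into the 32-byte concatenation
        r[i] = pos < 16 ? lower[pos] : upper[pos - 16];
    }
    return r;
}
```

In this model a shift of 0 returns lower unchanged and a shift of 16 returns upper unchanged.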
basic_int8x32 simdpp::align | ( | basic_int8x32 | lower, |
basic_int8x32 | upper | ||
) |
Extracts an int8x16 vector from two concatenated int8x16 vectors.
- 128-bit version:
- In SSE2-SSE3 this intrinsic results in at least 3 instructions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-SSE3 this intrinsic results in at least 6 instructions.
- In SSSE3-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
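To make the per-half behaviour of the 256-bit version concrete, the sketch below treats a 32-byte vector as two independent 16-byte halves and applies the same extraction to each. It rests on the same assumptions as a scalar model of the 128-bit case (lower comes first in the concatenation, the result starts at position shift). align_per_lane_model is a hypothetical name used only here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar reference model (hypothetical, not libsimdpp code): each 16-byte half
// of the result is built only from the matching halves of `lower` and `upper`.
template<unsigned shift>
std::array<std::uint8_t, 32> align_per_lane_model(const std::array<std::uint8_t, 32>& lower,
                                                  const std::array<std::uint8_t, 32>& upper)
{
    static_assert(shift <= 16, "shift must be in the range [0, 16]");
    std::array<std::uint8_t, 32> r{};
    for (std::size_t lane = 0; lane < 2; ++lane) {
        const std::size_t base = lane * 16;      // first byte of this 128-bit half
        for (std::size_t i = 0; i < 16; ++i) {
            const std::size_t pos = i + shift;   // index into this half's 32-byte concatenation
            r[base + i] = pos < 16 ? lower[base + pos] : upper[base + pos - 16];
        }
    }
    return r;
}
```

Note that in this model bytes never cross from one half to the other: with a shift of 4, byte 12 of the result comes from byte 0 of upper, not from byte 16 of lower.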
basic_int16x8 simdpp::align | ( | basic_int16x8 | lower, |
basic_int16x8 | upper | ||
) |
Extracts an int16x8 vector from two concatenated int16x8 vectors.
- 128-bit version:
- In SSE2-SSE3 this intrinsic results in at least 3 instructions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-SSE3 this intrinsic results in at least 6 instructions.
- In SSSE3-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int16x16 simdpp::align | ( | basic_int16x16 | lower, |
basic_int16x16 | upper | ||
) |
Extracts an int16x8 vector from two concatenated int16x8 vectors.
- 128-bit version:
- In SSE2-SSE3 this intrinsic results in at least 3 instructions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-SSE3 this intrinsic results in at least 6 instructions.
- In SSSE3-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int32x4 simdpp::align | ( | basic_int32x4 | lower, |
basic_int32x4 | upper | ||
) |
Extracts an int32x4 vector from two concatenated int32x4 vectors.
- 128-bit version:
- In SSE2-SSE3 this intrinsic results in at least 3 instructions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-SSE3 this intrinsic results in at least 6 instructions.
- In SSSE3-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int32x8 simdpp::align | ( | basic_int32x8 | lower, |
basic_int32x8 | upper | ||
) |
Extracts an int32x4 vector from two concatenated int32x4 vectors.
- 128-bit version:
- In SSE2-SSE3 this intrinsic results in at least 3 instructions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-SSE3 this intrinsic results in at least 6 instructions.
- In SSSE3-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int64x2 simdpp::align | ( | basic_int64x2 | lower, |
basic_int64x2 | upper | ||
) |
Extracts an int64x2 vector from two concatenated int64x2 vectors.
- 128-bit version:
- In SSE2-SSE3 this intrinsic results in at least 3 instructions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-SSE3 this intrinsic results in at least 6 instructions.
- In SSSE3-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int64x4 simdpp::align | ( | basic_int64x4 | lower, |
basic_int64x4 | upper | ||
) |
Extracts an int64x2 vector from two concatenated int64x2 vectors.
- 128-bit version:
- In SSE2-SSE3 this intrinsic results in at least 3 instructions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-SSE3 this intrinsic results in at least 6 instructions.
- In SSSE3-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
float32x4 simdpp::align | ( | float32x4 | lower, |
float32x4 | upper | ||
) |
Extracts a float32x4 vector from two concatenated float32x4 vectors.
- 128-bit version:
- In SSE2-SSE3 this intrinsic results in at least 3 instructions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-SSE3 this intrinsic results in at least 6 instructions.
- In SSSE3-SSE4.1, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
float32x8 simdpp::align | ( | float32x8 | lower, |
float32x8 | upper | ||
) |
Extracts a float32x4 vector from two concatenated float32x4 vectors.
- 128-bit version:
- In SSE2-SSE3 this intrinsic results in at least 3 instructions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-SSE3 this intrinsic results in at least 6 instructions.
- In SSSE3-SSE4.1, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
float64x2 simdpp::align | ( | float64x2 | lower, |
float64x2 | upper | ||
) |
Extracts a float64x2 vector from two concatenated float64x2 vectors.
- 128-bit version:
- In SSE2-SSE3 this intrinsic results in at least 3 instructions.
- Not vectorized in NEON.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-SSE3 this intrinsic results in at least 6 instructions.
- In SSSE3-AVX this intrinsic results in at least 2 instructions.
- Not vectorized in NEON.
float64x4 simdpp::align | ( | float64x4 | lower, |
float64x4 | upper | ||
) |
Extracts a float64x2 vector from two concatenated float64x2 vectors.
- 128-bit version:
- In SSE2-SSE3 this intrinsic results in at least 3 instructions.
- Not vectorized in NEON.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-SSE3 this intrinsic results in at least 6 instructions.
- In SSSE3-AVX this intrinsic results in at least 2 instructions.
- Not vectorized in NEON.
|
inline |
Composes a vector from two sources according to a mask.
Each element within the mask must have either all bits set or all bits unset.
- 128-bit version:
- In SSE2-AVX this intrinsic results in at least 3 instructions.
- In XOP this intrinsic results in at least 1 instruction.
- 256-bit version:
- In SSE2-AVX this intrinsic results in at least 6 instructions.
- In NEON and ALTIVEC this intrinsic results in at least 2 instructions.
- In XOP this intrinsic results in at least 2 instructions.
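The requirement that each mask element have all bits set or all bits unset is consistent with a bitwise select, sketched below in scalar C++ for 8-bit elements. This is an assumption about how the operation can be modelled, not a statement about the instructions libsimdpp emits. blend_model is a hypothetical helper for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar reference model (hypothetical, not libsimdpp code): bitwise select.
// Bits set in `mask` take the bit from `on`, cleared bits take it from `off`.
std::array<std::uint8_t, 16> blend_model(const std::array<std::uint8_t, 16>& on,
                                         const std::array<std::uint8_t, 16>& off,
                                         const std::array<std::uint8_t, 16>& mask)
{
    std::array<std::uint8_t, 16> r{};
    for (std::size_t i = 0; i < 16; ++i) {
        // With mask[i] == 0xFF this yields on[i]; with mask[i] == 0x00 it yields off[i].
        r[i] = static_cast<std::uint8_t>((on[i] & mask[i]) | (off[i] & ~mask[i]));
    }
    return r;
}
```

With a partially set mask byte such as 0x0F, this model mixes bits from both sources (0xAB and 0xCD give 0xCB), which is presumably why such mask elements are disallowed.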
|
inline |
Composes a vector from two sources according to a mask.
Each element within the mask must have either all bits set or all bits unset.
- 128-bit version:
- In SSE2-AVX this intrinsic results in at least 3 instructions.
- In XOP this intrinsic results in at least 1 instruction.
- 256-bit version:
- In SSE2-AVX this intrinsic results in at least 6 instructions.
- In NEON and ALTIVEC this intrinsic results in at least 2 instructions.
- In XOP this intrinsic results in at least 2 instructions.
|
inline |
Composes a vector from two sources according to a mask.
Each element within the mask must have either all bits set or all bits unset.
- 128-bit version:
- In SSE2-AVX this intrinsic results in at least 3 instructions.
- In XOP this intrinsic results in at least 1 instruction.
- 256-bit version:
- In SSE2-AVX this intrinsic results in at least 6 instructions.
- In NEON and ALTIVEC this intrinsic results in at least 2 instructions.
- In XOP this intrinsic results in at least 2 instructions.
|
inline |
Composes a vector from two sources according to a mask.
Each element within the mask must have either all bits set or all bits unset.
- 128-bit version:
- In SSE2-AVX this intrinsic results in at least 3 instructions.
- In XOP this intrinsic results in at least 1 instruction.
- 256-bit version:
- In SSE2-AVX this intrinsic results in at least 6 instructions.
- In NEON and ALTIVEC this intrinsic results in at least 2 instructions.
- In XOP this intrinsic results in at least 2 instructions.
|
inline |
Composes a vector from two sources according to a mask.
Each element within the mask must have either all bits set or all bits unset.
- 128-bit version:
- In SSE2-AVX this intrinsic results in at least 3 instructions.
- 256-bit version:
- In SSE2-AVX this intrinsic results in at least 6 instructions.
- In NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
Composes a vector from two sources according to a mask.
Each element within the mask must have either all bits set or all bits unset.
- 128-bit version:
- In SSE2-AVX this intrinsic results in at least 3 instructions.
- 256-bit version:
- In SSE2-AVX this intrinsic results in at least 6 instructions.
- In NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
Composes a vector from two sources according to a mask.
Each element within the mask must have either all bits set or all bits unset.
- 128-bit version:
- In SSE2-AVX this intrinsic results in at least 3 instructions.
- 256-bit version:
- In SSE2-AVX this intrinsic results in at least 6 instructions.
- In NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
Composes a vector from two sources according to a mask.
Each element within the mask must have either all bits set or all bits unset.
- 128-bit version:
- In SSE2-AVX this intrinsic results in at least 3 instructions.
- 256-bit version:
- In SSE2-AVX this intrinsic results in at least 6 instructions.
- In NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
Composes a vector from two sources according to a mask.
Each element within the mask must have either all bits set or all bits unset.
- 128-bit version:
- In SSE2-AVX this intrinsic results in at least 3 instructions.
- 256-bit version:
- In SSE2-AVX this intrinsic results in at least 6 instructions.
- In NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
Composes a vector from two sources according to a mask.
Each element within the mask must have either all bits set or all bits unset.
- 128-bit version:
- In SSE2-AVX this intrinsic results in at least 3 instructions.
- 256-bit version:
- In SSE2-AVX this intrinsic results in at least 6 instructions.
- In NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
Composes a vector from two sources according to a mask.
Each element within the mask must have either all bits set or all bits unset.
- 128-bit version:
- In SSE2-AVX this intrinsic results in at least 3 instructions.
- 256-bit version:
- In SSE2-AVX this intrinsic results in at least 6 instructions.
- In NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
Composes a vector from two sources according to a mask.
Each element within the mask must have either all bits set or all bits unset.
- 128-bit version:
- In SSE2-AVX this intrinsic results in at least 3 instructions.
- 256-bit version:
- In SSE2-AVX this intrinsic results in at least 6 instructions.
- In NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
Composes a vector from two sources according to a mask.
Each element within the mask must have either all bits set or all bits unset.
- 128-bit version:
- In SSE2-AVX this intrinsic results in at least 3 instructions.
- 256-bit version:
- In SSE2-AVX this intrinsic results in at least 6 instructions.
- In NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
Composes a vector from two sources according to a mask.
Each element within the mask must have either all bits set or all bits unset.
- 128-bit version:
- In SSE2-AVX this intrinsic results in at least 3 instructions.
- 256-bit version:
- In SSE2-AVX this intrinsic results in at least 6 instructions.
- In NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
Composes a vector from two sources according to a mask.
Each element within the mask must have either all bits set or all bits unset.
- 128-bit version:
- In SSE2-AVX this intrinsic results in at least 3 instructions.
- 256-bit version:
- In SSE2-AVX this intrinsic results in at least 6 instructions.
- In NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
Composes a vector from two sources according to a mask.
Each element within the mask must have either all bits set or all bits unset.
- 128-bit version:
- In SSE2-AVX this intrinsic results in at least 3 instructions.
- 256-bit version:
- In SSE2-AVX this intrinsic results in at least 6 instructions.
- In NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
Composes a vector from two sources according to a mask.
Each element within the mask must have either all bits set or all bits unset.
- 128-bit version:
- In SSE2-SSE4.1 this intrinsic results in at least 3 instructions.
- 256-bit version:
- In SSE2-SSE4.1 this intrinsic results in at least 6 instructions.
- In NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
Composes a vector from two sources according to a mask.
Each element within the mask must have either all bits set or all bits unset.
- 128-bit version:
- In SSE2-SSE4.1 this intrinsic results in at least 3 instructions.
- 256-bit version:
- In SSE2-SSE4.1 this intrinsic results in at least 6 instructions.
- In NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
Composes a vector from two sources according to a mask.
Each element within the mask must have either all bits set or all bits unset.
- 128-bit version:
- In SSE2-SSE4.1 this intrinsic results in at least 3 instructions.
- 256-bit version:
- In SSE2-SSE4.1 this intrinsic results in at least 6 instructions.
- In NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
Composes a vector from two sources according to a mask.
Each element within the mask must have either all bits set or all bits unset.
- 128-bit version:
- In SSE2-SSE4.1 this intrinsic results in at least 3 instructions.
- 256-bit version:
- In SSE2-SSE4.1 this intrinsic results in at least 6 instructions.
- In NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
Composes a vector from two sources according to a mask.
Each element within the mask must have either all bits set or all bits unset.
- 128-bit version:
- In SSE2-SSE4.1 this intrinsic results in at least 3 instructions.
- 256-bit version:
- In SSE2-SSE4.1 this intrinsic results in at least 6 instructions.
- In NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
Composes a vector from two sources according to a mask.
Each element within the mask must have either all bits set or all bits unset.
- 128-bit version:
- In SSE2-SSE4.1 this intrinsic results in at least 3 instructions.
- 256-bit version:
- In SSE2-SSE4.1 this intrinsic results in at least 6 instructions.
- In NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
Composes a vector from two sources according to a mask.
Each element within the mask must have either all bits set or all bits unset.
- 128-bit version:
- In SSE2-SSE4.1 this intrinsic results in at least 3 instructions.
- Not vectorized in NEON and .
- 256-bit version:
- In SSE2-SSE4.1 this intrinsic results in at least 6 instructions.
- Not vectorized in NEON and .
|
inline |
Composes a vector from two sources according to a mask.
Each element within the mask must have either all bits set or all bits unset.
- 128-bit version:
- In SSE2-SSE4.1 this intrinsic results in at least 3 instructions.
- Not vectorized in NEON and .
- 256-bit version:
- In SSE2-SSE4.1 this intrinsic results in at least 6 instructions.
- Not vectorized in NEON and .
|
inline |
Composes a vector from two sources according to a mask.
Each element within the mask must have either all bits set or all bits unset.
- 128-bit version:
- In SSE2-SSE4.1 this intrinsic results in at least 3 instructions.
- Not vectorized in NEON and .
- 256-bit version:
- In SSE2-SSE4.1 this intrinsic results in at least 6 instructions.
- Not vectorized in NEON and .
|
inline |
Composes a vector from two sources according to a mask.
Each element within the mask must have either all bits set or all bits unset.
- 128-bit version:
- In SSE2-SSE4.1 this intrinsic results in at least 3 instructions.
- Not vectorized in NEON and .
- 256-bit version:
- In SSE2-SSE4.1 this intrinsic results in at least 6 instructions.
- Not vectorized in NEON and .
|
inline |
Composes a vector from two sources according to a mask.
Each element within the mask must have either all bits set or all bits unset.
- 128-bit version:
- In SSE2-SSE4.1 this intrinsic results in at least 3 instructions.
- Not vectorized in NEON and .
- 256-bit version:
- In SSE2-SSE4.1 this intrinsic results in at least 6 instructions.
- Not vectorized in NEON and .
|
inline |
Composes a vector from two sources according to a mask.
Each element within the mask must have either all bits set or all bits unset.
- 128-bit version:
- In SSE2-SSE4.1 this intrinsic results in at least 3 instructions.
- Not vectorized in NEON and .
- 256-bit version:
- In SSE2-SSE4.1 this intrinsic results in at least 6 instructions.
- Not vectorized in NEON and .
basic_int8x16 simdpp::broadcast | ( | basic_int8x16 | a | ) |
Broadcasts the specified 8-bit value to all elements within 128-bit lanes.
- 128-bit version:
- In SSE2-SSE3 this intrinsic results in at least 7 instructions.
- In SSSE3-AVX this intrinsic results in at least 1-2 instructions.
- In AVX2 this intrinsic results in at least 2 instructions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-SSE3 this intrinsic results in at least 14 instructions.
- In SSSE3-AVX this intrinsic results in at least 2-3 instructions.
- In NEON and ALTIVEC this intrinsic results in at least 2 instructions.
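A scalar sketch of the per-lane behaviour described above, assuming the vector is viewed as a sequence of 8-bit elements split into 128-bit lanes of 16 elements. The function name `broadcast_model` and the run-time index `s` are hypothetical and stand in for however the library selects the source element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model: element `s` of each 128-bit lane is copied to every
// element of that same lane. Lanes do not exchange data.
std::vector<std::uint8_t> broadcast_model(const std::vector<std::uint8_t>& a,
                                          std::size_t s)
{
    const std::size_t lane = 16;          // 8-bit elements per 128-bit lane
    assert(s < lane);
    assert(a.size() % lane == 0);
    std::vector<std::uint8_t> r(a.size());
    for (std::size_t base = 0; base < a.size(); base += lane) {
        for (std::size_t i = 0; i < lane; ++i) {
            r[base + i] = a[base + s];
        }
    }
    return r;
}
```

With a 32-element input, the lower half is filled from element `s` and the upper half from element `s + 16`, matching the note that the two halves are processed separately.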
basic_int8x32 simdpp::broadcast | ( | basic_int8x32 | a | ) |
Broadcasts the specified 8-bit value to all elements within 128-bit lanes.
- 128-bit version:
- In SSE2-SSE3 this intrinsic results in at least 7 instructions.
- In SSSE3-AVX this intrinsic results in at least 1-2 instructions.
- In AVX2 this intrinsic results in at least 2 instructions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-SSE3 this intrinsic results in at least 14 instructions.
- In SSSE3-AVX this intrinsic results in at least 2-3 instructions.
- In NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int16x8 simdpp::broadcast | ( | basic_int16x8 | a | ) |
Broadcasts the specified 16-bit value to all elements within 128-bit lanes.
- 128-bit version:
- In SSE2-SSE3 this intrinsic results in at least 3 instructions.
- In SSSE3-AVX this intrinsic results in at least 1-2 instructions.
- In AVX2 this intrinsic results in at least 2 instructions.
- 256-bit version:
- In SSE2-SSE3 this intrinsic results in at least 6 instructions.
- In SSSE3-AVX this intrinsic results in at least 2-3 instructions.
- In AVX2, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int16x16 simdpp::broadcast | ( | basic_int16x16 | a | ) |
Broadcasts the specified 16-bit value to all elements within 128-bit lanes.
- 128-bit version:
- In SSE2-SSE3 this intrinsic results in at least 3 instructions.
- In SSSE3-AVX this intrinsic results in at least 1-2 instructions.
- In AVX2 this intrinsic results in at least 2 instructions.
- 256-bit version:
- In SSE2-SSE3 this intrinsic results in at least 6 instructions.
- In SSSE3-AVX this intrinsic results in at least 2-3 instructions.
- In AVX2, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int32x4 simdpp::broadcast | ( | basic_int32x4 | a | ) |
Broadcasts the specified 32-bit value to all elements within 128-bit lanes.
- 256-bit version:
- In NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int32x8 simdpp::broadcast | ( | basic_int32x8 | a | ) |
Broadcasts the specified 32-bit value to all elements within 128-bit lanes.
- 256-bit version:
- In NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int64x2 simdpp::broadcast | ( | basic_int64x2 | a | ) |
Broadcasts the specified 64-bit value to all elements within 128-bit lanes.
- 128-bit version:
- In ALTIVEC this intrinsic results in at least 1-2 instructions.
- 256-bit version:
- In SSE2-AVX and NEON this intrinsic results in at least 2 instructions.
- In ALTIVEC this intrinsic results in at least 2-3 instructions.
basic_int64x4 simdpp::broadcast | ( | basic_int64x4 | a | ) |
Broadcasts the specified 64-bit value to all elements within 128-bit lanes.
- 128-bit version:
- In ALTIVEC this intrinsic results in at least 1-2 instructions.
- 256-bit version:
- In SSE2-AVX and NEON this intrinsic results in at least 2 instructions.
- In ALTIVEC this intrinsic results in at least 2-3 instructions.
float32x4 simdpp::broadcast | ( | float32x4 | a | ) |
Broadcasts the specified 32-bit value to all elements within 128-bit lanes.
- 256-bit version:
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
float32x8 simdpp::broadcast | ( | float32x8 | a | ) |
Broadcasts the specified 32-bit value to all elements within 128-bit lanes.
- 256-bit version:
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
float64x2 simdpp::broadcast | ( | float64x2 | a | ) |
Broadcasts the specified 64-bit value to all elements within 128-bit lanes.
- 128-bit version:
- Not vectorized in NEON and .
- 256-bit version:
- In SSE2-AVX this intrinsic results in at least 2 instructions.
- Not vectorized in NEON and .
float64x4 simdpp::broadcast | ( | float64x4 | a | ) |
Broadcasts the specified 64-bit value to all elements within 128-bit lanes.
- 128-bit version:
- Not vectorized in NEON and .
- 256-bit version:
- In SSE2-AVX this intrinsic results in at least 2 instructions.
- Not vectorized in NEON and .
basic_int8x16 simdpp::broadcast_w | ( | basic_int8x16 | a | ) |
Broadcasts the specified 8-bit value to all elements within a 128-bit lane.
- 128-bit version:
- In SSE2-AVX this intrinsic results in at least 5 instructions.
- In AVX2 this intrinsic results in at least 2 instructions.
- 256-bit version:
- In SSE2-AVX this intrinsic results in at least 6 instructions.
- In NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int8x32 simdpp::broadcast_w | ( | basic_int8x32 | a | ) |
Broadcasts the specified 8-bit value to all elements within a 128-bit lane.
- 128-bit version:
- In SSE2-AVX this intrinsic results in at least 5 instructions.
- In AVX2 this intrinsic results in at least 2 instructions.
- 256-bit version:
- In SSE2-AVX this intrinsic results in at least 6 instructions.
- In NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int16x8 simdpp::broadcast_w | ( | basic_int16x8 | a | ) |
Broadcasts the specified 16-bit value to all elements within an int16x8 vector.
- 128-bit version:
- In SSE2-AVX this intrinsic results in at least 5 instructions.
- In AVX2 this intrinsic results in at least 2 instructions.
- 256-bit version:
- In SSE2-AVX this intrinsic results in at least 6 instructions.
- In NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int16x16 simdpp::broadcast_w | ( | basic_int16x16 | a | ) |
Broadcasts the specified 16-bit value to all elements within an int16x8 vector.
- 128-bit version:
- In SSE2-AVX this intrinsic results in at least 5 instructions.
- In AVX2 this intrinsic results in at least 2 instructions.
- 256-bit version:
- In SSE2-AVX this intrinsic results in at least 6 instructions.
- In NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int32x4 simdpp::broadcast_w | ( | basic_int32x4 | a | ) |
Broadcasts the specified 32-bit value to all elements within an int32x4 vector.
- 256-bit version:
- In NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int32x8 simdpp::broadcast_w | ( | basic_int32x8 | a | ) |
Broadcasts the specified 32-bit value to all elements within an int32x4 vector.
- 256-bit version:
- In NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int64x2 simdpp::broadcast_w | ( | basic_int64x2 | a | ) |
Broadcasts the specified 64-bit value to all elements within an int64x2 vector.
- 128-bit version:
- In ALTIVEC this intrinsic results in at least 1-2 instructions.
- 256-bit version:
- In SSE2-AVX and NEON this intrinsic results in at least 2 instructions.
- In ALTIVEC this intrinsic results in at least 1-2 instructions.
basic_int64x4 simdpp::broadcast_w | ( | basic_int64x4 | a | ) |
Broadcasts the specified 64-bit value to all elements within an int64x2 vector.
- 128-bit version:
- In ALTIVEC this intrinsic results in at least 1-2 instructions.
- 256-bit version:
- In SSE2-AVX and NEON this intrinsic results in at least 2 instructions.
- In ALTIVEC this intrinsic results in at least 1-2 instructions.
float32x4 simdpp::broadcast_w | ( | float32x4 | a | ) |
Broadcasts the specified 32-bit value to all elements within a float32x4 vector.
- 256-bit version:
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
float32x8 simdpp::broadcast_w | ( | float32x8 | a | ) |
Broadcasts the specified 32-bit value to all elements within a float32x4 vector.
- 256-bit version:
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
float64x2 simdpp::broadcast_w | ( | float64x2 | a | ) |
Broadcasts the specified 64-bit value to all elements within a float64x2 vector.
- 128-bit version:
- Not vectorized in NEON and .
- 256-bit version:
- In SSE2-AVX this intrinsic results in at least 2 instructions.
- Not vectorized in NEON and .
float64x4 simdpp::broadcast_w | ( | float64x4 | a | ) |
Broadcasts the specified 64-bit value to all elements within a float64x2 vector.
- 128-bit version:
- Not vectorized in NEON and .
- 256-bit version:
- In SSE2-AVX this intrinsic results in at least 2 instructions.
- Not vectorized in NEON and .
basic_int8x16 simdpp::make_shuffle_bytes16_mask | ( | basic_int8x16 & | mask | ) |
Makes a mask to shuffle an int8x16 vector using the permute_bytes16, shuffle_bytes16, permute_zbytes16 or shuffle_zbytes16 functions.
All elements within vectors are grouped into sets of two adjacent elements. Elements within each set of the resulting vector can be selected only from corresponding sets of the source vectors.
The template arguments define which elements to select from each element group:
- Values [0,1] select elements from the first vector.
- Values [2,3] select elements from the second vector. The mask can only be used in shuffle_bytes16 or shuffle_zbytes16.
- Value [-1] sets the corresponding element to zero. The mask can only be used in permute_zbytes16 or shuffle_zbytes16.
- 128-bit version:
The created mask will cause shuffle_bytes16 to perform as follows:
- 256-bit version:
The vectors will be shuffled as if the 128-bit version was applied to the lower and higher halves of the vectors separately.
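The selector rules can be sketched in scalar form. The code below models the effect of a two-vector shuffle driven by selectors `s0` and `s1`, not the byte encoding of the mask itself, which is not specified here. The names `select_from_pair` and `shuffle_pairs_model` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Bytes16 = std::array<std::uint8_t, 16>;

// Picks one element for pair `g` (0..7) according to selector `s`:
//   0, 1 -> element 0 or 1 of the pair in the first vector
//   2, 3 -> element 0 or 1 of the pair in the second vector
//   -1   -> zero
std::uint8_t select_from_pair(const Bytes16& a, const Bytes16& b,
                              std::size_t g, int s)
{
    assert(s >= -1 && s <= 3);
    if (s < 0) return 0;
    if (s < 2) return a[2 * g + static_cast<std::size_t>(s)];
    return b[2 * g + static_cast<std::size_t>(s - 2)];
}

// Applies the same two selectors to every pair of adjacent elements.
Bytes16 shuffle_pairs_model(const Bytes16& a, const Bytes16& b, int s0, int s1)
{
    Bytes16 r{};
    for (std::size_t g = 0; g < 8; ++g) {
        r[2 * g]     = select_from_pair(a, b, g, s0);
        r[2 * g + 1] = select_from_pair(a, b, g, s1);
    }
    return r;
}
```

Each result pair draws only from the pair at the same position in the sources, which is the restriction stated in the description.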
basic_int8x32 simdpp::make_shuffle_bytes16_mask | ( | basic_int8x32 & | mask | ) |
Makes a mask to shuffle an int8x16 vector using the permute_bytes16, shuffle_bytes16, permute_zbytes16 or shuffle_zbytes16 functions.
All elements within vectors are grouped into sets of two adjacent elements. Elements within each set of the resulting vector can be selected only from corresponding sets of the source vectors.
The template arguments define which elements to select from each element group:
- Values [0,1] select elements from the first vector.
- Values [2,3] select elements from the second vector. The mask can only be used in shuffle_bytes16 or shuffle_zbytes16.
- Value [-1] sets the corresponding element to zero. The mask can only be used in permute_zbytes16 or shuffle_zbytes16.
- 128-bit version:
The created mask will cause shuffle_bytes16 to perform as follows:
- 256-bit version:
The vectors will be shuffled as if the 128-bit version was applied to the lower and higher halves of the vectors separately.
basic_int8x16 simdpp::move_l | ( | basic_int8x16 | a | ) |
Moves the elements in an int8x16 vector to the left by shift positions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
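A scalar sketch of the element move for one 128-bit vector of 8-bit elements. Two assumptions are made here that the text does not spell out: "left" means toward lower element indices, and vacated positions are filled with zeros. The name `move_l_model` and the run-time `shift` parameter are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Bytes16 = std::array<std::uint8_t, 16>;

// Scalar model: element i of the result takes element i + shift of the
// input; positions with no source element stay zero.
Bytes16 move_l_model(const Bytes16& a, std::size_t shift)
{
    Bytes16 r{};  // value-initialised, so every element starts at zero
    for (std::size_t i = 0; i + shift < r.size(); ++i) {
        r[i] = a[i + shift];
    }
    return r;
}
```

Under these assumptions a shift of 0 returns the input unchanged and a shift of 16 or more yields an all-zero vector.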
basic_int8x32 simdpp::move_l | ( | basic_int8x32 | a | ) |
Moves the elements in an int8x16 vector to the left by shift positions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int16x8 simdpp::move_l | ( | basic_int16x8 | a | ) |
Moves the 16-bit elements in a vector to the left by shift positions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int16x16 simdpp::move_l | ( | basic_int16x16 | a | ) |
Moves the 16-bit elements in a vector to the left by shift positions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int32x4 simdpp::move_l | ( | basic_int32x4 | a | ) |
Moves the 32-bit elements in a vector to the left by shift positions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int32x8 simdpp::move_l | ( | basic_int32x8 | a | ) |
Moves the 32-bit elements in a vector to the left by shift positions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int64x2 simdpp::move_l | ( | basic_int64x2 | a | ) |
Moves the 64-bit elements in a vector to the left by shift positions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int64x4 simdpp::move_l | ( | basic_int64x4 | a | ) |
Moves the 64-bit elements in a vector to the left by shift positions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
float32x4 simdpp::move_l | ( | float32x4 | a | ) |
Moves the 32-bit elements in a vector to the left by shift positions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
float32x8 simdpp::move_l | ( | float32x8 | a | ) |
Moves the 32-bit elements in a vector to the left by shift positions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
float64x2 simdpp::move_l | ( | float64x2 | a | ) |
Moves the 64-bit elements in a vector to the left by shift positions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
float64x4 simdpp::move_l | ( | float64x4 | a | ) |
Moves the 64-bit elements in a vector to the left by shift positions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int8x16 simdpp::move_r | ( | basic_int8x16 | a | ) |
Moves the 8-bit elements in a vector to the right by shift positions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
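The note about the two 128-bit halves can be made concrete with a scalar sketch over a 256-bit vector of 8-bit elements. It assumes that "right" means toward higher element indices and that vacated positions are zero-filled. `move_r_model` and its run-time `shift` argument are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model: within each 128-bit lane, element i of the result takes
// element i - shift of the same lane. Nothing crosses the lane boundary.
std::vector<std::uint8_t> move_r_model(const std::vector<std::uint8_t>& a,
                                       std::size_t shift)
{
    const std::size_t lane = 16;          // 8-bit elements per 128-bit lane
    assert(a.size() % lane == 0);
    std::vector<std::uint8_t> r(a.size(), 0);
    for (std::size_t base = 0; base < a.size(); base += lane) {
        for (std::size_t i = shift; i < lane; ++i) {
            r[base + i] = a[base + i - shift];
        }
    }
    return r;
}
```

In this model the last elements of the lower half are dropped rather than carried into the upper half, which is what processing the halves separately implies.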
basic_int8x32 simdpp::move_r | ( | basic_int8x32 | a | ) |
Moves the 8-bit elements in a vector to the right by shift positions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int16x8 simdpp::move_r | ( | basic_int16x8 | a | ) |
Moves the 16-bit elements in a vector to the right by shift positions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int16x16 simdpp::move_r | ( | basic_int16x16 | a | ) |
Moves the 16-bit elements in a vector to the right by shift positions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int32x4 simdpp::move_r | ( | basic_int32x4 | a | ) |
Moves the 32-bit elements in a vector to the right by shift positions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int32x8 simdpp::move_r | ( | basic_int32x8 | a | ) |
Moves the 32-bit elements in a vector to the right by shift positions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int64x2 simdpp::move_r | ( | basic_int64x2 | a | ) |
Moves the 64-bit elements in a vector to the right by shift positions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int64x4 simdpp::move_r | ( | basic_int64x4 | a | ) |
Moves the 64-bit elements in a vector to the right by shift positions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
float32x4 simdpp::move_r | ( | float32x4 | a | ) |
Moves the 32-bit elements in a vector to the right by shift positions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
float32x8 simdpp::move_r | ( | float32x8 | a | ) |
Moves the 32-bit elements in a vector to the right by shift positions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
float64x2 simdpp::move_r | ( | float64x2 | a | ) |
Moves the 64-bit elements in a vector to the right by shift positions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
float64x4 simdpp::move_r | ( | float64x4 | a | ) |
Moves the 64-bit elements in a vector to the right by shift positions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
int128 simdpp::permute | ( | basic_int16x8 | a | ) |
Permutes the 16-bit values within each group of 4 consecutive values of the vector.
The selector values must be in range [0; 3].
- 128-bit version:
- In SSE2-AVX2 this intrinsic results in at least 2 instructions.
- In NEON this intrinsic results in at least 1-5 instructions.
- In ALTIVEC this intrinsic results in at least 1-2 instructions.
- 256-bit version:
- In SSE2-AVX this intrinsic results in at least 4 instructions.
- In AVX2 this intrinsic results in at least 2 instructions.
- In NEON this intrinsic results in at least 2-10 instructions.
- In ALTIVEC this intrinsic results in at least 2-3 instructions.
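A scalar sketch of the permutation for a vector of eight 16-bit elements. The four selectors are passed as a run-time array here for brevity. The name `permute_model` and that calling convention are assumptions for illustration, not the library's interface.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Words8 = std::array<std::uint16_t, 8>;

// Scalar model: the same four selectors are applied to each group of
// four consecutive elements; selector s[i] names the element of the
// group that lands in position i.
Words8 permute_model(const Words8& a, const std::array<unsigned, 4>& s)
{
    Words8 r{};
    for (std::size_t base = 0; base < a.size(); base += 4) {
        for (std::size_t i = 0; i < 4; ++i) {
            assert(s[i] <= 3);            // selectors must be in [0; 3]
            r[base + i] = a[base + s[i]];
        }
    }
    return r;
}
```

For example, the selectors 3, 2, 1, 0 reverse each group of four independently.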
basic_int16x16 simdpp::permute | ( | basic_int16x16 | a | ) |
Permutes the 16-bit values within each group of 4 consecutive values of the vector.
The selector values must be in range [0; 3].
- 128-bit version:
- In SSE2-AVX2 this intrinsic results in at least 2 instructions.
- In NEON this intrinsic results in at least 1-5 instructions.
- In ALTIVEC this intrinsic results in at least 1-2 instructions.
- 256-bit version:
- In SSE2-AVX this intrinsic results in at least 4 instructions.
- In AVX2 this intrinsic results in at least 2 instructions.
- In NEON this intrinsic results in at least 2-10 instructions.
- In ALTIVEC this intrinsic results in at least 2-3 instructions.
|
inline |
Selects bytes from a vector according to a mask.
Each byte within the mask defines which element to select:
- Bits 7-4 must be zero or the behavior is undefined.
- Bits 3-0 define the element within the given vector.
- 128-bit version:
- Not implemented for SSE2-SSE3.
- In NEON this intrinsic results in at least 2 instructions.
- 256-bit version:
- The vectors will be shuffled as if the 128-bit version was applied to the lower and higher halves of the vectors separately.
- Not implemented for SSE2-SSE3.
- In SSSE3-AVX and ALTIVEC this intrinsic results in at least 2 instructions.
- In NEON this intrinsic results in at least 4 instructions.
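The mask layout can be illustrated with a scalar sketch over one 128-bit vector. `permute_bytes16_model` is a hypothetical name, and the assertion stands in for the undefined behavior the text warns about.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Bytes16 = std::array<std::uint8_t, 16>;

// Scalar model: result byte i is the source byte whose index is stored
// in bits 3-0 of mask byte i.
Bytes16 permute_bytes16_model(const Bytes16& a, const Bytes16& mask)
{
    Bytes16 r{};
    for (std::size_t i = 0; i < r.size(); ++i) {
        // Bits 7-4 must be zero according to the description.
        assert((mask[i] & 0xf0) == 0);
        r[i] = a[mask[i] & 0x0f];
    }
    return r;
}
```

A mask holding 15, 14, ..., 0 reverses the byte order, while a mask of all zeros repeats the first byte in every position.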
|
inline |
Selects bytes from a vector according to a mask.
Each byte within the mask defines which element to select:
- Bits 7-4 must be zero or the behavior is undefined.
- Bits 3-0 define the element within the given vector.
- 128-bit version:
- Not implemented for SSE2-SSE3.
- In NEON this intrinsic results in at least 2 instructions.
- 256-bit version:
- The vectors will be shuffled as if the 128-bit version was applied to the lower and higher halves of the vectors separately.
- Not implemented for SSE2-SSE3.
- In SSSE3-AVX and ALTIVEC this intrinsic results in at least 2 instructions.
- In NEON this intrinsic results in at least 4 instructions.
|
inline |
Selects bytes from a vector according to a mask.
Each byte within the mask defines which element to select:
- Bits 7-4 must be zero or the behavior is undefined.
- Bits 3-0 define the element within the given vector.
- 128-bit version:
- Not implemented for SSE2-SSE3.
- In NEON this intrinsic results in at least 2 instructions.
- 256-bit version:
- The vectors will be shuffled as if the 128-bit version was applied to the lower and higher halves of the vectors separately.
- Not implemented for SSE2-SSE3.
- In SSSE3-AVX and ALTIVEC this intrinsic results in at least 2 instructions.
- In NEON this intrinsic results in at least 4 instructions.
|
inline |
Selects bytes from a vector according to a mask.
Each byte within the mask defines which element to select:
- Bits 7-4 must be zero or the behavior is undefined.
- Bits 3-0 define the element within the given vector.
- 128-bit version:
- Not implemented for SSE2-SSE3.
- In NEON this intrinsic results in at least 2 instructions.
- 256-bit version:
- The vectors will be shuffled as if the 128-bit version was applied to the lower and higher halves of the vectors separately.
- Not implemented for SSE2-SSE3.
- In SSSE3-AVX and ALTIVEC this intrinsic results in at least 2 instructions.
- In NEON this intrinsic results in at least 4 instructions.
|
inline |
Selects bytes from a vector according to a mask.
Each byte within the mask defines which element to select:
- Bits 7-4 must be zero or the behavior is undefined.
- Bits 3-0 define the element within the given vector.
- 128-bit version:
- Not implemented for SSE2-SSE3.
- In NEON this intrinsic results in at least 2 instructions.
- 256-bit version:
- The vectors will be shuffled as if the 128-bit version was applied to the lower and higher halves of the vectors separately.
- Not implemented for SSE2-SSE3.
- In SSSE3-AVX and ALTIVEC this intrinsic results in at least 2 instructions.
- In NEON this intrinsic results in at least 4 instructions.
|
inline |
Selects bytes from a vector according to a mask.
Each byte within the mask defines which element to select:
- Bits 7-4 must be zero or the behavior is undefined.
- Bits 3-0 define the element within the given vector.
- 128-bit version:
- Not implemented for SSE2-SSE3.
- In NEON this intrinsic results in at least 2 instructions.
- 256-bit version:
- The vectors will be shuffled as if the 128-bit version was applied to the lower and higher halves of the vectors separately.
- Not implemented for SSE2-SSE3.
- In SSSE3-AVX and ALTIVEC this intrinsic results in at least 2 instructions.
- In NEON this intrinsic results in at least 4 instructions.
|
inline |
De-interleaves the even (higher) elements of two int8x16 vectors.
- 128-bit version:
- In SSE2-AVX2 this intrinsic results in at least 3 instructions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction was applied to each of them separately.
- In SSE2-AVX this intrinsic results in at least 6 instructions.
- In NEON and ALTIVEC this intrinsic results in at least 2 instructions.
- In AVX2 this intrinsic results in at least 3 instructions.
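A scalar model may help pin down what "de-interleaving" produces. The sketch below assumes that "even (higher)" means the second element of each adjacent pair, that is zero-based indices 1, 3, 5, ..., and that the result holds the selected elements of the first vector followed by those of the second. Both points are assumptions, and `Bytes16` and `unzip_hi_ref` are hypothetical names rather than library API.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Bytes16 = std::array<std::uint8_t, 16>;

// Hypothetical scalar model (not libsimdpp code).
// Assumption: "even (higher)" elements are those at zero-based odd indices,
// i.e. the second element of each adjacent pair.
Bytes16 unzip_hi_ref(const Bytes16& a, const Bytes16& b)
{
    Bytes16 r{};
    for (std::size_t i = 0; i < 8; ++i) {
        r[i]     = a[2 * i + 1];  // picks a1, a3, ..., a15
        r[i + 8] = b[2 * i + 1];  // picks b1, b3, ..., b15
    }
    return r;
}
```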
|
inline |
De-interleaves the even (higher) elements of two int8x16 vectors.
- 128-bit version:
- In SSE2-AVX2 this intrinsic results in at least 3 instructions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction was applied to each of them separately.
- In SSE2-AVX this intrinsic results in at least 6 instructions.
- In NEON and ALTIVEC this intrinsic results in at least 2 instructions.
- In AVX2 this intrinsic results in at least 3 instructions.
|
inline |
De-interleaves the even (higher) elements of two int16x8 vectors.
- 128-bit version:
- In SSE2-AVX2 this intrinsic results in at least 3 instructions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction was applied to each of them separately.
- In SSE2-AVX this intrinsic results in at least 6 instructions.
- In NEON and ALTIVEC this intrinsic results in at least 2 instructions.
- In AVX2 this intrinsic results in at least 3 instructions.
|
inline |
De-interleaves the even (higher) elements of two int16x8 vectors.
- 128-bit version:
- In SSE2-AVX2 this intrinsic results in at least 3 instructions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction was applied to each of them separately.
- In SSE2-AVX this intrinsic results in at least 6 instructions.
- In NEON and ALTIVEC this intrinsic results in at least 2 instructions.
- In AVX2 this intrinsic results in at least 3 instructions.
|
inline |
De-interleaves the even (higher) elements of two int32x4 vectors.
- 128-bit version:
- In ALTIVEC this intrinsic results in at least 1-2 instructions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction was applied to each of them separately.
- In ALTIVEC this intrinsic results in at least 2-3 instructions.
- In SSE2-AVX and NEON this intrinsic results in at least 2 instructions.
|
inline |
De-interleaves the even (higher) elements of two int32x4 vectors.
- 128-bit version:
- In ALTIVEC this intrinsic results in at least 1-2 instructions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction was applied to each of them separately.
- In ALTIVEC this intrinsic results in at least 2-3 instructions.
- In SSE2-AVX and NEON this intrinsic results in at least 2 instructions.
|
inline |
De-interleaves the even (higher) elements of two int64x2 vectors.
- 128-bit version:
- In ALTIVEC this intrinsic results in at least 1-2 instructions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction was applied to each of them separately.
- In ALTIVEC this intrinsic results in at least 2-3 instructions.
- In SSE2-AVX and NEON this intrinsic results in at least 2 instructions.
|
inline |
De-interleaves the even (higher) elements of two int64x2 vectors.
- 128-bit version:
- In ALTIVEC this intrinsic results in at least 1-2 instructions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction was applied to each of them separately.
- In ALTIVEC this intrinsic results in at least 2-3 instructions.
- In SSE2-AVX and NEON this intrinsic results in at least 2 instructions.
|
inline |
De-interleaves the even (higher) elements of two float32x4 vectors.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction was applied to each of them separately.
- In SSE2-SSE4.1, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
De-interleaves the even (higher) elements of two float32x4 vectors.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction was applied to each of them separately.
- In SSE2-SSE4.1, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
De-interleaves the even (higher) elements of two float64x2 vectors.
- 128-bit version:
- Not vectorized in NEON.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction was applied to each of them separately.
- Not vectorized in NEON.
- In SSE2-AVX this intrinsic results in at least 2 instructions.
|
inline |
De-interleaves the even (higher) elements of two float64x2 vectors.
- 128-bit version:
- Not vectorized in NEON.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction was applied to each of them separately.
- Not vectorized in NEON.
- In SSE2-AVX this intrinsic results in at least 2 instructions.
|
inline |
De-interleaves the odd (lower) elements of two int8x16 vectors.
- 128-bit version:
- In SSE2-AVX2 this intrinsic results in at least 4-5 instructions.
- In ALTIVEC this intrinsic results in at least 1-2 instructions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction was applied to each of them separately.
- In SSE2-AVX this intrinsic results in at least 8-9 instructions.
- In NEON this intrinsic results in at least 2 instructions.
- In AVX2 this intrinsic results in at least 4-5 instructions.
- In ALTIVEC this intrinsic results in at least 2-3 instructions.
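The "odd (lower)" variant can be modeled the same way for any element type and count. The sketch below assumes that "odd (lower)" means the first element of each adjacent pair, that is zero-based indices 0, 2, 4, ..., and that the selected elements of the first vector precede those of the second. The template `unzip_lo_ref` is a hypothetical helper for illustration, not a libsimdpp function.

```cpp
#include <array>
#include <cstddef>

// Hypothetical scalar model (not libsimdpp code), generic over the element
// type T and the element count N of a 128-bit vector (16, 8, 4 or 2).
// Assumption: "odd (lower)" elements are those at zero-based even indices.
template <typename T, std::size_t N>
std::array<T, N> unzip_lo_ref(const std::array<T, N>& a,
                              const std::array<T, N>& b)
{
    static_assert(N % 2 == 0, "element count must be even");
    std::array<T, N> r{};
    for (std::size_t i = 0; i < N / 2; ++i) {
        r[i]         = a[2 * i];  // picks a0, a2, a4, ...
        r[i + N / 2] = b[2 * i];  // picks b0, b2, b4, ...
    }
    return r;
}
```

A model like this can serve as a test oracle when checking a vectorized implementation on each target, whatever instruction count that target needs.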
|
inline |
De-interleaves the odd (lower) elements of two int8x16 vectors.
- 128-bit version:
- In SSE2-AVX2 this intrinsic results in at least 4-5 instructions.
- In ALTIVEC this intrinsic results in at least 1-2 instructions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction was applied to each of them separately.
- In SSE2-AVX this intrinsic results in at least 8-9 instructions.
- In NEON this intrinsic results in at least 2 instructions.
- In AVX2 this intrinsic results in at least 4-5 instructions.
- In ALTIVEC this intrinsic results in at least 2-3 instructions.
|
inline |
De-interleaves the odd (lower) elements of two int16x8 vectors.
- 128-bit version:
- In SSE2-SSSE3 this intrinsic results in at least 5 instructions.
- In SSE4.1-AVX2 this intrinsic results in at least 4-5 instructions.
- In ALTIVEC this intrinsic results in at least 1-2 instructions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction was applied to each of them separately.
- In SSE2-SSSE3 this intrinsic results in at least 5 instructions.
- In SSE4.1-AVX this intrinsic results in at least 8-9 instructions.
- In AVX2 this intrinsic results in at least 4-5 instructions.
- In NEON this intrinsic results in at least 2 instructions.
- In ALTIVEC this intrinsic results in at least 2-3 instructions.
|
inline |
De-interleaves the odd (lower) elements of two int16x8 vectors.
- 128-bit version:
- In SSE2-SSSE3 this intrinsic results in at least 5 instructions.
- In SSE4.1-AVX2 this intrinsic results in at least 4-5 instructions.
- In ALTIVEC this intrinsic results in at least 1-2 instructions.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction was applied to each of them separately.
- In SSE2-SSSE3 this intrinsic results in at least 5 instructions.
- In SSE4.1-AVX this intrinsic results in at least 8-9 instructions.
- In AVX2 this intrinsic results in at least 4-5 instructions.
- In NEON this intrinsic results in at least 2 instructions.
- In ALTIVEC this intrinsic results in at least 2-3 instructions.
|
inline |
De-interleaves the odd (lower) elements of two int32x4 vectors.
- 128-bit version:
- In ALTIVEC this intrinsic results in at least 1-2 instructions.
- 256-bit version:
- In SSE2-AVX and NEON this intrinsic results in at least 2 instructions.
- In ALTIVEC this intrinsic results in at least 2-3 instructions.
|
inline |
De-interleaves the odd (lower) elements of two int32x4 vectors.
- 128-bit version:
- In ALTIVEC this intrinsic results in at least 1-2 instructions.
- 256-bit version:
- In SSE2-AVX and NEON this intrinsic results in at least 2 instructions.
- In ALTIVEC this intrinsic results in at least 2-3 instructions.
|
inline |
De-interleaves the odd (lower) elements of two int64x2 vectors.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction was applied to each of them separately.
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
De-interleaves the odd (lower) elements of two int64x2 vectors.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction was applied to each of them separately.
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
De-interleaves the odd (lower) elements of two float32x4 vectors.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction was applied to each of them separately.
- In SSE2-SSE4.1, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
De-interleaves the odd (lower) elements of two float32x4 vectors.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction was applied to each of them separately.
- In SSE2-SSE4.1, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
De-interleaves the odd (lower) elements of two float64x2 vectors.
- 128-bit version:
- Not vectorized in NEON.
- 256-bit version:
- In SSE2-AVX this intrinsic results in at least 2 instructions.
- Not vectorized in NEON.
|
inline |
De-interleaves the odd (lower) elements of two float64x2 vectors.
- 128-bit version:
- Not vectorized in NEON.
- 256-bit version:
- In SSE2-AVX this intrinsic results in at least 2 instructions.
- Not vectorized in NEON.
|
inline |
Interleaves the higher halves of two vectors.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction was applied to each of them separately.
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
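For reference, interleaving the higher halves can be written as a short scalar loop. The sketch assumes that the "higher half" is the elements at indices N/2 to N-1 and that the result alternates elements of the first and second vector, starting with the first. That ordering mirrors common unpack-high instructions but is an assumption here, and `zip_hi_ref` is a hypothetical name.

```cpp
#include <array>
#include <cstddef>

// Hypothetical scalar model (not libsimdpp code) of interleaving the higher
// halves of two 128-bit vectors holding N elements of type T.
// Assumption: the result is a[N/2], b[N/2], a[N/2+1], b[N/2+1], ...
template <typename T, std::size_t N>
std::array<T, N> zip_hi_ref(const std::array<T, N>& a,
                            const std::array<T, N>& b)
{
    static_assert(N % 2 == 0, "element count must be even");
    std::array<T, N> r{};
    for (std::size_t i = 0; i < N / 2; ++i) {
        r[2 * i]     = a[N / 2 + i];
        r[2 * i + 1] = b[N / 2 + i];
    }
    return r;
}
```

With four 32-bit elements, for example, inputs `{0, 1, 2, 3}` and `{10, 11, 12, 13}` give `{2, 12, 3, 13}` under this model.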
|
inline |
Interleaves the higher halves of two vectors.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction was applied to each of them separately.
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
Interleaves the higher halves of two vectors.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction was applied to each of them separately.
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
Interleaves the higher halves of two vectors.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction was applied to each of them separately.
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
Interleaves the higher halves of two vectors.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction was applied to each of them separately.
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
Interleaves the higher halves of two vectors.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction was applied to each of them separately.
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
Interleaves the higher halves of two vectors.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction was applied to each of them separately.
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
Interleaves the higher halves of two vectors.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction was applied to each of them separately.
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
Interleaves the higher halves of two vectors.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction was applied to each of them separately.
- In SSE2-SSE4.1, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
Interleaves the higher halves of two vectors.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction was applied to each of them separately.
- In SSE2-SSE4.1, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
Interleaves the higher halves of two vectors.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction was applied to each of them separately.
- In SSE2-SSE4.1, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
Interleaves the higher halves of two vectors.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction was applied to each of them separately.
- In SSE2-SSE4.1, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
Interleaves the lower halves of two vectors.
- 256-bit version:
- The lower and higher 128-bit halves are processed as if a 128-bit instruction was applied to each of them separately.
- In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
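The per-half rule for 256-bit vectors is easy to misread as "interleave the lower 128 bits of the whole vector", so a scalar sketch of the documented behavior may be useful. It models a 256-bit vector of 8-bit elements as 32 bytes and applies the 128-bit interleave to each 16-byte half independently. The element ordering within a half (first vector, then second) is an assumption, and `Bytes32` and `zip_lo_256_ref` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Bytes32 = std::array<std::uint8_t, 32>;

// Hypothetical scalar model (not libsimdpp code) of a 256-bit zip_lo on
// 8-bit elements: each 128-bit half is interleaved on its own.
Bytes32 zip_lo_256_ref(const Bytes32& a, const Bytes32& b)
{
    constexpr std::size_t lane = 16;  // 8-bit elements per 128-bit half
    Bytes32 r{};
    for (std::size_t base = 0; base < 32; base += lane) {
        // Interleave the lower 8 elements of this half only.
        for (std::size_t i = 0; i < lane / 2; ++i) {
            r[base + 2 * i]     = a[base + i];
            r[base + 2 * i + 1] = b[base + i];
        }
    }
    return r;
}
```

Note that elements 8-15 and 24-31 of both inputs do not appear in the result, which is what distinguishes this from a full-width interleave of the lower 128 bits.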
|
inline |
Interleaves the lower halves of two vectors.
- 256-bit version:
- In SSE2-SSE4.1, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
Interleaves the lower halves of two vectors.
- 256-bit version:
- In SSE2-SSE4.1, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
Interleaves the lower halves of two vectors.
- 256-bit version:
- In SSE2-SSE4.1, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
|
inline |
Interleaves the lower halves of two vectors.
- 256-bit version:
- In SSE2-SSE4.1, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
Generated on Thu Oct 31 2013 04:08:51 for libsimdpp by Doxygen 1.8.3.1