libsimdpp  0.9.3
simdpp::sse Namespace Reference

Functions

basic_int32x4 to_int32x4 (int8x16 a)
 Sign extends the values of a signed int8x16 vector to 32 bits. More...
 
basic_int32x4 to_int32x4 (uint8x16 a)
 Extends the values of an unsigned int8x16 vector to 32 bits. More...
 
basic_int64x2 to_int64x2 (int8x16 a)
 Sign extends the values of a signed int8x16 vector to 64 bits. More...
 
basic_int64x2 to_int64x2 (int16x8 a)
 Sign extends the values of a signed int16x8 vector to 64 bits. More...
 
basic_int64x2 to_int64x2 (uint8x16 a)
 Extends the values of an unsigned int8x16 vector to 64 bits. More...
 
basic_int32x4 to_int32x4_r (float32x4 a)
 Converts the values of a float32x4 vector into signed int32_t representation. More...
 
basic_int32x4 to_int32x4 (float64x2 a)
 Converts the values of a float64x2 vector into int32_t representation using truncation. More...
 
basic_int32x4 to_int32x4_r (float64x2 a)
 Converts the values of a float64x2 vector into int32_t representation. More...
 
float32x4 hadd2 (float32x4 a, float32x4 b)
 Adds the values in adjacent pairs of two float32x4 vectors. More...
 
float64x2 hadd2 (float64x2 a, float64x2 b)
 Adds the values in adjacent pairs of two float64x2 vectors. More...
 
float32x4 hadd4 (float32x4 a)
 Sums the values of a float32x4 vector. More...
 
float32x4 hadd4 (float32x4 a, float32x4 b, float32x4 c, float32x4 d)
 Sums the values within each of four float32x4 vectors. More...
 
float32x4 hsub2 (float32x4 a, float32x4 b)
 Subtracts the values in adjacent pairs of two float32x4 vectors. More...
 
float64x2 hsub2 (float64x2 a, float64x2 b)
 Subtracts the values in adjacent pairs of two float64x2 vectors. More...
 
float32x4 sub_add (float32x4 a, float32x4 b)
 Adds or subtracts the values of two float32x4 vectors. More...
 
float64x2 sub_add (float64x2 a, float64x2 b)
 Adds or subtracts the values of two float64x2 vectors. More...
 
int128 copysign (int8x16 a, int8x16 b)
 Copies sign from the values of one int8x16 vector to another. More...
 
int128 copysign (int16x8 a, int16x8 b)
 Copies sign from the values of one int16x8 vector to another. More...
 
int128 copysign (int32x4 a, int32x4 b)
 Copies sign from the values of one int32x4 vector to another. More...
 
int128 hadd2 (basic_int16x8 a, basic_int16x8 b)
 Adds values in adjacent pairs of two int16x8 vectors. More...
 
int128 hadd2 (basic_int32x4 a, basic_int32x4 b)
 Adds values in adjacent pairs of two int32x4 vectors. More...
 
int128 hadd2 (basic_int64x2 a, basic_int64x2 b)
 Adds values in adjacent pairs of two int64x2 vectors. More...
 
int128 hadds2 (int16x8 a, int16x8 b)
 Adds and saturates values in adjacent pairs of two signed int16x8 vectors. More...
 
int128 hadd4 (basic_int32x4 a, basic_int32x4 b, basic_int32x4 c, basic_int32x4 d)
 Sums the values within each of four int32x4 vectors. More...
 
int128 hsub2 (basic_int16x8 a, basic_int16x8 b)
 Subtracts values in adjacent pairs of two int16x8 vectors. More...
 
int128 hsub2 (basic_int32x4 a, basic_int32x4 b)
 Subtracts values in adjacent pairs of two int32x4 vectors. More...
 
int128 hsub2 (basic_int64x2 a, basic_int64x2 b)
 Subtracts values in adjacent pairs of two int64x2 vectors. More...
 
int128 hsubs2 (int16x8 a, int16x8 b)
 Subtracts and saturates values in adjacent pairs of two signed int16x8 vectors. More...
 
void store_masked (void *p, int128 a, int128 mask)
 Stores bytes in a 128-bit integer vector according to a mask. More...
 
int128 extract_lo (int256 a)
 Extracts the lower half of a 256-bit vector. More...
 
basic_int8x16 extract_lo (basic_int8x32 a)
 Extracts the lower half of a 256-bit vector. More...
 
basic_int16x8 extract_lo (basic_int16x16 a)
 Extracts the lower half of a 256-bit vector. More...
 
basic_int32x4 extract_lo (basic_int32x8 a)
 Extracts the lower half of a 256-bit vector. More...
 
basic_int64x2 extract_lo (basic_int64x4 a)
 Extracts the lower half of a 256-bit vector. More...
 
float32x4 extract_lo (float32x8 a)
 Extracts the lower half of a 256-bit vector. More...
 
float64x2 extract_lo (float64x4 a)
 Extracts the lower half of a 256-bit vector. More...
 
int128 extract_hi (int256 a)
 Extracts the higher half of a 256-bit vector. More...
 
basic_int8x16 extract_hi (basic_int8x32 a)
 Extracts the higher half of a 256-bit vector. More...
 
basic_int16x8 extract_hi (basic_int16x16 a)
 Extracts the higher half of a 256-bit vector. More...
 
basic_int32x4 extract_hi (basic_int32x8 a)
 Extracts the higher half of a 256-bit vector. More...
 
basic_int64x2 extract_hi (basic_int64x4 a)
 Extracts the higher half of a 256-bit vector. More...
 
float32x4 extract_hi (float32x8 a)
 Extracts the higher half of a 256-bit vector. More...
 
float64x2 extract_hi (float64x4 a)
 Extracts the higher half of a 256-bit vector. More...
 
template<unsigned P, unsigned N>
void load_lane (basic_int8x16 &a, const void *p)
 Loads the first N elements of a 128-bit vector from memory. More...
 
template<unsigned P, unsigned N>
void load_lane (basic_int16x8 &a, const void *p)
 Loads the first N elements of a 128-bit vector from memory. More...
 
template<unsigned P, unsigned N>
void load_lane (basic_int32x4 &a, const void *p)
 Loads the first N elements of a 128-bit vector from memory. More...
 
template<unsigned P, unsigned N>
void load_lane (basic_int64x2 &a, const void *p)
 Loads the first N elements of a 128-bit vector from memory. More...
 
template<unsigned P, unsigned N>
void load_lane (float32x4 &a, const float *p)
 Loads the first N elements of a 128-bit vector from memory. More...
 
template<unsigned P, unsigned N>
float64x2 load_lane (float64x2 &a, const double *p)
 Loads the first N elements of a 128-bit vector from memory. More...
 
template<unsigned P, unsigned N>
void store_lane (void *p, basic_int8x16 a)
 Stores the first N elements of a 128-bit vector to memory. More...
 
template<unsigned P, unsigned N>
void store_lane (void *p, basic_int16x8 a)
 Stores the first N elements of a 128-bit vector to memory. More...
 
template<unsigned P, unsigned N>
void store_lane (void *p, basic_int32x4 a)
 Stores the first N elements of a 128-bit vector to memory. More...
 
template<unsigned P, unsigned N>
void store_lane (void *p, basic_int64x2 a)
 Stores the first N elements of a 128-bit vector to memory. More...
 
template<unsigned P, unsigned N>
void store_lane (float *p, float32x4 a)
 Stores the first N elements of a 128-bit vector to memory. More...
 
template<unsigned P, unsigned N>
void store_lane (double *p, float64x2 a)
 Stores the first N elements of a 128-bit vector to memory. More...
 
template<unsigned s0, unsigned s1, unsigned s2, unsigned s3>
basic_int16x8 permute_lo (basic_int16x8 a)
 Permutes the first 4 16-bit values of each set of 8 consecutive values. More...
 
template<unsigned s0, unsigned s1, unsigned s2, unsigned s3>
basic_int16x16 permute_lo (basic_int16x16 a)
 Permutes the first 4 16-bit values of each set of 8 consecutive values. More...
 
template<unsigned s0, unsigned s1, unsigned s2, unsigned s3>
basic_int16x8 permute_hi (basic_int16x8 a)
 Permutes the last 4 16-bit values of each set of 8 consecutive values. More...
 
template<unsigned s0, unsigned s1, unsigned s2, unsigned s3>
basic_int16x16 permute_hi (basic_int16x16 a)
 Permutes the last 4 16-bit values of each set of 8 consecutive values. More...
 

Function Documentation

int128 simdpp::sse::copysign ( int8x16  a,
int8x16  b 
)
inline

Copies sign from the values of one int8x16 vector to another.

r0 = (b0 > 0) ? a0 : ((b0 == 0) ? 0 : -a0)
...
r15 = (b15 > 0) ? a15 : ((b15 == 0) ? 0 : -a15)
  • Not implemented for SSE2 and SSE3.
int128 simdpp::sse::copysign ( int16x8  a,
int16x8  b 
)
inline

Copies sign from the values of one int16x8 vector to another.

r0 = (b0 > 0) ? a0 : ((b0 == 0) ? 0 : -a0)
...
r7 = (b7 > 0) ? a7 : ((b7 == 0) ? 0 : -a7)
  • Not implemented for SSE2 and SSE3.
int128 simdpp::sse::copysign ( int32x4  a,
int32x4  b 
)
inline

Copies sign from the values of one int32x4 vector to another.

r0 = (b0 > 0) ? a0 : ((b0 == 0) ? 0 : -a0)
r1 = (b1 > 0) ? a1 : ((b1 == 0) ? 0 : -a1)
r2 = (b2 > 0) ? a2 : ((b2 == 0) ? 0 : -a2)
r3 = (b3 > 0) ? a3 : ((b3 == 0) ? 0 : -a3)
  • Not implemented for SSE2 and SSE3.
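
The per-element rule above can be checked against a scalar model. The sketch below is only an illustration of the documented formula; copysign_ref is a hypothetical helper and says nothing about how the library implements the operation. Negating the minimum value through the unsigned type is an assumption of this model.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Scalar reference model of the documented formula:
//   r[i] = (b[i] > 0) ? a[i] : ((b[i] == 0) ? 0 : -a[i])
// copysign_ref is a hypothetical helper, not part of libsimdpp.
template <typename T, std::size_t N>
std::array<T, N> copysign_ref(const std::array<T, N>& a,
                              const std::array<T, N>& b) {
    typedef typename std::make_unsigned<T>::type U;
    std::array<T, N> r{};
    for (std::size_t i = 0; i < N; ++i) {
        if (b[i] > 0) {
            r[i] = a[i];  // positive sign source: keep a[i]
        } else if (b[i] == 0) {
            r[i] = 0;     // zero sign source: result is zero
        } else {
            // Negate through the unsigned type to avoid signed overflow
            // when a[i] is the minimum value (assumption of this model).
            r[i] = static_cast<T>(static_cast<U>(0) - static_cast<U>(a[i]));
        }
    }
    return r;
}
```

With a = {5, -6, 7, -8} and b = {1, -1, 0, -2}, the model yields {5, 6, 0, 8}.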
int128 simdpp::sse::extract_hi ( int256  a)
inline

Extracts the higher half of a 256-bit vector.

  • This intrinsic results in at least 0 instructions.
basic_int8x16 simdpp::sse::extract_hi ( basic_int8x32  a)
inline

Extracts the higher half of a 256-bit vector.

  • This intrinsic results in at least 0 instructions.
basic_int16x8 simdpp::sse::extract_hi ( basic_int16x16  a)
inline

Extracts the higher half of a 256-bit vector.

  • This intrinsic results in at least 0 instructions.
basic_int32x4 simdpp::sse::extract_hi ( basic_int32x8  a)
inline

Extracts the higher half of a 256-bit vector.

  • This intrinsic results in at least 0 instructions.
basic_int64x2 simdpp::sse::extract_hi ( basic_int64x4  a)
inline

Extracts the higher half of a 256-bit vector.

  • This intrinsic results in at least 0 instructions.
float32x4 simdpp::sse::extract_hi ( float32x8  a)
inline

Extracts the higher half of a 256-bit vector.

  • This intrinsic results in at least 0 instructions.
float64x2 simdpp::sse::extract_hi ( float64x4  a)
inline

Extracts the higher half of a 256-bit vector.

  • This intrinsic results in at least 0 instructions.
int128 simdpp::sse::extract_lo ( int256  a)
inline

Extracts the lower half of a 256-bit vector.

  • This intrinsic results in at least 0 instructions.
basic_int8x16 simdpp::sse::extract_lo ( basic_int8x32  a)
inline

Extracts the lower half of a 256-bit vector.

  • This intrinsic results in at least 0 instructions.
basic_int16x8 simdpp::sse::extract_lo ( basic_int16x16  a)
inline

Extracts the lower half of a 256-bit vector.

  • This intrinsic results in at least 0 instructions.
basic_int32x4 simdpp::sse::extract_lo ( basic_int32x8  a)
inline

Extracts the lower half of a 256-bit vector.

  • This intrinsic results in at least 0 instructions.
basic_int64x2 simdpp::sse::extract_lo ( basic_int64x4  a)
inline

Extracts the lower half of a 256-bit vector.

  • This intrinsic results in at least 0 instructions.
float32x4 simdpp::sse::extract_lo ( float32x8  a)
inline

Extracts the lower half of a 256-bit vector.

  • This intrinsic results in at least 0 instructions.
float64x2 simdpp::sse::extract_lo ( float64x4  a)
inline

Extracts the lower half of a 256-bit vector.

  • This intrinsic results in at least 0 instructions.
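
As a rough illustration of what extract_lo and extract_hi return, the following scalar sketch splits an array in two. The helper names are hypothetical, and the sketch assumes that the "lower half" means the elements with the lowest indices.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar helpers, not libsimdpp functions. They assume that the
// lower half of a vector holds the elements with the lowest indices.
template <typename T, std::size_t N>
std::array<T, N / 2> extract_lo_ref(const std::array<T, N>& a) {
    static_assert(N % 2 == 0, "element count must be even");
    std::array<T, N / 2> r{};
    for (std::size_t i = 0; i < N / 2; ++i) {
        r[i] = a[i];          // elements 0 .. N/2-1
    }
    return r;
}

template <typename T, std::size_t N>
std::array<T, N / 2> extract_hi_ref(const std::array<T, N>& a) {
    static_assert(N % 2 == 0, "element count must be even");
    std::array<T, N / 2> r{};
    for (std::size_t i = 0; i < N / 2; ++i) {
        r[i] = a[N / 2 + i];  // elements N/2 .. N-1
    }
    return r;
}
```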
float32x4 simdpp::sse::hadd2 ( float32x4  a,
float32x4  b 
)
inline

Adds the values in adjacent pairs of two float32x4 vectors.

r0 = a0 + a1
r1 = a2 + a3
r2 = b0 + b1
r3 = b2 + b3
  • Not implemented for SSE2.
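
The pairing pattern of hadd2 generalizes to any even element count: the first half of the result comes from adjacent pairs of a, the second half from adjacent pairs of b. A minimal scalar sketch of that pattern, using a hypothetical hadd2_ref helper that is not part of the library:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar reference model of the documented hadd2 layout (hypothetical helper):
//   r[i]       = a[2i] + a[2i+1]   for i in [0, N/2)
//   r[N/2 + i] = b[2i] + b[2i+1]
template <typename T, std::size_t N>
std::array<T, N> hadd2_ref(const std::array<T, N>& a,
                           const std::array<T, N>& b) {
    static_assert(N % 2 == 0, "element count must be even");
    std::array<T, N> r{};
    for (std::size_t i = 0; i < N / 2; ++i) {
        r[i] = a[2 * i] + a[2 * i + 1];          // pairs taken from a
        r[N / 2 + i] = b[2 * i] + b[2 * i + 1];  // pairs taken from b
    }
    return r;
}
```

For a = {1, 2, 3, 4} and b = {10, 20, 30, 40} the model gives {3, 7, 30, 70}.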
float64x2 simdpp::sse::hadd2 ( float64x2  a,
float64x2  b 
)
inline

Adds the values in adjacent pairs of two float64x2 vectors.

r0 = a0 + a1
r1 = b0 + b1
  • Not implemented for SSE2.
int128 simdpp::sse::hadd2 ( basic_int16x8  a,
basic_int16x8  b 
)
inline

Adds values in adjacent pairs of two int16x8 vectors.

r0 = a0 + a1
...
r3 = a6 + a7
r4 = b0 + b1
...
r7 = b6 + b7
  • Not implemented for SSE2 and SSE3.
int128 simdpp::sse::hadd2 ( basic_int32x4  a,
basic_int32x4  b 
)
inline

Adds values in adjacent pairs of two int32x4 vectors.

r0 = a0 + a1
r1 = a2 + a3
r2 = b0 + b1
r3 = b2 + b3
  • Not implemented for SSE2 and SSE3.
int128 simdpp::sse::hadd2 ( basic_int64x2  a,
basic_int64x2  b 
)
inline

Adds values in adjacent pairs of two int64x2 vectors.

r0 = a0 + a1
r1 = b0 + b1
  • This intrinsic results in at least 3 instructions.
float32x4 simdpp::sse::hadd4 ( float32x4  a)
inline

Sums the values of a float32x4 vector.

r0 = a0 + a1 + a2 + a3
r1 = 0.0f
r2 = 0.0f
r3 = 0.0f
  • Not implemented for SSE2.
float32x4 simdpp::sse::hadd4 ( float32x4  a,
float32x4  b,
float32x4  c,
float32x4  d 
)
inline

Sums the values within each of four float32x4 vectors.

r0 = a0 + a1 + a2 + a3
r1 = b0 + b1 + b2 + b3
r2 = c0 + c1 + c2 + c3
r3 = d0 + d1 + d2 + d3
  • In SSE3, SSSE3 and SSE4.1 this intrinsic results in at least 3 instructions.
  • Not implemented for SSE2.
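
The four-vector form reduces each input to one result element. The scalar sketch below illustrates that layout with a hypothetical hadd4_ref helper. The order in which the four values are added is an assumption of the sketch; for floating-point inputs a different order may round differently.

```cpp
#include <array>
#include <cassert>

// Hypothetical scalar helpers, not libsimdpp functions.
// sum4 adds left to right; the real summation order is not documented here.
template <typename T>
T sum4(const std::array<T, 4>& v) {
    return ((v[0] + v[1]) + v[2]) + v[3];
}

// r0 = sum of a, r1 = sum of b, r2 = sum of c, r3 = sum of d
template <typename T>
std::array<T, 4> hadd4_ref(const std::array<T, 4>& a,
                           const std::array<T, 4>& b,
                           const std::array<T, 4>& c,
                           const std::array<T, 4>& d) {
    std::array<T, 4> r = {{sum4(a), sum4(b), sum4(c), sum4(d)}};
    return r;
}
```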
int128 simdpp::sse::hadd4 ( basic_int32x4  a,
basic_int32x4  b,
basic_int32x4  c,
basic_int32x4  d 
)
inline

Sums the values within each of four int32x4 vectors.

r0 = a0 + a1 + a2 + a3
r1 = b0 + b1 + b2 + b3
r2 = c0 + c1 + c2 + c3
r3 = d0 + d1 + d2 + d3
  • Not implemented for SSE2 and SSE3.
  • This intrinsic results in at least 3 instructions.
int128 simdpp::sse::hadds2 ( int16x8  a,
int16x8  b 
)
inline

Adds and saturates values in adjacent pairs of two signed int16x8 vectors.

r0 = signed_saturate(a0 + a1)
...
r3 = signed_saturate(a6 + a7)
r4 = signed_saturate(b0 + b1)
...
r7 = signed_saturate(b6 + b7)
  • Not implemented for SSE2 and SSE3.
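
To make signed_saturate concrete, here is a scalar sketch of the saturating pairwise addition. The helper names are hypothetical; the sketch widens to 32 bits before clamping so that the intermediate sum cannot overflow.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Clamp a 32-bit value to the int16_t range (hypothetical helper).
inline std::int16_t signed_saturate16(std::int32_t v) {
    const std::int32_t lo = std::numeric_limits<std::int16_t>::min();
    const std::int32_t hi = std::numeric_limits<std::int16_t>::max();
    if (v < lo) return static_cast<std::int16_t>(lo);
    if (v > hi) return static_cast<std::int16_t>(hi);
    return static_cast<std::int16_t>(v);
}

// Scalar reference model of the documented hadds2 layout:
//   r[i]     = signed_saturate(a[2i] + a[2i+1])   for i in [0, 4)
//   r[4 + i] = signed_saturate(b[2i] + b[2i+1])
inline std::array<std::int16_t, 8> hadds2_ref(
        const std::array<std::int16_t, 8>& a,
        const std::array<std::int16_t, 8>& b) {
    std::array<std::int16_t, 8> r{};
    for (std::size_t i = 0; i < 4; ++i) {
        // Sums are formed in 32-bit arithmetic, then clamped.
        r[i] = signed_saturate16(std::int32_t(a[2 * i]) + a[2 * i + 1]);
        r[4 + i] = signed_saturate16(std::int32_t(b[2 * i]) + b[2 * i + 1]);
    }
    return r;
}
```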
float32x4 simdpp::sse::hsub2 ( float32x4  a,
float32x4  b 
)
inline

Subtracts the values in adjacent pairs of two float32x4 vectors.

r0 = a0 - a1
r1 = a2 - a3
r2 = b0 - b1
r3 = b2 - b3
  • Not implemented for SSE2.
float64x2 simdpp::sse::hsub2 ( float64x2  a,
float64x2  b 
)
inline

Subtracts the values in adjacent pairs of two float64x2 vectors.

r0 = a0 - a1
r1 = b0 - b1
  • Not implemented for SSE2.
int128 simdpp::sse::hsub2 ( basic_int16x8  a,
basic_int16x8  b 
)
inline

Subtracts values in adjacent pairs of two int16x8 vectors.

r0 = a0 - a1
...
r3 = a6 - a7
r4 = b0 - b1
...
r7 = b6 - b7
  • Not implemented for SSE2 and SSE3.
int128 simdpp::sse::hsub2 ( basic_int32x4  a,
basic_int32x4  b 
)
inline

Subtracts values in adjacent pairs of two int32x4 vectors.

r0 = a0 - a1
r1 = a2 - a3
r2 = b0 - b1
r3 = b2 - b3
  • Not implemented for SSE2 and SSE3.
int128 simdpp::sse::hsub2 ( basic_int64x2  a,
basic_int64x2  b 
)
inline

Subtracts values in adjacent pairs of two int64x2 vectors.

r0 = a0 - a1
r1 = b0 - b1
  • This intrinsic results in at least 3 instructions.
int128 simdpp::sse::hsubs2 ( int16x8  a,
int16x8  b 
)
inline

Subtracts and saturates values in adjacent pairs of two signed int16x8 vectors.

r0 = signed_saturate(a0 - a1)
...
r3 = signed_saturate(a6 - a7)
r4 = signed_saturate(b0 - b1)
...
r7 = signed_saturate(b6 - b7)
  • Not implemented for SSE2 and SSE3.
template<unsigned P, unsigned N>
void simdpp::sse::load_lane ( basic_int8x16 &  a,
const void *  p 
)

Loads the first N elements of a 128-bit vector from memory.

N must be a power of 2 and at least M/4, where M is the number of elements in the vector. P must be 0 or M/2 if N == M/2.

If N is M/2, the values of the non-loaded elements are preserved; otherwise they are set to zero.
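
The constraints on P and N can be written down as compile-time checks. The scalar sketch below is one possible reading of the description, not the library's implementation: it assumes that P is the index of the first element written and that the written range must stay inside the vector. load_lane_ref is a hypothetical helper.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar model of load_lane. Assumptions (not confirmed by the
// documentation): P is the index of the first element written, and the N
// loaded values are read from p[0] .. p[N-1].
template <unsigned P, unsigned N, typename T, std::size_t M>
void load_lane_ref(std::array<T, M>& a, const T* p) {
    static_assert(N != 0 && (N & (N - 1)) == 0, "N must be a power of 2");
    static_assert(N * 4 >= M && N <= M, "N must be at least M/4");
    static_assert(N != M / 2 || P == 0 || P == M / 2,
                  "P must be 0 or M/2 when N == M/2");
    static_assert(P + N <= M, "assumed: written range stays inside the vector");
    if (N != M / 2) {
        a.fill(T(0));      // non-loaded elements are set to zero
    }                      // for N == M/2 they are preserved instead
    for (unsigned i = 0; i < N; ++i) {
        a[P + i] = p[i];
    }
}
```

Under these assumptions, loading two values into the upper half of a four-element vector leaves the lower half untouched, while loading a single value zeroes the rest.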

template<unsigned P, unsigned N>
void simdpp::sse::load_lane ( basic_int16x8 &  a,
const void *  p 
)

Loads the first N elements of a 128-bit vector from memory.

N must be a power of 2 and at least M/4, where M is the number of elements in the vector. P must be 0 or M/2 if N == M/2.

If N is M/2, the values of the non-loaded elements are preserved; otherwise they are set to zero.

template<unsigned P, unsigned N>
void simdpp::sse::load_lane ( basic_int32x4 &  a,
const void *  p 
)

Loads the first N elements of a 128-bit vector from memory.

N must be a power of 2 and at least M/4, where M is the number of elements in the vector. P must be 0 or M/2 if N == M/2.

If N is M/2, the values of the non-loaded elements are preserved; otherwise they are set to zero.

template<unsigned P, unsigned N>
void simdpp::sse::load_lane ( basic_int64x2 &  a,
const void *  p 
)

Loads the first N elements of a 128-bit vector from memory.

N must be a power of 2 and at least M/4, where M is the number of elements in the vector. P must be 0 or M/2 if N == M/2.

If N is M/2, the values of the non-loaded elements are preserved; otherwise they are set to zero.

template<unsigned P, unsigned N>
void simdpp::sse::load_lane ( float32x4 &  a,
const float *  p 
)

Loads the first N elements of a 128-bit vector from memory.

N must be a power of 2 and at least M/4, where M is the number of elements in the vector. P must be 0 or M/2 if N == M/2.

If N is M/2, the values of the non-loaded elements are preserved; otherwise they are set to zero.

template<unsigned P, unsigned N>
float64x2 simdpp::sse::load_lane ( float64x2 &  a,
const double *  p 
)

Loads the first N elements of a 128-bit vector from memory.

N must be a power of 2 and at least M/4, where M is the number of elements in the vector. P must be 0 or M/2 if N == M/2.

If N is M/2, the values of the non-loaded elements are preserved; otherwise they are set to zero.

template<unsigned s0, unsigned s1, unsigned s2, unsigned s3>
basic_int16x8 simdpp::sse::permute_hi ( basic_int16x8  a)

Permutes the last 4 16-bit values of each set of 8 consecutive values.

The selector values s0, s1, s2 and s3 must be in range [0; 3].

r0 = a0
...
r3 = a3
r4 = a[s0+4]
...
r7 = a[s3+4]
256-bit version:
r8 = a8
...
r11 = a11
r12 = a[s0+12]
...
r15 = a[s3+12]
256-bit version:
  • In SSE2-AVX this intrinsic results in at least 2 instructions.
template<unsigned s0, unsigned s1, unsigned s2, unsigned s3>
basic_int16x16 simdpp::sse::permute_hi ( basic_int16x16  a)

Permutes the last 4 16-bit values of each set of 8 consecutive values.

The selector values s0, s1, s2 and s3 must be in range [0; 3].

r0 = a0
...
r3 = a3
r4 = a[s0+4]
...
r7 = a[s3+4]
256-bit version:
r8 = a8
...
r11 = a11
r12 = a[s0+12]
...
r15 = a[s3+12]
256-bit version:
  • In SSE2-AVX this intrinsic results in at least 2 instructions.
template<unsigned s0, unsigned s1, unsigned s2, unsigned s3>
basic_int16x8 simdpp::sse::permute_lo ( basic_int16x8  a)

Permutes the first 4 16-bit values of each set of 8 consecutive values.

The selector values s0, s1, s2 and s3 must be in range [0; 3].

r0 = a[s0]
...
r3 = a[s3]
r4 = a4
...
r7 = a7
256-bit version:
r8 = a[s0+8]
...
r11 = a[s3+8]
r12 = a12
...
r15 = a15
256-bit version:
  • In SSE2-AVX this intrinsic results in at least 2 instructions.
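
The selector semantics can be expressed as a short scalar model that handles one group of 8 values at a time, which covers both the 8-element and the 16-element case. permute_lo_ref is a hypothetical helper used only to illustrate the formulas above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar reference model of permute_lo (hypothetical helper): within every
// group of 8 values, the first four are picked by the selectors and the last
// four are copied unchanged.
template <unsigned s0, unsigned s1, unsigned s2, unsigned s3, std::size_t N>
std::array<std::uint16_t, N> permute_lo_ref(
        const std::array<std::uint16_t, N>& a) {
    static_assert(s0 < 4 && s1 < 4 && s2 < 4 && s3 < 4,
                  "selectors must be in range [0; 3]");
    static_assert(N % 8 == 0, "element count must be a multiple of 8");
    std::array<std::uint16_t, N> r = a;  // elements 4..7 of each group stay
    for (std::size_t base = 0; base < N; base += 8) {
        r[base + 0] = a[base + s0];
        r[base + 1] = a[base + s1];
        r[base + 2] = a[base + s2];
        r[base + 3] = a[base + s3];
    }
    return r;
}
```

For example, the selectors <3, 2, 1, 0> reverse the first four values of each group and leave the last four in place.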
template<unsigned s0, unsigned s1, unsigned s2, unsigned s3>
basic_int16x16 simdpp::sse::permute_lo ( basic_int16x16  a)

Permutes the first 4 16-bit values of each set of 8 consecutive values.

The selector values s0, s1, s2 and s3 must be in range [0; 3].

r0 = a[s0]
...
r3 = a[s3]
r4 = a4
...
r7 = a7
256-bit version:
r8 = a[s0+8]
...
r11 = a[s3+8]
r12 = a12
...
r15 = a15
256-bit version:
  • In SSE2-AVX this intrinsic results in at least 2 instructions.
template<unsigned P, unsigned N>
void simdpp::sse::store_lane ( void *  p,
basic_int8x16  a 
)

Stores the first N elements of a 128-bit vector to memory.

N must be a power of 2 and at least M/4, where M is the number of elements in the vector. P must be 0 or M/2 if N == M/2.

template<unsigned P, unsigned N>
void simdpp::sse::store_lane ( void *  p,
basic_int16x8  a 
)

Stores the first N elements of a 128-bit vector to memory.

N must be a power of 2 and at least M/4, where M is the number of elements in the vector. P must be 0 or M/2 if N == M/2.

template<unsigned P, unsigned N>
void simdpp::sse::store_lane ( void *  p,
basic_int32x4  a 
)

Stores the first N elements of a 128-bit vector to memory.

N must be a power of 2 and at least M/4, where M is the number of elements in the vector. P must be 0 or M/2 if N == M/2.

template<unsigned P, unsigned N>
void simdpp::sse::store_lane ( void *  p,
basic_int64x2  a 
)

Stores the first N elements of a 128-bit vector to memory.

N must be a power of 2 and at least M/4, where M is the number of elements in the vector. P must be 0 or M/2 if N == M/2.

template<unsigned P, unsigned N>
void simdpp::sse::store_lane ( float *  p,
float32x4  a 
)

Stores the first N elements of a 128-bit vector to memory.

N must be a power of 2 and at least M/4, where M is the number of elements in the vector. P must be 0 or M/2 if N == M/2.

template<unsigned P, unsigned N>
void simdpp::sse::store_lane ( double *  p,
float64x2  a 
)

Stores the first N elements of a 128-bit vector to memory.

N must be a power of 2 and at least M/4, where M is the number of elements in the vector. P must be 0 or M/2 if N == M/2.

void simdpp::sse::store_masked ( void *  p,
int128  a,
int128  mask 
)
inline

Stores bytes in a 128-bit integer vector according to a mask.

The highest bit in the corresponding byte in the mask defines whether the byte will be saved. p does not need to be aligned to 16 bytes.
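
The mask rule can be illustrated with a byte-wise scalar model: a byte is written only when the top bit of the matching mask byte is set, and all other destination bytes are left untouched. store_masked_ref is a hypothetical helper, not the library function.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar reference model of store_masked (hypothetical helper). Byte i of a
// is written to p[i] only if the highest bit of mask byte i is set.
inline void store_masked_ref(void* p,
                             const std::array<std::uint8_t, 16>& a,
                             const std::array<std::uint8_t, 16>& mask) {
    unsigned char* dst = static_cast<unsigned char*>(p);
    for (std::size_t i = 0; i < 16; ++i) {
        if (mask[i] & 0x80u) {
            dst[i] = a[i];  // selected byte is stored
        }                   // unselected bytes in memory are not modified
    }
}
```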

float32x4 simdpp::sse::sub_add ( float32x4  a,
float32x4  b 
)
inline

Adds or subtracts the values of two float32x4 vectors.

r0 = a0 - b0
r1 = a1 + b1
r2 = a2 - b2
r3 = a3 + b3
  • Not implemented for SSE2.
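
The alternating pattern of sub_add (subtract in even positions, add in odd positions) can be sketched in scalar form as follows. sub_add_ref is a hypothetical helper that only mirrors the formulas above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar reference model of sub_add (hypothetical helper):
//   even index: r[i] = a[i] - b[i]
//   odd index:  r[i] = a[i] + b[i]
template <typename T, std::size_t N>
std::array<T, N> sub_add_ref(const std::array<T, N>& a,
                             const std::array<T, N>& b) {
    std::array<T, N> r{};
    for (std::size_t i = 0; i < N; ++i) {
        r[i] = (i % 2 == 0) ? (a[i] - b[i]) : (a[i] + b[i]);
    }
    return r;
}
```

For a = {1, 2, 3, 4} and b = {10, 20, 30, 40} the model gives {-9, 22, -27, 44}.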
float64x2 simdpp::sse::sub_add ( float64x2  a,
float64x2  b 
)
inline

Adds or subtracts the values of two float64x2 vectors.

r0 = a0 - b0
r1 = a1 + b1
  • Not implemented for SSE2.
basic_int32x4 simdpp::sse::to_int32x4 ( int8x16  a)
inline

Sign extends the values of a signed int8x16 vector to 32 bits.

r0 = (int32_t) a0
...
r3 = (int32_t) a3
  • In SSE2, SSE3 and SSSE3 this intrinsic results in at least 4 instructions.
basic_int32x4 simdpp::sse::to_int32x4 ( uint8x16  a)
inline

Extends the values of an unsigned int8x16 vector to 32 bits.

r0 = (uint32_t) a0
...
r3 = (uint32_t) a3
  • In SSE2, SSE3 and SSSE3 this intrinsic results in at least 3 instructions.
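
Only the first four 8-bit elements take part in the widening conversions. The scalar sketch below contrasts sign extension with zero extension; both overloads of to_int32x4_ref are hypothetical helpers, and the typed return arrays are a convenience of the sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of the signed conversion (hypothetical helper):
// the first four int8_t values are sign extended to 32 bits.
inline std::array<std::int32_t, 4> to_int32x4_ref(
        const std::array<std::int8_t, 16>& a) {
    std::array<std::int32_t, 4> r{};
    for (std::size_t i = 0; i < 4; ++i) {
        r[i] = static_cast<std::int32_t>(a[i]);   // -1 stays -1
    }
    return r;
}

// Scalar model of the unsigned conversion (hypothetical helper):
// the first four uint8_t values are zero extended to 32 bits.
inline std::array<std::uint32_t, 4> to_int32x4_ref(
        const std::array<std::uint8_t, 16>& a) {
    std::array<std::uint32_t, 4> r{};
    for (std::size_t i = 0; i < 4; ++i) {
        r[i] = static_cast<std::uint32_t>(a[i]);  // 0xFF becomes 255
    }
    return r;
}
```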
basic_int32x4 simdpp::sse::to_int32x4 ( float64x2  a)
inline

Converts the values of a float64x2 vector into int32_t representation using truncation.

If the value cannot be represented by int32_t, 0x80000000 is returned.

r0 = (int32_t) a0
r1 = (int32_t) a1
r2 = 0
r3 = 0
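
A scalar sketch of the truncating conversion, including the out-of-range case, might look as follows. The helper names are hypothetical, and treating NaN as "cannot be represented" is an assumption of the sketch. 0x80000000 corresponds to the minimum int32_t value.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <limits>

// Truncate one double toward zero (hypothetical helper). Values whose
// truncated result does not fit into int32_t, and NaN (assumption), yield
// 0x80000000, which is the minimum int32_t value.
inline std::int32_t trunc_to_int32_ref(double x) {
    // The comparison is false for NaN, so NaN takes the fallback path.
    if (!(x > -2147483649.0 && x < 2147483648.0)) {
        return std::numeric_limits<std::int32_t>::min();
    }
    return static_cast<std::int32_t>(x);  // conversion truncates toward zero
}

// r0, r1 come from the two doubles; r2 and r3 are zero.
inline std::array<std::int32_t, 4> to_int32x4_trunc_ref(
        const std::array<double, 2>& a) {
    std::array<std::int32_t, 4> r = {{trunc_to_int32_ref(a[0]),
                                      trunc_to_int32_ref(a[1]), 0, 0}};
    return r;
}
```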
basic_int32x4 simdpp::sse::to_int32x4_r ( float32x4  a)
inline

Converts the values of a float32x4 vector into signed int32_t representation.

If the value cannot be represented by int32_t, 0x80000000 is returned. If only inexact conversion can be performed, the current rounding mode is used.

r0 = (int32_t) a0
r1 = (int32_t) a1
r2 = (int32_t) a2
r3 = (int32_t) a3
basic_int32x4 simdpp::sse::to_int32x4_r ( float64x2  a)
inline

Converts the values of a float64x2 vector into int32_t representation.

If the value cannot be represented by int32_t, 0x80000000 is returned. If only inexact conversion can be performed, it is rounded according to the current rounding mode.

r0 = (int32_t) a0
r1 = (int32_t) a1
r2 = 0
r3 = 0
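
The difference from the truncating variant is the rounding step. In the scalar sketch below, std::nearbyint stands in for "the current rounding mode"; the helper names are hypothetical, and mapping NaN to 0x80000000 is an assumption. With the default round-to-nearest-even mode, 2.5 rounds to 2 and -3.5 rounds to -4.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Round one double using the current floating-point rounding mode
// (hypothetical helper). Results that do not fit into int32_t, and NaN
// (assumption), yield 0x80000000, the minimum int32_t value.
inline std::int32_t round_to_int32_ref(double x) {
    const double r = std::nearbyint(x);  // honours the current rounding mode
    // The comparison is false for NaN, so NaN takes the fallback path.
    if (!(r >= -2147483648.0 && r <= 2147483647.0)) {
        return std::numeric_limits<std::int32_t>::min();
    }
    return static_cast<std::int32_t>(r);
}

// r0, r1 come from the two doubles; r2 and r3 are zero.
inline std::array<std::int32_t, 4> to_int32x4_r_ref(
        const std::array<double, 2>& a) {
    std::array<std::int32_t, 4> r = {{round_to_int32_ref(a[0]),
                                      round_to_int32_ref(a[1]), 0, 0}};
    return r;
}
```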
basic_int64x2 simdpp::sse::to_int64x2 ( int8x16  a)
inline

Sign extends the values of a signed int8x16 vector to 64 bits.

r0 = (int64_t) a0
r1 = (int64_t) a1
  • Not implemented for SSE2, SSE3 and SSSE3.
basic_int64x2 simdpp::sse::to_int64x2 ( int16x8  a)
inline

Sign extends the values of a signed int16x8 vector to 64 bits.

r0 = (int64_t) a0
r1 = (int64_t) a1
  • Not implemented for SSE2, SSE3 and SSSE3.
basic_int64x2 simdpp::sse::to_int64x2 ( uint8x16  a)
inline

Extends the values of an unsigned int8x16 vector to 64 bits.

r0 = (uint64_t) a0
r1 = (uint64_t) a1
  • In SSE2, SSE3 and SSSE3 this intrinsic results in at least 4 instructions.