libsimdpp  0.9.3
Operations: insert or extract a single element from a vector

Functions

template<unsigned id>
basic_int8x16 simdpp::insert (basic_int8x16 a, uint8_t x)
 Inserts an element into an int8x16 vector at the position identified by id.

template<unsigned id>
basic_int16x8 simdpp::insert (basic_int16x8 a, uint16_t x)
 Inserts an element into an int16x8 vector at the position identified by id.

template<unsigned id>
basic_int32x4 simdpp::insert (basic_int32x4 a, uint32_t x)
 Inserts an element into an int32x4 vector at the position identified by id.

template<unsigned id>
basic_int64x2 simdpp::insert (basic_int64x2 a, uint64_t x)
 Inserts an element into an int64x2 vector at the position identified by id.

template<unsigned id>
float32x4 simdpp::insert (float32x4 a, float x)
 Inserts an element into a float32x4 vector at the position identified by id.

template<unsigned id>
float64x2 simdpp::insert (float64x2 a, double x)
 Inserts an element into a float64x2 vector at the position identified by id.

template<unsigned id>
uint8_t simdpp::extract (basic_int8x16 a)
 Extracts the id-th element from an int8x16 vector.

template<unsigned id>
int8_t simdpp::extract (int8x16 a)
 Extracts the id-th element from an int8x16 vector.

int256 simdpp::combine (int128 a, int128 b)
 Combines two 128-bit vectors into a 256-bit vector.

float32x8 simdpp::combine (float32x4 a, float32x4 b)
 Combines two 128-bit vectors into a 256-bit vector.

float64x4 simdpp::combine (float64x2 a, float64x2 b)
 Combines two 128-bit vectors into a 256-bit vector.


Function Documentation

int256 simdpp::combine (int128 a, int128 b)   [inline]

Combines two 128-bit vectors into a 256-bit vector.

r = [ a, b ]

  • In AVX2 this intrinsic results in at least 1 instruction.
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 0 instructions.

float32x8 simdpp::combine (float32x4 a, float32x4 b)   [inline]

Combines two 128-bit vectors into a 256-bit vector.

r = [ a, b ]

  • In AVX2 this intrinsic results in at least 1 instruction.
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 0 instructions.

float64x4 simdpp::combine (float64x2 a, float64x2 b)   [inline]

Combines two 128-bit vectors into a 256-bit vector.

r = [ a, b ]

  • In AVX2 this intrinsic results in at least 1 instruction.
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 0 instructions.

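
The r = [ a, b ] notation used for combine can be sketched with a scalar model. This is only an illustration: model_f32x4, model_f32x8 and combine_model are hypothetical names that are not part of libsimdpp, and the sketch assumes the notation lists elements from the lowest index upward, so that a ends up in the lower half of the result.

```cpp
#include <array>
#include <cstddef>

// Hypothetical scalar stand-ins for float32x4 and float32x8.
// These are plain arrays, not the libsimdpp vector types.
using model_f32x4 = std::array<float, 4>;
using model_f32x8 = std::array<float, 8>;

// Models r = [ a, b ], ASSUMING the notation lists elements from the
// lowest index upward: a becomes elements 0..3, b becomes elements 4..7.
inline model_f32x8 combine_model(const model_f32x4& a, const model_f32x4& b) {
    model_f32x8 r{};
    for (std::size_t i = 0; i < a.size(); ++i) {
        r[i] = a[i];             // lower half comes from a
        r[i + a.size()] = b[i];  // upper half comes from b
    }
    return r;
}
```

Under that assumption, combining {1, 2, 3, 4} with {5, 6, 7, 8} yields the eight values 1 through 8 in order.
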
template<unsigned id>
uint8_t simdpp::extract (basic_int8x16 a)

Extracts the id-th element from an int8x16 vector.

r = a[id]

This function may have very high latency.

  • In SSE2-SSSE3 this intrinsic results in at least 1-2 instructions.
  • In SSE4.1-AVX this intrinsic results in at least 1 instruction.
  • In ALTIVEC this intrinsic results in at least 2 instructions.

template<unsigned id>
int8_t simdpp::extract (int8x16 a)

Extracts the id-th element from an int8x16 vector.

r = a[id]

This function may have very high latency.

  • In SSE2-SSSE3 this intrinsic results in at least 1-2 instructions.
  • In SSE4.1-AVX this intrinsic results in at least 1 instruction.
  • In ALTIVEC this intrinsic results in at least 2 instructions.

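
Because id is a template parameter, the element position has to be known at compile time. The scalar sketch below mirrors r = a[id] under that constraint. extract_model and fourth_byte are hypothetical names that are not part of libsimdpp, and the static_assert is a choice made for this model; this page does not say how the library treats an out-of-range id.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar stand-in for extract: returns element id of a.
// The bounds check is an assumption of this model, not documented behavior.
template <unsigned id, typename T, std::size_t N>
T extract_model(const std::array<T, N>& a) {
    static_assert(id < N, "id must name an existing element");
    return a[id];  // r = a[id]
}

// Example: read element 3 out of a 16-element byte vector model.
inline std::uint8_t fourth_byte(const std::array<std::uint8_t, 16>& v) {
    return extract_model<3>(v);
}
```

A run-time index cannot be passed as id. Code that needs one would have to dispatch to separate instantiations or store the vector to memory first.
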
template<unsigned id>
basic_int8x16 simdpp::insert (basic_int8x16 a, uint8_t x)

Inserts an element into an int8x16 vector at the position identified by id.

r0 = (id == 0) ? x : a0
...
r15 = (id == 15) ? x : a15

This function may have very high latency.

  • In SSE2-SSSE3 this intrinsic results in at least 4-5 instructions.
  • In ALTIVEC this intrinsic results in at least 3 instructions.

template<unsigned id>
basic_int16x8 simdpp::insert (basic_int16x8 a, uint16_t x)

Inserts an element into an int16x8 vector at the position identified by id.

r0 = (id == 0) ? x : a0
...
r7 = (id == 7) ? x : a7

This function may have very high latency.

  • In ALTIVEC this intrinsic results in at least 3 instructions.

template<unsigned id>
basic_int32x4 simdpp::insert (basic_int32x4 a, uint32_t x)

Inserts an element into an int32x4 vector at the position identified by id.

r0 = (id == 0) ? x : a0
r1 = (id == 1) ? x : a1
r2 = (id == 2) ? x : a2
r3 = (id == 3) ? x : a3

This function may have very high latency.

  • In SSE2-SSSE3 this intrinsic results in at least 4 instructions.
  • In ALTIVEC this intrinsic results in at least 3 instructions.

template<unsigned id>
basic_int64x2 simdpp::insert (basic_int64x2 a, uint64_t x)

Inserts an element into an int64x2 vector at the position identified by id.

r0 = (id == 0) ? x : a0
r1 = (id == 1) ? x : a1

This function may have very high latency.

  • In SSE2, SSE3 and SSSE3 this intrinsic results in at least 2 instructions.
  • In SSE4_1 this intrinsic results in at least 1 instruction.
  • In SSE2_32bit, SSE3_32bit and SSSE3_32bit this intrinsic results in at least 4 instructions.
  • In SSE4_1_32bit this intrinsic results in at least 2 instructions.
  • In ALTIVEC this intrinsic results in at least 3 instructions.

template<unsigned id>
float32x4 simdpp::insert (float32x4 a, float x)

Inserts an element into a float32x4 vector at the position identified by id.

r0 = (id == 0) ? x : a0
r1 = (id == 1) ? x : a1
r2 = (id == 2) ? x : a2
r3 = (id == 3) ? x : a3

This function may have very high latency.

  • In SSE2-SSSE3 this intrinsic results in at least 4 instructions.
  • In ALTIVEC this intrinsic results in at least 3 instructions.

template<unsigned id>
float64x2 simdpp::insert (float64x2 a, double x)

Inserts an element into a float64x2 vector at the position identified by id.

This function potentially

r0 = (id == 0) ? x : a0
r1 = (id == 1) ? x : a1

This function may have very high latency.

  • In SSE2-SSSE3 this intrinsic results in at least 2 instructions.
  • In ALTIVEC this intrinsic results in at least 3 instructions.
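
The per-element rule shared by all insert overloads, r_i = (id == i) ? x : a_i, can be written out as a scalar sketch. insert_model is a hypothetical name that is not part of libsimdpp, and the static_assert bounds check is an assumption of this model rather than documented library behavior.

```cpp
#include <array>
#include <cstddef>

// Hypothetical scalar stand-in for insert. Like the documented signatures,
// it leaves its input untouched and returns the modified vector.
template <unsigned id, typename T, std::size_t N>
std::array<T, N> insert_model(const std::array<T, N>& a,
                              typename std::array<T, N>::value_type x) {
    static_assert(id < N, "id must name an existing element");
    std::array<T, N> r{};
    for (std::size_t i = 0; i < N; ++i) {
        r[i] = (i == id) ? x : a[i];  // r_i = (id == i) ? x : a_i
    }
    return r;
}
```

For instance, insert_model<2> applied to {1, 2, 3, 4} with x = 9 produces {1, 2, 9, 4}, and the input array is unchanged. The documented functions also take the vector by value and return the result, so a call whose return value is discarded has no visible effect.
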