libsimdpp  0.9.3
Operations: integer maths

Functions

basic_int8x16 simdpp::add (basic_int8x16 a, basic_int8x16 b)
 Adds 8-bit integer values. More...
 
basic_int8x32 simdpp::add (basic_int8x32 a, basic_int8x32 b)
 Adds 8-bit integer values. More...
 
int8x16 simdpp::min (int8x16 a, int8x16 b)
 Computes minimum of signed 8-bit values. More...
 
int8x32 simdpp::min (int8x32 a, int8x32 b)
 Computes minimum of signed 8-bit values. More...
 
basic_int16x8 simdpp::add (basic_int16x8 a, basic_int16x8 b)
 Adds 16-bit integer values. More...
 
basic_int16x16 simdpp::add (basic_int16x16 a, basic_int16x16 b)
 Adds 16-bit integer values. More...
 
basic_int32x4 simdpp::add (basic_int32x4 a, basic_int32x4 b)
 Adds 32-bit integer values. More...
 
basic_int32x8 simdpp::add (basic_int32x8 a, basic_int32x8 b)
 Adds 32-bit integer values. More...
 
basic_int64x2 simdpp::add (basic_int64x2 a, basic_int64x2 b)
 Adds 64-bit integer values. More...
 
basic_int64x4 simdpp::add (basic_int64x4 a, basic_int64x4 b)
 Adds 64-bit integer values. More...
 
int8x16 simdpp::adds (int8x16 a, int8x16 b)
 Adds and saturates signed 8-bit integer values. More...
 
int8x32 simdpp::adds (int8x32 a, int8x32 b)
 Adds and saturates signed 8-bit integer values. More...
 
int16x8 simdpp::adds (int16x8 a, int16x8 b)
 Adds and saturates signed 16-bit integer values. More...
 
int16x16 simdpp::adds (int16x16 a, int16x16 b)
 Adds and saturates signed 16-bit integer values. More...
 
uint8x16 simdpp::adds (uint8x16 a, uint8x16 b)
 Adds and saturates unsigned 8-bit integer values. More...
 
uint8x32 simdpp::adds (uint8x32 a, uint8x32 b)
 Adds and saturates unsigned 8-bit integer values. More...
 
uint16x8 simdpp::adds (uint16x8 a, uint16x8 b)
 Adds and saturates unsigned 16-bit integer values. More...
 
uint16x16 simdpp::adds (uint16x16 a, uint16x16 b)
 Adds and saturates unsigned 16-bit integer values. More...
 
basic_int8x16 simdpp::sub (basic_int8x16 a, basic_int8x16 b)
 Subtracts 8-bit integer values. More...
 
basic_int8x32 simdpp::sub (basic_int8x32 a, basic_int8x32 b)
 Subtracts 8-bit integer values. More...
 
basic_int16x8 simdpp::sub (basic_int16x8 a, basic_int16x8 b)
 Subtracts 16-bit integer values. More...
 
basic_int16x16 simdpp::sub (basic_int16x16 a, basic_int16x16 b)
 Subtracts 16-bit integer values. More...
 
basic_int32x4 simdpp::sub (basic_int32x4 a, basic_int32x4 b)
 Subtracts 32-bit integer values. More...
 
basic_int32x8 simdpp::sub (basic_int32x8 a, basic_int32x8 b)
 Subtracts 32-bit integer values. More...
 
basic_int64x2 simdpp::sub (basic_int64x2 a, basic_int64x2 b)
 Subtracts 64-bit integer values. More...
 
basic_int64x4 simdpp::sub (basic_int64x4 a, basic_int64x4 b)
 Subtracts 64-bit integer values. More...
 
int8x16 simdpp::subs (int8x16 a, int8x16 b)
 Subtracts and saturates signed 8-bit integer values. More...
 
int8x32 simdpp::subs (int8x32 a, int8x32 b)
 Subtracts and saturates signed 8-bit integer values. More...
 
int16x8 simdpp::subs (int16x8 a, int16x8 b)
 Subtracts and saturates signed 16-bit integer values. More...
 
int16x16 simdpp::subs (int16x16 a, int16x16 b)
 Subtracts and saturates signed 16-bit integer values. More...
 
uint8x16 simdpp::subs (uint8x16 a, uint8x16 b)
 Subtracts and saturates unsigned 8-bit integer values. More...
 
uint8x32 simdpp::subs (uint8x32 a, uint8x32 b)
 Subtracts and saturates unsigned 8-bit integer values. More...
 
uint16x8 simdpp::subs (uint16x8 a, uint16x8 b)
 Subtracts and saturates unsigned 16-bit integer values. More...
 
uint16x16 simdpp::subs (uint16x16 a, uint16x16 b)
 Subtracts and saturates unsigned 16-bit integer values. More...
 
int8x16 simdpp::neg (int8x16 a)
 Negates signed 8-bit values. More...
 
int8x32 simdpp::neg (int8x32 a)
 Negates signed 8-bit values. More...
 
int16x8 simdpp::neg (int16x8 a)
 Negates signed 16-bit values. More...
 
int16x16 simdpp::neg (int16x16 a)
 Negates signed 16-bit values. More...
 
int32x4 simdpp::neg (int32x4 a)
 Negates signed 32-bit values. More...
 
int32x8 simdpp::neg (int32x8 a)
 Negates signed 32-bit values. More...
 
int64x2 simdpp::neg (int64x2 a)
 Negates signed 64-bit values. More...
 
int64x4 simdpp::neg (int64x4 a)
 Negates signed 64-bit values. More...
 
basic_int16x8 simdpp::mul_lo (basic_int16x8 a, basic_int16x8 b)
 Multiplies 16-bit values and returns the lower part of the multiplication. More...
 
basic_int16x16 simdpp::mul_lo (basic_int16x16 a, basic_int16x16 b)
 Multiplies 16-bit values and returns the lower part of the multiplication. More...
 
int16x8 simdpp::mul_hi (int16x8 a, int16x8 b)
 Multiplies signed 16-bit values and returns the higher half of the result. More...
 
int16x16 simdpp::mul_hi (int16x16 a, int16x16 b)
 Multiplies signed 16-bit values and returns the higher half of the result. More...
 
uint16x8 simdpp::mul_hi (uint16x8 a, uint16x8 b)
 Multiplies unsigned 16-bit values and returns the higher half of the result. More...
 
uint16x16 simdpp::mul_hi (uint16x16 a, uint16x16 b)
 Multiplies unsigned 16-bit values and returns the higher half of the result. More...
 
int128 simdpp::mul_lo (basic_int32x4 a, basic_int32x4 b)
 Multiplies 32-bit values and returns the lower half of the result. More...
 
basic_int32x8 simdpp::mul_lo (basic_int32x8 a, basic_int32x8 b)
 Multiplies 32-bit values and returns the lower half of the result. More...
 
int32x4 simdpp::mull_lo (int16x8 a, int16x8 b)
 Multiplies signed 16-bit values in the lower halves of the vectors and expands the results to 32 bits. More...
 
int32x8 simdpp::mull_lo (int16x16 a, int16x16 b)
 Multiplies signed 16-bit values in the lower halves of the vectors and expands the results to 32 bits. More...
 
uint32x4 simdpp::mull_lo (uint16x8 a, uint16x8 b)
 Multiplies unsigned 16-bit values in the lower halves of the vectors and expands the results to 32 bits. More...
 
uint32x8 simdpp::mull_lo (uint16x16 a, uint16x16 b)
 Multiplies unsigned 16-bit values in the lower halves of the vectors and expands the results to 32 bits. More...
 
int32x4 simdpp::mull_hi (int16x8 a, int16x8 b)
 Multiplies signed 16-bit values in the higher halves of the vectors and expands the results to 32 bits. More...
 
int32x8 simdpp::mull_hi (int16x16 a, int16x16 b)
 Multiplies signed 16-bit values in the higher halves of the vectors and expands the results to 32 bits. More...
 
uint32x4 simdpp::mull_hi (uint16x8 a, uint16x8 b)
 Multiplies unsigned 16-bit values in the higher halves of the vectors and expands the results to 32 bits. More...
 
uint32x8 simdpp::mull_hi (uint16x16 a, uint16x16 b)
 Multiplies unsigned 16-bit values in the higher halves of the vectors and expands the results to 32 bits. More...
 
int64x2 simdpp::mull_lo (int32x4 a, int32x4 b)
 Multiplies signed 32-bit values in the lower halves of the vectors and expands the results to 64 bits. More...
 
int64x4 simdpp::mull_lo (int32x8 a, int32x8 b)
 Multiplies signed 32-bit values in the lower halves of the vectors and expands the results to 64 bits. More...
 
uint64x2 simdpp::mull_lo (uint32x4 a, uint32x4 b)
 Multiplies unsigned 32-bit values in the lower halves of the vectors and expands the results to 64 bits. More...
 
uint64x4 simdpp::mull_lo (uint32x8 a, uint32x8 b)
 Multiplies unsigned 32-bit values in the lower halves of the vectors and expands the results to 64 bits. More...
 
int64x2 simdpp::mull_hi (int32x4 a, int32x4 b)
 Multiplies signed 32-bit values in the higher halves of the vectors and expands the results to 64 bits. More...
 
int64x4 simdpp::mull_hi (int32x8 a, int32x8 b)
 Multiplies signed 32-bit values in the higher halves of the vectors and expands the results to 64 bits. More...
 
uint64x2 simdpp::mull_hi (uint32x4 a, uint32x4 b)
 Multiplies unsigned 32-bit values in the higher halves of the vectors and expands the results to 64 bits. More...
 
uint64x4 simdpp::mull_hi (uint32x8 a, uint32x8 b)
 Multiplies unsigned 32-bit values in the higher halves of the vectors and expands the results to 64 bits. More...
 

Detailed Description

Function Documentation

basic_int8x16 simdpp::add ( basic_int8x16  a,
basic_int8x16  b 
)
inline

Adds 8-bit integer values.

r0 = a0 + b0
...
rN = aN + bN
256-bit version:
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
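The lane-wise formula above can be sketched with a scalar reference model. This is only an illustration: `add_u8x16_model` is a hypothetical helper, not a libsimdpp function, and it assumes the usual wraparound (modulo 2^8) behaviour of non-saturating SIMD adds, which the formula itself does not state.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model (not part of libsimdpp): lane-wise 8-bit addition,
// assuming results wrap modulo 2^8.
std::array<std::uint8_t, 16> add_u8x16_model(const std::array<std::uint8_t, 16>& a,
                                             const std::array<std::uint8_t, 16>& b)
{
    std::array<std::uint8_t, 16> r{};
    for (std::size_t i = 0; i < r.size(); ++i) {
        // Operands promote to int; the cast keeps only the low 8 bits.
        r[i] = static_cast<std::uint8_t>(a[i] + b[i]);
    }
    return r;
}
```

With every lane of `a` set to 200 and every lane of `b` set to 100, each result lane holds 44 (300 modulo 256).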
basic_int8x32 simdpp::add ( basic_int8x32  a,
basic_int8x32  b 
)
inline

Adds 8-bit integer values.

r0 = a0 + b0
...
rN = aN + bN
256-bit version:
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int16x8 simdpp::add ( basic_int16x8  a,
basic_int16x8  b 
)
inline

Adds 16-bit integer values.

r0 = a0 + b0
...
rN = aN + bN
256-bit version:
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int16x16 simdpp::add ( basic_int16x16  a,
basic_int16x16  b 
)
inline

Adds 16-bit integer values.

r0 = a0 + b0
...
rN = aN + bN
256-bit version:
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int32x4 simdpp::add ( basic_int32x4  a,
basic_int32x4  b 
)
inline

Adds 32-bit integer values.

r0 = a0 + b0
...
rN = aN + bN
256-bit version:
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int32x8 simdpp::add ( basic_int32x8  a,
basic_int32x8  b 
)
inline

Adds 32-bit integer values.

r0 = a0 + b0
...
rN = aN + bN
256-bit version:
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int64x2 simdpp::add ( basic_int64x2  a,
basic_int64x2  b 
)
inline

Adds 64-bit integer values.

r0 = a0 + b0
...
rN = aN + bN
128-bit version:
  • In ALTIVEC this intrinsic results in at least 5-6 instructions.
256-bit version:
  • In SSE2-AVX and NEON this intrinsic results in at least 2 instructions.
  • In ALTIVEC this intrinsic results in at least 10-11 instructions.
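The higher ALTIVEC counts presumably reflect the lack of a native 64-bit integer add on that target, although the exact sequence is not given here. One plausible way to build a 64-bit add from 32-bit operations is sketched below as a scalar model; `U64Halves` and `add64_via_32` are hypothetical names used only for illustration.

```cpp
#include <cstdint>

// Hypothetical model: one 64-bit lane stored as two 32-bit halves.
struct U64Halves {
    std::uint32_t lo;
    std::uint32_t hi;
};

// Adds two 64-bit lanes using only 32-bit additions plus a carry.
U64Halves add64_via_32(U64Halves a, U64Halves b)
{
    U64Halves r{};
    r.lo = a.lo + b.lo;                              // wraps modulo 2^32
    std::uint32_t carry = (r.lo < a.lo) ? 1u : 0u;   // wrapped sum is smaller than an operand
    r.hi = a.hi + b.hi + carry;                      // propagate the carry into the high half
    return r;
}
```

Adding `{0xFFFFFFFF, 0}` and `{1, 0}` yields `{0, 1}`: the low half wraps and the carry lands in the high half.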
basic_int64x4 simdpp::add ( basic_int64x4  a,
basic_int64x4  b 
)
inline

Adds 64-bit integer values.

r0 = a0 + b0
...
rN = aN + bN
128-bit version:
  • In ALTIVEC this intrinsic results in at least 5-6 instructions.
256-bit version:
  • In SSE2-AVX and NEON this intrinsic results in at least 2 instructions.
  • In ALTIVEC this intrinsic results in at least 10-11 instructions.
int8x16 simdpp::adds ( int8x16  a,
int8x16  b 
)
inline

Adds and saturates signed 8-bit integer values.

r0 = signed_saturate(a0 + b0)
...
rN = signed_saturate(aN + bN)
256-bit version:
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
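As a sketch of what `signed_saturate` means for one 8-bit lane: the exact sum is computed in a wider type and then clamped to the range [-128, 127]. `adds_i8_model` is a hypothetical scalar helper, not a libsimdpp function.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical scalar model of one lane of a signed saturating 8-bit add.
std::int8_t adds_i8_model(std::int8_t a, std::int8_t b)
{
    int wide = static_cast<int>(a) + static_cast<int>(b);   // exact sum, cannot overflow int
    wide = std::max(-128, std::min(127, wide));              // clamp to the int8 range
    return static_cast<std::int8_t>(wide);
}
```

For example, 100 + 100 saturates to 127 and -100 + -100 saturates to -128, whereas a plain add would wrap around.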
int8x32 simdpp::adds ( int8x32  a,
int8x32  b 
)
inline

Adds and saturates signed 8-bit integer values.

r0 = signed_saturate(a0 + b0)
...
rN = signed_saturate(aN + bN)
256-bit version:
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
int16x8 simdpp::adds ( int16x8  a,
int16x8  b 
)
inline

Adds and saturates signed 16-bit integer values.

r0 = signed_saturate(a0 + b0)
...
rN = signed_saturate(aN + bN)
256-bit version:
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
int16x16 simdpp::adds ( int16x16  a,
int16x16  b 
)
inline

Adds and saturates signed 16-bit integer values.

r0 = signed_saturate(a0 + b0)
...
rN = signed_saturate(aN + bN)
256-bit version:
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
uint8x16 simdpp::adds ( uint8x16  a,
uint8x16  b 
)
inline

Adds and saturates unsigned 8-bit integer values.

r0 = unsigned_saturate(a0 + b0)
...
rN = unsigned_saturate(aN + bN)
256-bit version:
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
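For the unsigned case, `unsigned_saturate` clamps the exact sum to the largest representable value (255 for 8-bit lanes). A minimal scalar sketch, using the hypothetical helper `adds_u8_model`:

```cpp
#include <cstdint>

// Hypothetical scalar model of one lane of an unsigned saturating 8-bit add.
std::uint8_t adds_u8_model(std::uint8_t a, std::uint8_t b)
{
    unsigned wide = static_cast<unsigned>(a) + static_cast<unsigned>(b);  // exact sum
    return static_cast<std::uint8_t>(wide > 255u ? 255u : wide);          // clamp at 255
}
```

Here 200 + 100 gives 255 instead of the wrapped value 44.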
uint8x32 simdpp::adds ( uint8x32  a,
uint8x32  b 
)
inline

Adds and saturates unsigned 8-bit integer values.

r0 = unsigned_saturate(a0 + b0)
...
rN = unsigned_saturate(aN + bN)
256-bit version:
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
uint16x8 simdpp::adds ( uint16x8  a,
uint16x8  b 
)
inline

Adds and saturates unsigned 16-bit integer values.

r0 = unsigned_saturate(a0 + b0)
...
rN = unsigned_saturate(aN + bN)
256-bit version:
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
uint16x16 simdpp::adds ( uint16x16  a,
uint16x16  b 
)
inline

Adds and saturates unsigned 16-bit integer values.

r0 = unsigned_saturate(a0 + b0)
...
rN = unsigned_saturate(aN + bN)
256-bit version:
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
int8x16 simdpp::min ( int8x16  a,
int8x16  b 
)
inline

Computes minimum of signed 8-bit values.

r0 = min(a0, b0)
...
rN = min(aN, bN)
128-bit version:
  • In SSE2-SSSE3 this intrinsic results in at least 4 instructions.
256-bit version:
  • In SSE2-SSSE3 this intrinsic results in at least 8 instructions.
  • In SSE4.1-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
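The extra instructions on SSE2-SSSE3 are consistent with a compare-and-select sequence, since a native signed 8-bit minimum only became available with SSE4.1. The scalar sketch below models that idea for a single lane; it is an assumption about the general technique, not a statement of what libsimdpp emits, and `min_i8_select_model` is a hypothetical name.

```cpp
#include <cstdint>

// Hypothetical scalar model: minimum of one signed 8-bit lane built from
// a compare mask and bitwise select, as a SIMD emulation might do.
std::int8_t min_i8_select_model(std::int8_t a, std::int8_t b)
{
    // All-ones where a < b, all-zeros otherwise (the shape of a SIMD compare result).
    std::uint8_t mask = (a < b) ? 0xFF : 0x00;
    std::uint8_t ua = static_cast<std::uint8_t>(a);
    std::uint8_t ub = static_cast<std::uint8_t>(b);
    // Take a's bits where the mask is set and b's bits elsewhere.
    std::uint8_t r = static_cast<std::uint8_t>((ua & mask) |
                                               (ub & static_cast<std::uint8_t>(~mask)));
    return static_cast<std::int8_t>(r);
}
```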
int8x32 simdpp::min ( int8x32  a,
int8x32  b 
)
inline

Computes minimum of signed 8-bit values.

r0 = min(a0, b0)
...
rN = min(aN, bN)
128-bit version:
  • In SSE2-SSSE3 this intrinsic results in at least 4 instructions.
256-bit version:
  • In SSE2-SSSE3 this intrinsic results in at least 8 instructions.
  • In SSE4.1-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
int16x8 simdpp::mul_hi ( int16x8  a,
int16x8  b 
)
inline

Multiplies signed 16-bit values and returns the higher half of the result.

r0 = high(a0 * b0)
...
rN = high(aN * bN)
128-bit version:
  • In NEON and ALTIVEC this intrinsic results in at least 3 instructions.
256-bit version:
  • In SSE2-AVX this intrinsic results in at least 2 instructions.
  • In NEON and ALTIVEC this intrinsic results in at least 6 instructions.
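To make `high(a0 * b0)` concrete for signed lanes: the full 32-bit product is formed and its upper 16 bits are returned. The sketch below is a scalar model with a hypothetical name, `mul_hi_i16_model`; it extracts the bits through an unsigned type so the shift is well defined.

```cpp
#include <cstdint>

// Hypothetical scalar model of one lane of a signed 16-bit high multiply.
std::int16_t mul_hi_i16_model(std::int16_t a, std::int16_t b)
{
    // The exact product of two int16 values always fits in 32 bits.
    std::int32_t full = static_cast<std::int32_t>(a) * static_cast<std::int32_t>(b);
    // Take bits 16..31 of the two's complement representation.
    std::uint16_t hi = static_cast<std::uint16_t>(static_cast<std::uint32_t>(full) >> 16);
    return static_cast<std::int16_t>(hi);
}
```

For instance, 16384 * 4 = 65536 has a high half of 1, and -1 * 1 = -1 has a high half of -1 (all bits set).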
int16x16 simdpp::mul_hi ( int16x16  a,
int16x16  b 
)
inline

Multiplies signed 16-bit values and returns the higher half of the result.

r0 = high(a0 * b0)
...
rN = high(aN * bN)
128-bit version:
  • In NEON and ALTIVEC this intrinsic results in at least 3 instructions.
256-bit version:
  • In SSE2-AVX this intrinsic results in at least 2 instructions.
  • In NEON and ALTIVEC this intrinsic results in at least 6 instructions.
uint16x8 simdpp::mul_hi ( uint16x8  a,
uint16x8  b 
)
inline

Multiplies unsigned 16-bit values and returns the higher half of the result.

r0 = high(a0 * b0)
...
rN = high(aN * bN)
128-bit version:
  • In NEON and ALTIVEC this intrinsic results in at least 3 instructions.
256-bit version:
  • In SSE2-AVX this intrinsic results in at least 2 instructions.
  • In NEON and ALTIVEC this intrinsic results in at least 6 instructions.
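The unsigned variant differs from the signed one only in how the operands are interpreted before the product is formed. A scalar sketch, with the hypothetical helper `mul_hi_u16_model`:

```cpp
#include <cstdint>

// Hypothetical scalar model of one lane of an unsigned 16-bit high multiply.
std::uint16_t mul_hi_u16_model(std::uint16_t a, std::uint16_t b)
{
    // Widen first so the multiplication is carried out in 32 bits.
    std::uint32_t full = static_cast<std::uint32_t>(a) * static_cast<std::uint32_t>(b);
    return static_cast<std::uint16_t>(full >> 16);   // upper 16 bits of the product
}
```

The bit pattern 0xFFFF illustrates the difference: as unsigned, 0xFFFF * 0xFFFF = 0xFFFE0001, so the high half is 0xFFFE; read as signed, the same bits mean -1 * -1 = 1, whose high half is 0.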
uint16x16 simdpp::mul_hi ( uint16x16  a,
uint16x16  b 
)
inline

Multiplies unsigned 16-bit values and returns the higher half of the result.

r0 = high(a0 * b0)
...
rN = high(aN * bN)
128-bit version:
  • In NEON and ALTIVEC this intrinsic results in at least 3 instructions.
256-bit version:
  • In SSE2-AVX this intrinsic results in at least 2 instructions.
  • In NEON and ALTIVEC this intrinsic results in at least 6 instructions.
basic_int16x8 simdpp::mul_lo ( basic_int16x8  a,
basic_int16x8  b 
)
inline

Multiplies 16-bit values and returns the lower part of the multiplication.

r0 = low(a0 * b0)
...
rN = low(aN * bN)
256-bit version:
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
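The low half of a product has the same bit pattern whether the operands are read as signed or unsigned, which fits the sign-agnostic `basic_int16x8` argument type. A scalar sketch of `low(a0 * b0)` for one lane, using the hypothetical helper `mul_lo_16_model`:

```cpp
#include <cstdint>

// Hypothetical scalar model of one lane of a 16-bit low multiply.
std::uint16_t mul_lo_16_model(std::uint16_t a, std::uint16_t b)
{
    std::uint32_t full = static_cast<std::uint32_t>(a) * static_cast<std::uint32_t>(b);
    return static_cast<std::uint16_t>(full & 0xFFFFu);   // keep the lower 16 bits
}
```

For example, 0xFFFF * 2 yields 0xFFFE. Read as signed values this is -1 * 2 = -2, the same bits.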
basic_int16x16 simdpp::mul_lo ( basic_int16x16  a,
basic_int16x16  b 
)
inline

Multiplies 16-bit values and returns the lower part of the multiplication.

r0 = low(a0 * b0)
...
rN = low(aN * bN)
256-bit version:
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
int128 simdpp::mul_lo ( basic_int32x4  a,
basic_int32x4  b 
)
inline

Multiplies 32-bit values and returns the lower half of the result.

r0 = low(a0 * b0)
...
rN = low(aN * bN)
128-bit version:
  • In SSE2-SSSE3 this intrinsic results in at least 6 instructions.
  • In ALTIVEC this intrinsic results in at least 8 instructions.
256-bit version:
  • In SSE2-SSSE3 this intrinsic results in at least 12 instructions.
  • In SSE4.1, AVX and NEON this intrinsic results in at least 2 instructions.
  • In ALTIVEC this intrinsic results in at least 16 instructions.
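The larger SSE2-SSSE3 counts likely stem from the 32-bit low multiply only being available natively from SSE4.1 onward. One common way to emulate it is to run a widening multiply over the even lanes, repeat it for the odd lanes, and keep the low 32 bits of each product. The scalar model below sketches that idea; it is an assumed technique with hypothetical names (`mul_even_wide`, `mul_lo_32_model`), not a description of libsimdpp's actual code.

```cpp
#include <array>
#include <cstdint>

using U32x4 = std::array<std::uint32_t, 4>;
using U64x2 = std::array<std::uint64_t, 2>;

// Widening multiply of the even 32-bit lanes (0 and 2), modelling a
// 32x32->64 SIMD multiply.
U64x2 mul_even_wide(const U32x4& a, const U32x4& b)
{
    return { static_cast<std::uint64_t>(a[0]) * b[0],
             static_cast<std::uint64_t>(a[2]) * b[2] };
}

// Low 32 bits of each lane product, assembled from two widening multiplies.
U32x4 mul_lo_32_model(const U32x4& a, const U32x4& b)
{
    U64x2 even = mul_even_wide(a, b);
    // Move the odd lanes into even positions, then reuse the same multiply.
    U32x4 a_odd = { a[1], 0, a[3], 0 };
    U32x4 b_odd = { b[1], 0, b[3], 0 };
    U64x2 odd = mul_even_wide(a_odd, b_odd);
    // Truncate each 64-bit product to 32 bits and interleave the lanes back.
    return { static_cast<std::uint32_t>(even[0]), static_cast<std::uint32_t>(odd[0]),
             static_cast<std::uint32_t>(even[1]), static_cast<std::uint32_t>(odd[1]) };
}
```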
basic_int32x8 simdpp::mul_lo ( basic_int32x8  a,
basic_int32x8  b 
)
inline

Multiplies 32-bit values and returns the lower half of the result.

r0 = low(a0 * b0)
...
rN = low(aN * bN)
128-bit version:
  • In SSE2-SSSE3 this intrinsic results in at least 6 instructions.
  • In ALTIVEC this intrinsic results in at least 8 instructions.
256-bit version:
  • In SSE2-SSSE3 this intrinsic results in at least 12 instructions.
  • In SSE4.1, AVX and NEON this intrinsic results in at least 2 instructions.
  • In ALTIVEC this intrinsic results in at least 16 instructions.
int32x4 simdpp::mull_hi ( int16x8  a,
int16x8  b 
)
inline

Multiplies signed 16-bit values in the higher halves of the vectors and expands the results to 32 bits.

128-bit version:
r0 = a4 * b4
...
r3 = a7 * b7
  • In SSE2-AVX2 and ALTIVEC this intrinsic results in at least 2-3 instructions.
256-bit version:
The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
  • In SSE2-AVX and ALTIVEC this intrinsic results in at least 4-6 instructions.
  • In AVX2 this intrinsic results in at least 2-3 instructions.
  • In NEON this intrinsic results in at least 2 instructions.
    Note
    Use with mull_lo on the same arguments to save instructions.
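The 128-bit lane mapping above, together with its mull_lo counterpart, can be sketched as a scalar model. The names `widen_mul4`, `mull_lo_model` and `mull_hi_model` are hypothetical; the sketch only illustrates which input lanes feed which output lanes.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using I16x8 = std::array<std::int16_t, 8>;
using I32x4 = std::array<std::int32_t, 4>;

// Hypothetical helper: widening multiply of four lanes starting at `first`.
I32x4 widen_mul4(const I16x8& a, const I16x8& b, std::size_t first)
{
    I32x4 r{};
    for (std::size_t i = 0; i < 4; ++i) {
        // Widen before multiplying so the full 32-bit product is kept.
        r[i] = static_cast<std::int32_t>(a[first + i]) * static_cast<std::int32_t>(b[first + i]);
    }
    return r;
}

// Lanes 0..3 of the inputs: r0 = a0 * b0 ... r3 = a3 * b3.
I32x4 mull_lo_model(const I16x8& a, const I16x8& b) { return widen_mul4(a, b, 0); }

// Lanes 4..7 of the inputs: r0 = a4 * b4 ... r3 = a7 * b7.
I32x4 mull_hi_model(const I16x8& a, const I16x8& b) { return widen_mul4(a, b, 4); }
```

Calling both on the same pair of vectors covers all eight lane products, which is the pairing the note refers to.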
int32x8 simdpp::mull_hi ( int16x16  a,
int16x16  b 
)
inline

Multiplies signed 16-bit values in the higher halves of the vectors and expands the results to 32 bits.

128-bit version:
r0 = a4 * b4
...
r3 = a7 * b7
  • In SSE2-AVX2 and ALTIVEC this intrinsic results in at least 2-3 instructions.
256-bit version:
The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
  • In SSE2-AVX and ALTIVEC this intrinsic results in at least 4-6 instructions.
  • In AVX2 this intrinsic results in at least 2-3 instructions.
  • In NEON this intrinsic results in at least 2 instructions.
    Note
    Use with mull_lo on the same arguments to save instructions.
uint32x4 simdpp::mull_hi ( uint16x8  a,
uint16x8  b 
)
inline

Multiplies unsigned 16-bit values in the higher halves of the vectors and expands the results to 32 bits.

128-bit version:
r0 = a4 * b4
...
r3 = a7 * b7
  • In SSE2-AVX2 and ALTIVEC this intrinsic results in at least 2-3 instructions.
256-bit version:
The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
  • In SSE2-AVX and ALTIVEC this intrinsic results in at least 4-6 instructions.
  • In AVX2 this intrinsic results in at least 2-3 instructions.
  • In NEON this intrinsic results in at least 2 instructions.
    Note
    Use with mull_lo on the same arguments to save instructions.
uint32x8 simdpp::mull_hi ( uint16x16  a,
uint16x16  b 
)
inline

Multiplies unsigned 16-bit values in the higher halves of the vectors and expands the results to 32 bits.

128-bit version:
r0 = a4 * b4
...
r3 = a7 * b7
  • In SSE2-AVX2 and ALTIVEC this intrinsic results in at least 2-3 instructions.
256-bit version:
The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
  • In SSE2-AVX and ALTIVEC this intrinsic results in at least 4-6 instructions.
  • In AVX2 this intrinsic results in at least 2-3 instructions.
  • In NEON this intrinsic results in at least 2 instructions.
    Note
    Use with mull_lo on the same arguments to save instructions.
int64x2 simdpp::mull_hi ( int32x4  a,
int32x4  b 
)
inline

Multiplies signed 32-bit values in the higher halves of the vectors and expands the results to 64 bits.

128-bit version:
r0 = a2 * b2
r1 = a3 * b3
  • In SSE4.1-AVX2 this intrinsic results in at least 3 instructions.
  • Not implemented for SSE2-SSSE3 and ALTIVEC.
256-bit version:
The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
  • In SSE4.1-AVX this intrinsic results in at least 6 instructions.
  • In AVX2 this intrinsic results in at least 3 instructions.
  • In NEON this intrinsic results in at least 2 instructions.
  • Not implemented for SSE2-SSSE3 and ALTIVEC.
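For the 128-bit case, the mapping r0 = a2 * b2, r1 = a3 * b3 can be sketched as a scalar model. `mull_hi_i32x4_model` is a hypothetical name; the point is that each product is formed in 64 bits, so no bits are lost.

```cpp
#include <array>
#include <cstdint>

using I32x4 = std::array<std::int32_t, 4>;
using I64x2 = std::array<std::int64_t, 2>;

// Hypothetical scalar model: widening multiply of the upper two 32-bit lanes.
I64x2 mull_hi_i32x4_model(const I32x4& a, const I32x4& b)
{
    // Widen one operand first so the multiplication happens in 64 bits.
    return { static_cast<std::int64_t>(a[2]) * b[2],
             static_cast<std::int64_t>(a[3]) * b[3] };
}
```

With a2 = INT32_MIN and b2 = 2 the result lane is -4294967296, a value that does not fit in 32 bits.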
int64x4 simdpp::mull_hi ( int32x8  a,
int32x8  b 
)
inline

Multiplies signed 32-bit values in the higher halves of the vectors and expands the results to 64 bits.

128-bit version:
r0 = a2 * b2
r1 = a3 * b3
  • In SSE4.1-AVX2 this intrinsic results in at least 3 instructions.
  • Not implemented for SSE2-SSSE3 and ALTIVEC.
256-bit version:
The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
  • In SSE4.1-AVX this intrinsic results in at least 6 instructions.
  • In AVX2 this intrinsic results in at least 3 instructions.
  • In NEON this intrinsic results in at least 2 instructions.
  • Not implemented for SSE2-SSSE3 and ALTIVEC.
uint64x2 simdpp::mull_hi ( uint32x4  a,
uint32x4  b 
)
inline

Multiplies unsigned 32-bit values in the higher halves of the vectors and expands the results to 64 bits.

128-bit version:
r0 = a2 * b2
r1 = a3 * b3
  • In SSE2-AVX this intrinsic results in at least 3 instructions.
  • Not vectorized in ALTIVEC.
256-bit version:
The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
  • In SSE2-AVX this intrinsic results in at least 6 instructions.
  • In AVX2 this intrinsic results in at least 3 instructions.
  • In NEON this intrinsic results in at least 2 instructions.
  • Not vectorized in ALTIVEC.
uint64x4 simdpp::mull_hi ( uint32x8  a,
uint32x8  b 
)
inline

Multiplies unsigned 32-bit values in the higher halves of the vectors and expands the results to 64 bits.

128-bit version:
r0 = a2 * b2
r1 = a3 * b3
  • In SSE2-AVX this intrinsic results in at least 3 instructions.
  • Not vectorized in ALTIVEC.
256-bit version:
The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
  • In SSE2-AVX this intrinsic results in at least 6 instructions.
  • In AVX2 this intrinsic results in at least 3 instructions.
  • In NEON this intrinsic results in at least 2 instructions.
  • Not vectorized in ALTIVEC.
int32x4 simdpp::mull_lo ( int16x8  a,
int16x8  b 
)
inline

Multiplies signed 16-bit values in the lower halves of the vectors and expands the results to 32 bits.

128-bit version:
r0 = a0 * b0
...
r3 = a3 * b3
  • In SSE2-AVX and ALTIVEC this intrinsic results in at least 2-3 instructions.
256-bit version:

The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.

  • In SSE2-AVX and ALTIVEC this intrinsic results in at least 4-6 instructions.
  • In AVX2 and NEON this intrinsic results in at least 2-3 instructions.
    Note
    Use with mull_hi on the same arguments to save instructions.
int32x8 simdpp::mull_lo ( int16x16  a,
int16x16  b 
)
inline

Multiplies signed 16-bit values in the lower halves of the vectors and expands the results to 32 bits.

128-bit version:
r0 = a0 * b0
...
r3 = a3 * b3
  • In SSE2-AVX and ALTIVEC this intrinsic results in at least 2-3 instructions.
256-bit version:

The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.

  • In SSE2-AVX and ALTIVEC this intrinsic results in at least 4-6 instructions.
  • In AVX2 and NEON this intrinsic results in at least 2-3 instructions.
    Note
    Use with mull_hi on the same arguments to save instructions.
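For the 256-bit version, the per-half rule can be read as follows: each 128-bit half of the inputs contributes its own lanes 0..3, so the second half of the result comes from input lanes 8..11 rather than 4..7. The scalar model below sketches this reading; `mull_lo_i16x16_model` is a hypothetical name and the lane numbering is an interpretation of the description above.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using I16x16 = std::array<std::int16_t, 16>;
using I32x8  = std::array<std::int32_t, 8>;

// Hypothetical scalar model of the 256-bit mull_lo: the 128-bit rule is
// applied to each 128-bit half (8 lanes of 16 bits) independently.
I32x8 mull_lo_i16x16_model(const I16x16& a, const I16x16& b)
{
    I32x8 r{};
    for (std::size_t half = 0; half < 2; ++half) {
        for (std::size_t i = 0; i < 4; ++i) {
            std::size_t src = half * 8 + i;   // lanes 0..3 of this 128-bit half
            r[half * 4 + i] = static_cast<std::int32_t>(a[src]) *
                              static_cast<std::int32_t>(b[src]);
        }
    }
    return r;
}
```

With a[i] = i and b[i] = 2, the model returns {0, 2, 4, 6, 16, 18, 20, 22}.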
uint32x4 simdpp::mull_lo ( uint16x8  a,
uint16x8  b 
)
inline

Multiplies unsigned 16-bit values in the lower halves of the vectors and expands the results to 32 bits.

128-bit version:
r0 = a0 * b0
...
r3 = a3 * b3
  • In SSE2-AVX2 and ALTIVEC this intrinsic results in at least 2-3 instructions.
256-bit version:
The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
  • In SSE2-AVX and ALTIVEC this intrinsic results in at least 4-6 instructions.
  • In AVX2 this intrinsic results in at least 2-3 instructions.
  • In NEON this intrinsic results in at least 2 instructions.
    Note
    Use with mull_hi on the same arguments to save instructions.
uint32x8 simdpp::mull_lo ( uint16x16  a,
uint16x16  b 
)
inline

Multiplies unsigned 16-bit values in the lower halves of the vectors and expands the results to 32 bits.

128-bit version:
r0 = a0 * b0
...
r3 = a3 * b3
  • In SSE2-AVX2 and ALTIVEC this intrinsic results in at least 2-3 instructions.
256-bit version:
The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
  • In SSE2-AVX and ALTIVEC this intrinsic results in at least 4-6 instructions.
  • In AVX2 this intrinsic results in at least 2-3 instructions.
  • In NEON this intrinsic results in at least 2 instructions.
    Note
    Use with mull_hi on the same arguments to save instructions.
int64x2 simdpp::mull_lo ( int32x4  a,
int32x4  b 
)
inline

Multiplies signed 32-bit values in the lower halves of the vectors and expands the results to 64 bits.

128-bit version:
r0 = a0 * b0
r1 = a1 * b1
  • In SSE4.1-AVX this intrinsic results in at least 3 instructions.
  • Not implemented for SSE2-SSSE3 and ALTIVEC.
256-bit version:
The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
  • In SSE4.1-AVX this intrinsic results in at least 6 instructions.
  • In AVX2 this intrinsic results in at least 3 instructions.
  • In NEON this intrinsic results in at least 2 instructions.
  • Not implemented for SSE2-SSSE3 and ALTIVEC.
int64x4 simdpp::mull_lo ( int32x8  a,
int32x8  b 
)
inline

Multiplies signed 32-bit values in the lower halves of the vectors and expands the results to 64 bits.

128-bit version:
r0 = a0 * b0
r1 = a1 * b1
  • In SSE4.1-AVX this intrinsic results in at least 3 instructions.
  • Not implemented for SSE2-SSSE3 and ALTIVEC.
256-bit version:
The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
  • In SSE4.1-AVX this intrinsic results in at least 6 instructions.
  • In AVX2 this intrinsic results in at least 3 instructions.
  • In NEON this intrinsic results in at least 2 instructions.
  • Not implemented for SSE2-SSSE3 and ALTIVEC.
uint64x2 simdpp::mull_lo ( uint32x4  a,
uint32x4  b 
)
inline

Multiplies unsigned 32-bit values in the lower halves of the vectors and expands the results to 64 bits.

128-bit version:
r0 = a0 * b0
r1 = a1 * b1
  • In SSE2-AVX this intrinsic results in at least 3 instructions.
  • Not implemented for ALTIVEC.
256-bit version:
The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
  • In SSE2-AVX this intrinsic results in at least 6 instructions.
  • In AVX2 this intrinsic results in at least 3 instructions.
  • In NEON this intrinsic results in at least 2 instructions.
  • Not implemented for ALTIVEC.
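A scalar sketch of the unsigned 128-bit mapping r0 = a0 * b0, r1 = a1 * b1 follows. `mull_lo_u32x4_model` is a hypothetical helper; it shows that the full 64-bit product is kept for each of the two lower lanes.

```cpp
#include <array>
#include <cstdint>

using U32x4 = std::array<std::uint32_t, 4>;
using U64x2 = std::array<std::uint64_t, 2>;

// Hypothetical scalar model: widening multiply of the lower two 32-bit lanes.
U64x2 mull_lo_u32x4_model(const U32x4& a, const U32x4& b)
{
    // Widen one operand first so the multiplication happens in 64 bits.
    return { static_cast<std::uint64_t>(a[0]) * b[0],
             static_cast<std::uint64_t>(a[1]) * b[1] };
}
```

The largest case, 0xFFFFFFFF * 0xFFFFFFFF, produces 0xFFFFFFFE00000001, which still fits in the 64-bit result lane.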
uint64x4 simdpp::mull_lo ( uint32x8  a,
uint32x8  b 
)
inline

Multiplies unsigned 32-bit values in the lower halves of the vectors and expands the results to 64 bits.

128-bit version:
r0 = a0 * b0
r1 = a1 * b1
  • In SSE2-AVX this intrinsic results in at least 3 instructions.
  • Not implemented for ALTIVEC.
256-bit version:
The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.
  • In SSE2-AVX this intrinsic results in at least 6 instructions.
  • In AVX2 this intrinsic results in at least 3 instructions.
  • In NEON this intrinsic results in at least 2 instructions.
  • Not implemented for ALTIVEC.
int8x16 simdpp::neg ( int8x16  a)
inline

Negates signed 8-bit values.

r0 = -a0
...
rN = -aN
256-bit version:
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
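As a scalar sketch of r0 = -a0 for one 8-bit lane: negation is modelled here as a wrapping subtraction from zero. That wrapping behaviour is an assumption (the formula above does not say what happens for the most negative value), and `neg_i8_model` is a hypothetical helper.

```cpp
#include <cstdint>

// Hypothetical scalar model of one lane of an 8-bit negate, assuming
// two's complement wraparound (0 - a modulo 2^8).
std::int8_t neg_i8_model(std::int8_t a)
{
    std::uint8_t u = static_cast<std::uint8_t>(a);
    // Subtract in unsigned arithmetic, then keep the low 8 bits.
    return static_cast<std::int8_t>(static_cast<std::uint8_t>(0u - u));
}
```

Under this assumption, negating -128 gives -128 again, because +128 is not representable in 8 signed bits.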
int8x32 simdpp::neg ( int8x32  a)
inline

Negates signed 8-bit values.

r0 = -a0
...
rN = -aN
256-bit version:
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
int16x8 simdpp::neg ( int16x8  a)
inline

Negates signed 16-bit values.

r0 = -a0
...
rN = -aN
256-bit version:
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
int16x16 simdpp::neg ( int16x16  a)
inline

Negates signed 16-bit values.

r0 = -a0
...
rN = -aN
256-bit version:
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
int32x4 simdpp::neg ( int32x4  a)
inline

Negates signed 32-bit values.

r0 = -a0
...
rN = -aN
256-bit version:
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
int32x8 simdpp::neg ( int32x8  a)
inline

Negates signed 32-bit values.

r0 = -a0
...
rN = -aN
256-bit version:
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
int64x2 simdpp::neg ( int64x2  a)
inline

Negates signed 64-bit values.

r0 = -a0
...
rN = -aN
128-bit version:
  • In ALTIVEC this intrinsic results in at least 4-5 instructions.
256-bit version:
  • In SSE2-AVX and NEON this intrinsic results in at least 2 instructions.
  • In ALTIVEC this intrinsic results in at least 8-9 instructions.
int64x4 simdpp::neg ( int64x4  a)
inline

Negates signed 64-bit values.

r0 = -a0
...
rN = -aN
128-bit version:
  • In ALTIVEC this intrinsic results in at least 4-5 instructions.
256-bit version:
  • In SSE2-AVX and NEON this intrinsic results in at least 2 instructions.
  • In ALTIVEC this intrinsic results in at least 8-9 instructions.
basic_int8x16 simdpp::sub ( basic_int8x16  a,
basic_int8x16  b 
)
inline

Subtracts 8-bit integer values.

r0 = a0 - b0
...
rN = aN - bN
256-bit version:
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
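The subtraction formula can be sketched for a single lane as below, assuming the result wraps modulo 2^8 when it falls outside the lane's range (the formula does not state this explicitly). `sub_8_model` is a hypothetical helper.

```cpp
#include <cstdint>

// Hypothetical scalar model of one lane of a non-saturating 8-bit subtract,
// assuming wraparound modulo 2^8.
std::uint8_t sub_8_model(std::uint8_t a, std::uint8_t b)
{
    // Operands promote to int; the cast keeps only the low 8 bits.
    return static_cast<std::uint8_t>(a - b);
}
```

For example, 5 - 10 yields the bit pattern 251, which is -5 when the lane is read as signed.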
basic_int8x32 simdpp::sub ( basic_int8x32  a,
basic_int8x32  b 
)
inline

Subtracts 8-bit integer values.

r0 = a0 - b0
...
rN = aN - bN
256-bit version:
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int16x8 simdpp::sub ( basic_int16x8  a,
basic_int16x8  b 
)
inline

Subtracts 16-bit integer values.

r0 = a0 - b0
...
rN = aN - bN
256-bit version:
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int16x16 simdpp::sub ( basic_int16x16  a,
basic_int16x16  b 
)
inline

Subtracts 16-bit integer values.

r0 = a0 - b0
...
rN = aN - bN
256-bit version:
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int32x4 simdpp::sub ( basic_int32x4  a,
basic_int32x4  b 
)
inline

Subtracts 32-bit integer values.

r0 = a0 - b0
...
rN = aN - bN
256-bit version:
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int32x8 simdpp::sub ( basic_int32x8  a,
basic_int32x8  b 
)
inline

Subtracts 32-bit integer values.

r0 = a0 - b0
...
rN = aN - bN
256-bit version:
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
basic_int64x2 simdpp::sub ( basic_int64x2  a,
basic_int64x2  b 
)
inline

Subtracts 64-bit integer values.

r0 = a0 - b0
...
rN = aN - bN
128-bit version:
  • In ALTIVEC this intrinsic results in at least 5-6 instructions.
256-bit version:
  • In SSE2-AVX and NEON this intrinsic results in at least 2 instructions.
  • In ALTIVEC this intrinsic results in at least 10-11 instructions.
basic_int64x4 simdpp::sub ( basic_int64x4  a,
basic_int64x4  b 
)
inline

Subtracts 64-bit integer values.

r0 = a0 - b0
...
rN = aN - bN
128-bit version:
  • In ALTIVEC this intrinsic results in at least 5-6 instructions.
256-bit version:
  • In SSE2-AVX and NEON this intrinsic results in at least 2 instructions.
  • In ALTIVEC this intrinsic results in at least 10-11 instructions.
int8x16 simdpp::subs ( int8x16  a,
int8x16  b 
)
inline

Subtracts and saturates signed 8-bit integer values.

r0 = saturated(a0 - b0)
...
rN = saturated(aN - bN)
256-bit version:
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
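For signed 8-bit lanes, `saturated(a0 - b0)` can be sketched as the exact difference clamped to [-128, 127]. `subs_i8_model` is a hypothetical scalar helper used only for illustration.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical scalar model of one lane of a signed saturating 8-bit subtract.
std::int8_t subs_i8_model(std::int8_t a, std::int8_t b)
{
    int wide = static_cast<int>(a) - static_cast<int>(b);   // exact difference
    wide = std::max(-128, std::min(127, wide));              // clamp to the int8 range
    return static_cast<std::int8_t>(wide);
}
```

For example, -100 - 100 saturates to -128 and 100 - (-100) saturates to 127.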
int8x32 simdpp::subs ( int8x32  a,
int8x32  b 
)
inline

Subtracts and saturates signed 8-bit integer values.

r0 = saturated(a0 - b0)
...
rN = saturated(aN - bN)
256-bit version:
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
int16x8 simdpp::subs ( int16x8  a,
int16x8  b 
)
inline

Subtracts and saturates signed 16-bit integer values.

r0 = saturated(a0 - b0)
...
rN = saturated(aN - bN)
256-bit version:
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
int16x16 simdpp::subs ( int16x16  a,
int16x16  b 
)
inline

Subtracts and saturates signed 16-bit integer values.

r0 = saturated(a0 - b0)
...
rN = saturated(aN - bN)
256-bit version:
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
uint8x16 simdpp::subs ( uint8x16  a,
uint8x16  b 
)
inline

Subtracts and saturates unsigned 8-bit integer values.

r0 = saturated(a0 - b0)
...
rN = saturated(aN - bN)
256-bit version:
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
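For unsigned lanes, saturation means the difference is clamped at zero instead of wrapping. The scalar sketch below models one lane and also shows a well-known use of the operation: OR-ing the two saturated differences gives the absolute difference. Both `subs_u8_model` and `abs_diff_u8_model` are hypothetical helpers, not libsimdpp functions.

```cpp
#include <cstdint>

// Hypothetical scalar model of one lane of an unsigned saturating 8-bit subtract.
std::uint8_t subs_u8_model(std::uint8_t a, std::uint8_t b)
{
    return static_cast<std::uint8_t>(a > b ? a - b : 0);   // clamp at zero
}

// Absolute difference: at most one of the two saturated differences is
// non-zero, so a bitwise OR selects it.
std::uint8_t abs_diff_u8_model(std::uint8_t a, std::uint8_t b)
{
    return static_cast<std::uint8_t>(subs_u8_model(a, b) | subs_u8_model(b, a));
}
```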
uint8x32 simdpp::subs ( uint8x32  a,
uint8x32  b 
)
inline

Subtracts and saturates unsigned 8-bit integer values.

r0 = saturated(a0 - b0)
...
rN = saturated(aN - bN)
256-bit version:
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
uint16x8 simdpp::subs ( uint16x8  a,
uint16x8  b 
)
inline

Subtracts and saturates unsigned 16-bit integer values.

r0 = saturated(a0 - b0)
...
rN = saturated(aN - bN)
256-bit version:
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
uint16x16 simdpp::subs ( uint16x16  a,
uint16x16  b 
)
inline

Subtracts and saturates unsigned 16-bit integer values.

r0 = saturated(a0 - b0)
...
rN = saturated(aN - bN)
256-bit version:
  • In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.