libsimdpp  0.9.3
Operations: floating point maths

Functions

float32x4 simdpp::abs (float32x4 a)
 Computes absolute value of floating point values. More...
 
float32x8 simdpp::abs (float32x8 a)
 
mask_float32x4 simdpp::isnan (float32x4 a)
 Checks whether elements in a are IEEE754 NaN. More...
 
mask_float32x8 simdpp::isnan (float32x8 a)
 Checks whether elements in a are IEEE754 NaN. More...
 
float64x2 simdpp::abs (float64x2 a)
 Computes absolute value of floating point values. More...
 
float64x4 simdpp::abs (float64x4 a)
 Computes absolute value of floating point values. More...
 
float32x4 simdpp::sign (float32x4 a)
 Extracts sign bits from the values in float32x4 vector. More...
 
float32x8 simdpp::sign (float32x8 a)
 Extracts sign bits from the values in float32x8 vector. More...
 
float64x2 simdpp::sign (float64x2 a)
 Extracts sign bits from the values in float64x2 vector. More...
 
float64x4 simdpp::sign (float64x4 a)
 Extracts sign bits from the values in float64x4 vector. More...
 
float32x4 simdpp::add (float32x4 a, float32x4 b)
 Adds the values of two vectors. More...
 
float32x8 simdpp::add (float32x8 a, float32x8 b)
 Adds the values of two vectors. More...
 
float64x2 simdpp::add (float64x2 a, float64x2 b)
 Adds the values of two vectors. More...
 
float64x4 simdpp::add (float64x4 a, float64x4 b)
 Adds the values of two vectors. More...
 
float32x4 simdpp::sub (float32x4 a, float32x4 b)
 Subtracts the values of two vectors. More...
 
float32x8 simdpp::sub (float32x8 a, float32x8 b)
 Subtracts the values of two vectors. More...
 
float64x2 simdpp::sub (float64x2 a, float64x2 b)
 Subtracts the values of two vectors. More...
 
float64x4 simdpp::sub (float64x4 a, float64x4 b)
 Subtracts the values of two vectors. More...
 
float32x4 simdpp::neg (float32x4 a)
 Negates the values of a float32x4 vector. More...
 
float32x8 simdpp::neg (float32x8 a)
 Negates the values of a float32x8 vector. More...
 
float64x2 simdpp::neg (float64x2 a)
 Negates the values of a vector. More...
 
float64x4 simdpp::neg (float64x4 a)
 Negates the values of a vector. More...
 
float32x4 simdpp::mul (float32x4 a, float32x4 b)
 Multiplies the values of two vectors. More...
 
float32x8 simdpp::mul (float32x8 a, float32x8 b)
 Multiplies the values of two vectors. More...
 
float64x2 simdpp::mul (float64x2 a, float64x2 b)
 Multiplies the values of two vectors. More...
 
float64x4 simdpp::mul (float64x4 a, float64x4 b)
 Multiplies the values of two vectors. More...
 
float32x4 simdpp::fmadd (float32x4 a, float32x4 b, float32x4 c)
 Performs a fused multiply-add operation. More...
 
float32x8 simdpp::fmadd (float32x8 a, float32x8 b, float32x8 c)
 Performs a fused multiply-add operation. More...
 
float64x2 simdpp::fmadd (float64x2 a, float64x2 b, float64x2 c)
 Performs a fused multiply-add operation. More...
 
float64x4 simdpp::fmadd (float64x4 a, float64x4 b, float64x4 c)
 Performs a fused multiply-add operation. More...
 
float32x4 simdpp::fmsub (float32x4 a, float32x4 b, float32x4 c)
 Performs a fused multiply-subtract operation. More...
 
float32x8 simdpp::fmsub (float32x8 a, float32x8 b, float32x8 c)
 Performs a fused multiply-subtract operation. More...
 
float64x2 simdpp::fmsub (float64x2 a, float64x2 b, float64x2 c)
 Performs a fused multiply-subtract operation. More...
 
float64x4 simdpp::fmsub (float64x4 a, float64x4 b, float64x4 c)
 Performs a fused multiply-subtract operation. More...
 

Detailed Description

Function Documentation

float32x4 simdpp::abs ( float32x4  a)
inline

Computes absolute value of floating point values.

r0 = abs(a0)
...
rN = abs(aN)
128-bit version:
  • In SSE2-AVX2 this intrinsic results in at least 1-2 instructions.
  • In ALTIVEC this intrinsic results in at least 1-2 instructions.
256-bit version:
  • In SSE2-SSE4.1 this intrinsic results in at least 2-3 instructions.
  • In NEON this intrinsic results in at least 2 instructions.
  • In AVX-AVX2 this intrinsic results in at least 1-2 instructions.
  • In ALTIVEC this intrinsic results in at least 2-3 instructions.
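The per-element rule r_i = abs(a_i) can be modelled in scalar C++ as below. This is only a sketch: `Lanes4`, `abs_reference` and `abs_reference_demo` are hypothetical names, not libsimdpp types or functions, and clearing the sign bit is one common way to compute a floating point absolute value, not necessarily what the library emits on each architecture.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a 4-lane float vector; this is not a libsimdpp type.
using Lanes4 = std::array<float, 4>;

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes 32-bit float");

// Scalar model of r_i = abs(a_i): clear the sign bit (bit 31) of every lane.
Lanes4 abs_reference(const Lanes4& a) {
    Lanes4 r{};
    for (std::size_t i = 0; i < a.size(); ++i) {
        std::uint32_t bits;
        std::memcpy(&bits, &a[i], sizeof bits);  // read the lane's bit pattern
        bits &= 0x7fffffffu;                     // drop the sign, keep exponent and mantissa
        std::memcpy(&r[i], &bits, sizeof bits);  // write the result lane
    }
    return r;
}

// Small self-check of the model.
void abs_reference_demo() {
    const Lanes4 a = {-1.5f, 2.0f, -0.0f, 0.0f};
    const Lanes4 r = abs_reference(a);
    assert(r[0] == 1.5f);
    assert(r[1] == 2.0f);
    assert(r[2] == 0.0f);
    assert(r[3] == 0.0f);
}
```
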
float32x8 simdpp::abs ( float32x8  a)
inline
float64x2 simdpp::abs ( float64x2  a)
inline

Computes absolute value of floating point values.

r0 = abs(a0)
...
rN = abs(aN)
128-bit version:
  • Not vectorized in NEON.
  • In SSE2-AVX2 this intrinsic results in at least 1-2 instructions.
256-bit version:
  • Not vectorized in NEON.
  • In SSE2-SSE4.1 this intrinsic results in at least 2-3 instructions.
  • In AVX-AVX2 this intrinsic results in at least 1-2 instructions.
float64x4 simdpp::abs ( float64x4  a)
inline

Computes absolute value of floating point values.

r0 = abs(a0)
...
rN = abs(aN)
128-bit version:
  • Not vectorized in NEON.
  • In SSE2-AVX2 this intrinsic results in at least 1-2 instructions.
256-bit version:
  • Not vectorized in NEON.
  • In SSE2-SSE4.1 this intrinsic results in at least 2-3 instructions.
  • In AVX-AVX2 this intrinsic results in at least 1-2 instructions.
float32x4 simdpp::add ( float32x4  a,
float32x4  b 
)
inline

Adds the values of two vectors.

r0 = a0 + b0
...
rN = aN + bN
256-bit version:
  • In SSE2-SSE4.1, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
float32x8 simdpp::add ( float32x8  a,
float32x8  b 
)
inline

Adds the values of two vectors.

r0 = a0 + b0
...
rN = aN + bN
256-bit version:
  • In SSE2-SSE4.1, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
float64x2 simdpp::add ( float64x2  a,
float64x2  b 
)
inline

Adds the values of two vectors.

r0 = a0 + b0
...
rN = aN + bN
128-bit version:
  • Not vectorized in NEON.
256-bit version:
  • Not vectorized in NEON.
  • In SSE2-SSE4.1 this intrinsic results in at least 2 instructions.
float64x4 simdpp::add ( float64x4  a,
float64x4  b 
)
inline

Adds the values of two vectors.

r0 = a0 + b0
...
rN = aN + bN
128-bit version:
  • Not vectorized in NEON.
256-bit version:
  • Not vectorized in NEON.
  • In SSE2-SSE4.1 this intrinsic results in at least 2 instructions.
float32x4 simdpp::fmadd ( float32x4  a,
float32x4  b,
float32x4  c 
)
inline

Performs a fused multiply-add operation.

r0 = a0 * b0 + c0
...
rN = aN * bN + cN

Implemented only on architectures with either X86_FMA3 or X86_FMA4 support.
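A scalar model of the rule r_i = a_i * b_i + c_i is sketched below using `std::fma` from the C++ standard library, which rounds once after the exact product and sum. `Lanes4`, `fmadd_reference` and `fmadd_reference_demo` are hypothetical names for illustration and are not part of libsimdpp; the sketch assumes that "fused" here carries its usual meaning of a single final rounding.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for a 4-lane float vector; this is not a libsimdpp type.
using Lanes4 = std::array<float, 4>;

// Scalar model of r_i = a_i * b_i + c_i with a single rounding at the end.
Lanes4 fmadd_reference(const Lanes4& a, const Lanes4& b, const Lanes4& c) {
    Lanes4 r{};
    for (std::size_t i = 0; i < r.size(); ++i) {
        r[i] = std::fma(a[i], b[i], c[i]);
    }
    return r;
}

// Lane 0 shows the effect of fusing: x * x - y is exactly 2^-24, a value
// that is lost if the product is rounded to float before the addition.
void fmadd_reference_demo() {
    const float x = 1.0f + std::ldexp(1.0f, -12);  // 1 + 2^-12
    const float y = 1.0f + std::ldexp(1.0f, -11);  // 1 + 2^-11
    const Lanes4 a = {x, 1.0f, 2.0f, 0.0f};
    const Lanes4 b = {x, 1.0f, 3.0f, 5.0f};
    const Lanes4 c = {-y, 1.0f, 4.0f, 7.0f};
    const Lanes4 r = fmadd_reference(a, b, c);
    assert(r[0] == std::ldexp(1.0f, -24));
    assert(r[1] == 2.0f);
    assert(r[2] == 10.0f);
    assert(r[3] == 7.0f);
}
```
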

float32x8 simdpp::fmadd ( float32x8  a,
float32x8  b,
float32x8  c 
)
inline

Performs a fused multiply-add operation.

r0 = a0 * b0 + c0
...
rN = aN * bN + cN

Implemented only on architectures with either X86_FMA3 or X86_FMA4 support.

float64x2 simdpp::fmadd ( float64x2  a,
float64x2  b,
float64x2  c 
)
inline

Performs a fused multiply-add operation.

r0 = a0 * b0 + c0
...
rN = aN * bN + cN

Implemented only on architectures with either X86_FMA3 or X86_FMA4 support.

float64x4 simdpp::fmadd ( float64x4  a,
float64x4  b,
float64x4  c 
)
inline

Performs a fused multiply-add operation.

r0 = a0 * b0 + c0
...
rN = aN * bN + cN

Implemented only on architectures with either X86_FMA3 or X86_FMA4 support.

float32x4 simdpp::fmsub ( float32x4  a,
float32x4  b,
float32x4  c 
)
inline

Performs a fused multiply-subtract operation.

r0 = a0 * b0 - c0
...
rN = aN * bN - cN

Implemented only on architectures with either X86_FMA3 or X86_FMA4 support.

float32x8 simdpp::fmsub ( float32x8  a,
float32x8  b,
float32x8  c 
)
inline

Performs a fused multiply-subtract operation.

r0 = a0 * b0 - c0
...
rN = aN * bN - cN

Implemented only on architectures with either X86_FMA3 or X86_FMA4 support.

float64x2 simdpp::fmsub ( float64x2  a,
float64x2  b,
float64x2  c 
)
inline

Performs a fused multiply-subtract operation.

r0 = a0 * b0 - c0
...
rN = aN * bN - cN

Implemented only on architectures with either X86_FMA3 or X86_FMA4 support.

float64x4 simdpp::fmsub ( float64x4  a,
float64x4  b,
float64x4  c 
)
inline

Performs a fused multiply-subtract operation.

r0 = a0 * b0 - c0
...
rN = aN * bN - cN

Implemented only on architectures with either X86_FMA3 or X86_FMA4 support.

mask_float32x4 simdpp::isnan ( float32x4  a)
inline

Checks whether elements in a are IEEE754 NaN.

r0 = isnan(a0) ? 0xffffffff : 0
...
rN = isnan(aN) ? 0xffffffff : 0
256-bit version:
  • In SSE2-SSE4.1, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
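The mask rule r_i = isnan(a_i) ? 0xffffffff : 0 can be modelled in scalar C++ as follows. `Lanes4`, `Mask4`, `isnan_reference` and `isnan_reference_demo` are hypothetical names used only for this sketch; representing the mask as 32-bit unsigned integers is an assumption that mirrors the formula above, not a description of the `mask_float32x4` type.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for a 4-lane float vector and its mask;
// these are not libsimdpp types.
using Lanes4 = std::array<float, 4>;
using Mask4 = std::array<std::uint32_t, 4>;

// Scalar model of r_i = isnan(a_i) ? 0xffffffff : 0.
Mask4 isnan_reference(const Lanes4& a) {
    Mask4 r{};
    for (std::size_t i = 0; i < a.size(); ++i) {
        r[i] = std::isnan(a[i]) ? 0xffffffffu : 0u;
    }
    return r;
}

// Only the NaN lane produces an all-ones mask; infinity does not.
void isnan_reference_demo() {
    const Lanes4 a = {1.0f, std::nanf(""), 0.0f, INFINITY};
    const Mask4 m = isnan_reference(a);
    assert(m[0] == 0u);
    assert(m[1] == 0xffffffffu);
    assert(m[2] == 0u);
    assert(m[3] == 0u);
}
```
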
mask_float32x8 simdpp::isnan ( float32x8  a)
inline

Checks whether elements in a are IEEE754 NaN.

r0 = isnan(a0) ? 0xffffffff : 0
...
rN = isnan(aN) ? 0xffffffff : 0
256-bit version:
  • In SSE2-SSE4.1, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
float32x4 simdpp::mul ( float32x4  a,
float32x4  b 
)
inline

Multiplies the values of two vectors.

r0 = a0 * b0
...
rN = aN * bN
256-bit version:
  • In SSE2-SSE4.1, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
float32x8 simdpp::mul ( float32x8  a,
float32x8  b 
)
inline

Multiplies the values of two vectors.

r0 = a0 * b0
...
rN = aN * bN
256-bit version:
  • In SSE2-SSE4.1, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
float64x2 simdpp::mul ( float64x2  a,
float64x2  b 
)
inline

Multiplies the values of two vectors.

r0 = a0 * b0
...
rN = aN * bN
128-bit version:
  • Not vectorized in NEON.
256-bit version:
  • Not vectorized in NEON.
  • In SSE2-SSE4.1 this intrinsic results in at least 2 instructions.
float64x4 simdpp::mul ( float64x4  a,
float64x4  b 
)
inline

Multiplies the values of two vectors.

r0 = a0 * b0
...
rN = aN * bN
128-bit version:
  • Not vectorized in NEON.
256-bit version:
  • Not vectorized in NEON.
  • In SSE2-SSE4.1 this intrinsic results in at least 2 instructions.
float32x4 simdpp::neg ( float32x4  a)
inline

Negates the values of a float32x4 vector.

r0 = -a0
...
rN = -aN
128-bit version:
  • In SSE2-AVX2 and ALTIVEC this intrinsic results in at least 1-2 instructions.
256-bit version:
  • In SSE2-SSE4.1 and ALTIVEC this intrinsic results in at least 2-3 instructions.
  • In AVX-AVX2 and NEON this intrinsic results in at least 2 instructions.
float32x8 simdpp::neg ( float32x8  a)
inline

Negates the values of a float32x8 vector.

r0 = -a0
...
rN = -aN
128-bit version:
  • In SSE2-AVX2 and ALTIVEC this intrinsic results in at least 1-2 instructions.
256-bit version:
  • In SSE2-SSE4.1 and ALTIVEC this intrinsic results in at least 2-3 instructions.
  • In AVX-AVX2 and NEON this intrinsic results in at least 2 instructions.
float64x2 simdpp::neg ( float64x2  a)
inline

Negates the values of a vector.

r0 = -a0
...
rN = -aN
128-bit version:
  • In SSE2-AVX2 this intrinsic results in at least 1-2 instructions.
  • Not vectorized in NEON.
256-bit version:
  • In SSE2-SSE4.1 this intrinsic results in at least 2-3 instructions.
  • In AVX-AVX2 this intrinsic results in at least 1-2 instructions.
  • Not vectorized in NEON.
float64x4 simdpp::neg ( float64x4  a)
inline

Negates the values of a vector.

r0 = -a0
...
rN = -aN
128-bit version:
  • In SSE2-AVX2 this intrinsic results in at least 1-2 instructions.
  • Not vectorized in NEON.
256-bit version:
  • In SSE2-SSE4.1 this intrinsic results in at least 2-3 instructions.
  • In AVX-AVX2 this intrinsic results in at least 1-2 instructions.
  • Not vectorized in NEON.
float32x4 simdpp::sign ( float32x4  a)
inline

Extracts sign bits from the values in float32x4 vector.

r0 = a0 & 0x80000000
...
rN = aN & 0x80000000
128-bit version:
  • In SSE2-SSE4.1, ALTIVEC and NEON this intrinsic results in at least 1-2 instructions.
256-bit version:
  • In SSE2-SSE4.1, ALTIVEC and NEON this intrinsic results in at least 2-3 instructions.
  • In AVX-AVX2 this intrinsic results in at least 1-2 instructions.
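The formula r_i = a_i & 0x80000000 operates on the bit pattern of each lane, so the result is either +0.0 or -0.0 when read back as a float. The scalar sketch below illustrates this; `Lanes4`, `float_bits`, `sign_reference` and `sign_reference_demo` are hypothetical names, not libsimdpp functions, and the sketch assumes a 32-bit IEEE 754 `float`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a 4-lane float vector; this is not a libsimdpp type.
using Lanes4 = std::array<float, 4>;

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes 32-bit float");

// Returns the raw bit pattern of a float.
std::uint32_t float_bits(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return bits;
}

// Scalar model of r_i = a_i & 0x80000000: keep only the sign bit of each lane.
Lanes4 sign_reference(const Lanes4& a) {
    Lanes4 r{};
    for (std::size_t i = 0; i < a.size(); ++i) {
        const std::uint32_t bits = float_bits(a[i]) & 0x80000000u;
        std::memcpy(&r[i], &bits, sizeof bits);
    }
    return r;
}

// Negative inputs (including -0.0) yield -0.0; all others yield +0.0.
void sign_reference_demo() {
    const Lanes4 a = {-3.0f, 3.0f, -0.0f, 0.0f};
    const Lanes4 r = sign_reference(a);
    assert(float_bits(r[0]) == 0x80000000u);
    assert(float_bits(r[1]) == 0u);
    assert(float_bits(r[2]) == 0x80000000u);
    assert(float_bits(r[3]) == 0u);
}
```
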
float32x8 simdpp::sign ( float32x8  a)
inline

Extracts sign bits from the values in float32x8 vector.

r0 = a0 & 0x80000000
...
rN = aN & 0x80000000
128-bit version:
  • In SSE2-SSE4.1, ALTIVEC and NEON this intrinsic results in at least 1-2 instructions.
256-bit version:
  • In SSE2-SSE4.1, ALTIVEC and NEON this intrinsic results in at least 2-3 instructions.
  • In AVX-AVX2 this intrinsic results in at least 1-2 instructions.
float64x2 simdpp::sign ( float64x2  a)
inline

Extracts sign bits from the values in float64x2 vector.

r0 = a0 & 0x8000000000000000
...
rN = aN & 0x8000000000000000
128-bit version:
  • In SSE2-AVX2 this intrinsic results in at least 1-2 instructions.
  • Not vectorized in NEON.
256-bit version:
  • In SSE2-SSE4.1 this intrinsic results in at least 2-3 instructions.
  • In AVX-AVX2 this intrinsic results in at least 1-2 instructions.
  • Not vectorized in NEON.
float64x4 simdpp::sign ( float64x4  a)
inline

Extracts sign bits from the values in float64x4 vector.

r0 = a0 & 0x8000000000000000
...
rN = aN & 0x8000000000000000
128-bit version:
  • In SSE2-AVX2 this intrinsic results in at least 1-2 instructions.
  • Not vectorized in NEON.
256-bit version:
  • In SSE2-SSE4.1 this intrinsic results in at least 2-3 instructions.
  • In AVX-AVX2 this intrinsic results in at least 1-2 instructions.
  • Not vectorized in NEON.
float32x4 simdpp::sub ( float32x4  a,
float32x4  b 
)
inline

Subtracts the values of two vectors.

r0 = a0 - b0
...
rN = aN - bN
256-bit version:
  • In SSE2-SSE4.1, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
float32x8 simdpp::sub ( float32x8  a,
float32x8  b 
)
inline

Subtracts the values of two vectors.

r0 = a0 - b0
...
rN = aN - bN
256-bit version:
  • In SSE2-SSE4.1, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
float64x2 simdpp::sub ( float64x2  a,
float64x2  b 
)
inline

Subtracts the values of two vectors.

r0 = a0 - b0
...
rN = aN - bN
128-bit version:
  • Not vectorized in NEON.
256-bit version:
  • Not vectorized in NEON.
  • In SSE2-SSE4.1 this intrinsic results in at least 2 instructions.
float64x4 simdpp::sub ( float64x4  a,
float64x4  b 
)
inline

Subtracts the values of two vectors.

r0 = a0 - b0
...
rN = aN - bN
128-bit version:
  • Not vectorized in NEON.
256-bit version:
  • Not vectorized in NEON.
  • In SSE2-SSE4.1 this intrinsic results in at least 2 instructions.