Functions
basic_int8x16	simdpp::add (basic_int8x16 a, basic_int8x16 b)
	Adds 8-bit integer values. More...

basic_int8x32	simdpp::add (basic_int8x32 a, basic_int8x32 b)
	Adds 8-bit integer values. More...

int8x16	simdpp::min (int8x16 a, int8x16 b)
	Computes minimum of signed 8-bit values. More...

int8x32	simdpp::min (int8x32 a, int8x32 b)
	Computes minimum of signed 8-bit values. More...

basic_int16x8	simdpp::add (basic_int16x8 a, basic_int16x8 b)
	Adds 16-bit integer values. More...

basic_int16x16	simdpp::add (basic_int16x16 a, basic_int16x16 b)
	Adds 16-bit integer values. More...

basic_int32x4	simdpp::add (basic_int32x4 a, basic_int32x4 b)
	Adds 32-bit integer values. More...

basic_int32x8	simdpp::add (basic_int32x8 a, basic_int32x8 b)
	Adds 32-bit integer values. More...

basic_int64x2	simdpp::add (basic_int64x2 a, basic_int64x2 b)
	Adds 64-bit integer values. More...

basic_int64x4	simdpp::add (basic_int64x4 a, basic_int64x4 b)
	Adds 64-bit integer values. More...

int8x16	simdpp::adds (int8x16 a, int8x16 b)
	Adds and saturates signed 8-bit integer values. More...

int8x32	simdpp::adds (int8x32 a, int8x32 b)
	Adds and saturates signed 8-bit integer values. More...

int16x8	simdpp::adds (int16x8 a, int16x8 b)
	Adds and saturates signed 16-bit integer values. More...

int16x16	simdpp::adds (int16x16 a, int16x16 b)
	Adds and saturates signed 16-bit integer values. More...

uint8x16	simdpp::adds (uint8x16 a, uint8x16 b)
	Adds and saturates unsigned 8-bit integer values. More...

uint8x32	simdpp::adds (uint8x32 a, uint8x32 b)
	Adds and saturates unsigned 8-bit integer values. More...

uint16x8	simdpp::adds (uint16x8 a, uint16x8 b)
	Adds and saturates unsigned 16-bit integer values. More...

uint16x16	simdpp::adds (uint16x16 a, uint16x16 b)
	Adds and saturates unsigned 16-bit integer values. More...

basic_int8x16	simdpp::sub (basic_int8x16 a, basic_int8x16 b)
	Subtracts 8-bit integer values. More...

basic_int8x32	simdpp::sub (basic_int8x32 a, basic_int8x32 b)
	Subtracts 8-bit integer values. More...

basic_int16x8	simdpp::sub (basic_int16x8 a, basic_int16x8 b)
	Subtracts 16-bit integer values. More...

basic_int16x16	simdpp::sub (basic_int16x16 a, basic_int16x16 b)
	Subtracts 16-bit integer values. More...

basic_int32x4	simdpp::sub (basic_int32x4 a, basic_int32x4 b)
	Subtracts 32-bit integer values. More...

basic_int32x8	simdpp::sub (basic_int32x8 a, basic_int32x8 b)
	Subtracts 32-bit integer values. More...

basic_int64x2	simdpp::sub (basic_int64x2 a, basic_int64x2 b)
	Subtracts 64-bit integer values. More...

basic_int64x4	simdpp::sub (basic_int64x4 a, basic_int64x4 b)
	Subtracts 64-bit integer values. More...

int8x16	simdpp::subs (int8x16 a, int8x16 b)
	Subtracts and saturates signed 8-bit integer values. More...

int8x32	simdpp::subs (int8x32 a, int8x32 b)
	Subtracts and saturates signed 8-bit integer values. More...

int16x8	simdpp::subs (int16x8 a, int16x8 b)
	Subtracts and saturates signed 16-bit integer values. More...

int16x16	simdpp::subs (int16x16 a, int16x16 b)
	Subtracts and saturates signed 16-bit integer values. More...

uint8x16	simdpp::subs (uint8x16 a, uint8x16 b)
	Subtracts and saturates unsigned 8-bit integer values. More...

uint8x32	simdpp::subs (uint8x32 a, uint8x32 b)
	Subtracts and saturates unsigned 8-bit integer values. More...

uint16x8	simdpp::subs (uint16x8 a, uint16x8 b)
	Subtracts and saturates unsigned 16-bit integer values. More...

uint16x16	simdpp::subs (uint16x16 a, uint16x16 b)
	Subtracts and saturates unsigned 16-bit integer values. More...

int8x16	simdpp::neg (int8x16 a)
	Negates signed 8-bit values. More...

int8x32	simdpp::neg (int8x32 a)
	Negates signed 8-bit values. More...

int16x8	simdpp::neg (int16x8 a)
	Negates signed 16-bit values. More...

int16x16	simdpp::neg (int16x16 a)
	Negates signed 16-bit values. More...

int32x4	simdpp::neg (int32x4 a)
	Negates signed 32-bit values. More...

int32x8	simdpp::neg (int32x8 a)
	Negates signed 32-bit values. More...

int64x2	simdpp::neg (int64x2 a)
	Negates signed 64-bit values. More...

int64x4	simdpp::neg (int64x4 a)
	Negates signed 64-bit values. More...

basic_int16x8	simdpp::mul_lo (basic_int16x8 a, basic_int16x8 b)
	Multiplies 16-bit values and returns the lower part of the multiplication. More...

basic_int16x16	simdpp::mul_lo (basic_int16x16 a, basic_int16x16 b)
	Multiplies 16-bit values and returns the lower part of the multiplication. More...

int16x8	simdpp::mul_hi (int16x8 a, int16x8 b)
	Multiplies signed 16-bit values and returns the higher half of the result. More...

int16x16	simdpp::mul_hi (int16x16 a, int16x16 b)
	Multiplies signed 16-bit values and returns the higher half of the result. More...

uint16x8	simdpp::mul_hi (uint16x8 a, uint16x8 b)
	Multiplies unsigned 16-bit values and returns the higher half of the result. More...

uint16x16	simdpp::mul_hi (uint16x16 a, uint16x16 b)
	Multiplies unsigned 16-bit values and returns the higher half of the result. More...

int128	simdpp::mul_lo (basic_int32x4 a, basic_int32x4 b)
	Multiplies 32-bit values and returns the lower half of the result. More...

basic_int32x8	simdpp::mul_lo (basic_int32x8 a, basic_int32x8 b)
	Multiplies 32-bit values and returns the lower half of the result. More...

int32x4	simdpp::mull_lo (int16x8 a, int16x8 b)
	Multiplies signed 16-bit values in the lower halves of the vectors and expands the results to 32 bits. More...

int32x8	simdpp::mull_lo (int16x16 a, int16x16 b)
	Multiplies signed 16-bit values in the lower halves of the vectors and expands the results to 32 bits. More...

uint32x4	simdpp::mull_lo (uint16x8 a, uint16x8 b)
	Multiplies unsigned 16-bit values in the lower halves of the vectors and expands the results to 32 bits. More...

uint32x8	simdpp::mull_lo (uint16x16 a, uint16x16 b)
	Multiplies unsigned 16-bit values in the lower halves of the vectors and expands the results to 32 bits. More...

int32x4	simdpp::mull_hi (int16x8 a, int16x8 b)
	Multiplies signed 16-bit values in the higher halves of the vectors and expands the results to 32 bits. More...

int32x8	simdpp::mull_hi (int16x16 a, int16x16 b)
	Multiplies signed 16-bit values in the higher halves of the vectors and expands the results to 32 bits. More...

uint32x4	simdpp::mull_hi (uint16x8 a, uint16x8 b)
	Multiplies unsigned 16-bit values in the higher halves of the vectors and expands the results to 32 bits. More...

uint32x8	simdpp::mull_hi (uint16x16 a, uint16x16 b)
	Multiplies unsigned 16-bit values in the higher halves of the vectors and expands the results to 32 bits. More...

int64x2	simdpp::mull_lo (int32x4 a, int32x4 b)
	Multiplies signed 32-bit values in the lower halves of the vectors and expands the results to 64 bits. More...

int64x4	simdpp::mull_lo (int32x8 a, int32x8 b)
	Multiplies signed 32-bit values in the lower halves of the vectors and expands the results to 64 bits. More...

uint64x2	simdpp::mull_lo (uint32x4 a, uint32x4 b)
	Multiplies unsigned 32-bit values in the lower halves of the vectors and expands the results to 64 bits. More...

uint64x4	simdpp::mull_lo (uint32x8 a, uint32x8 b)
	Multiplies unsigned 32-bit values in the lower halves of the vectors and expands the results to 64 bits. More...

int64x2	simdpp::mull_hi (int32x4 a, int32x4 b)
	Multiplies signed 32-bit values in the higher halves of the vectors and expands the results to 64 bits. More...

int64x4	simdpp::mull_hi (int32x8 a, int32x8 b)
	Multiplies signed 32-bit values in the higher halves of the vectors and expands the results to 64 bits. More...

uint64x2	simdpp::mull_hi (uint32x4 a, uint32x4 b)
	Multiplies unsigned 32-bit values in the higher halves of the vectors and expands the results to 64 bits. More...

uint64x4	simdpp::mull_hi (uint32x8 a, uint32x8 b)
	Multiplies unsigned 32-bit values in the higher halves of the vectors and expands the results to 64 bits. More...

Detailed Description

Function Documentation

basic_int8x16 simdpp::add	(	basic_int8x16	a,
		basic_int8x16	b
	)

inline

Adds 8-bit integer values.

r0 = a0 + b0
...
rN = aN + bN

256-bit version:

In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
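
The per-lane formula above can be modelled in scalar C++. This is only a sketch, not the library's implementation: add_u8x16_model is a hypothetical helper, and it assumes that a sum which does not fit in 8 bits wraps modulo 256, as the plain (non-saturating) add instructions on SSE2, NEON and ALTIVEC do.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar model of a 16-lane, 8-bit add (hypothetical helper, not simdpp API).
// Lanes are stored as uint8_t so that overflow wraps modulo 256.
std::array<std::uint8_t, 16> add_u8x16_model(const std::array<std::uint8_t, 16>& a,
                                             const std::array<std::uint8_t, 16>& b)
{
    std::array<std::uint8_t, 16> r{};
    for (std::size_t i = 0; i < r.size(); ++i) {
        // r_i = a_i + b_i, truncated to 8 bits
        r[i] = static_cast<std::uint8_t>(a[i] + b[i]);
    }
    return r;
}
```

Because the result is truncated, the same bit pattern is produced whether the lanes are read as signed or unsigned, which is consistent with the function taking the sign-agnostic basic_int8x16 type.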

basic_int8x32 simdpp::add	(	basic_int8x32	a,
		basic_int8x32	b
	)

inline

Adds 8-bit integer values.

r0 = a0 + b0
...
rN = aN + bN

256-bit version:

In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.

basic_int16x8 simdpp::add	(	basic_int16x8	a,
		basic_int16x8	b
	)

inline

Adds 16-bit integer values.

r0 = a0 + b0
...
rN = aN + bN

256-bit version:

In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.

basic_int16x16 simdpp::add	(	basic_int16x16	a,
		basic_int16x16	b
	)

inline

Adds 16-bit integer values.

r0 = a0 + b0
...
rN = aN + bN

256-bit version:

In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.

basic_int32x4 simdpp::add	(	basic_int32x4	a,
		basic_int32x4	b
	)

inline

Adds 32-bit integer values.

r0 = a0 + b0
...
rN = aN + bN

256-bit version:

In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.

basic_int32x8 simdpp::add	(	basic_int32x8	a,
		basic_int32x8	b
	)

inline

Adds 32-bit integer values.

r0 = a0 + b0
...
rN = aN + bN

256-bit version:

In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.

basic_int64x2 simdpp::add	(	basic_int64x2	a,
		basic_int64x2	b
	)

inline

Adds 64-bit integer values.

r0 = a0 + b0
...
rN = aN + bN

128-bit version:

In ALTIVEC this intrinsic results in at least 5-6 instructions.

256-bit version:

In SSE2-AVX and NEON this intrinsic results in at least 2 instructions.
In ALTIVEC this intrinsic results in at least 10-11 instructions.

basic_int64x4 simdpp::add	(	basic_int64x4	a,
		basic_int64x4	b
	)

inline

Adds 64-bit integer values.

r0 = a0 + b0
...
rN = aN + bN

128-bit version:

In ALTIVEC this intrinsic results in at least 5-6 instructions.

256-bit version:

In SSE2-AVX and NEON this intrinsic results in at least 2 instructions.
In ALTIVEC this intrinsic results in at least 10-11 instructions.

int8x16 simdpp::adds	(	int8x16	a,
		int8x16	b
	)

inline

Adds and saturates signed 8-bit integer values.

r0 = signed_saturate(a0 + b0)
...
rN = signed_saturate(aN + bN)

256-bit version:

In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
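
signed_saturate is not defined on this page. A minimal scalar sketch, assuming it clamps the exact sum to the range of the lane type ([-128, 127] for 8-bit lanes); adds_i8_model is a hypothetical name used only for illustration:

```cpp
#include <algorithm>
#include <cstdint>

// One lane of a signed saturating 8-bit add (hypothetical model).
std::int8_t adds_i8_model(std::int8_t a, std::int8_t b)
{
    // Compute the exact sum in a wider type, then clamp to the int8_t range.
    const int wide = int{a} + int{b};
    return static_cast<std::int8_t>(std::max(-128, std::min(127, wide)));
}
```

With this model 100 + 100 yields 127 and -100 + -100 yields -128, instead of wrapping around.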

int8x32 simdpp::adds	(	int8x32	a,
		int8x32	b
	)

inline

Adds and saturates signed 8-bit integer values.

r0 = signed_saturate(a0 + b0)
...
rN = signed_saturate(aN + bN)

256-bit version:

In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.

int16x8 simdpp::adds	(	int16x8	a,
		int16x8	b
	)

inline

Adds and saturates signed 16-bit integer values.

r0 = signed_saturate(a0 + b0)
...
rN = signed_saturate(aN + bN)

256-bit version:

In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.

int16x16 simdpp::adds	(	int16x16	a,
		int16x16	b
	)

inline

Adds and saturates signed 16-bit integer values.

r0 = signed_saturate(a0 + b0)
...
rN = signed_saturate(aN + bN)

256-bit version:

In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.

uint8x16 simdpp::adds	(	uint8x16	a,
		uint8x16	b
	)

inline

Adds and saturates unsigned 8-bit integer values.

r0 = unsigned_saturate(a0 + b0)
...
rN = unsigned_saturate(aN + bN)

256-bit version:

In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
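
For the unsigned overloads the clamp only matters at the upper bound, because a sum of two unsigned values cannot go below zero. A scalar sketch of one lane, assuming unsigned_saturate clamps to [0, 255]; adds_u8_model is a hypothetical name:

```cpp
#include <algorithm>
#include <cstdint>

// One lane of an unsigned saturating 8-bit add (hypothetical model).
std::uint8_t adds_u8_model(std::uint8_t a, std::uint8_t b)
{
    // The exact sum is at most 510, so it fits in unsigned without overflow.
    const unsigned wide = unsigned{a} + unsigned{b};
    return static_cast<std::uint8_t>(std::min(wide, 255u));
}
```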

uint8x32 simdpp::adds	(	uint8x32	a,
		uint8x32	b
	)

inline

Adds and saturates unsigned 8-bit integer values.

r0 = unsigned_saturate(a0 + b0)
...
rN = unsigned_saturate(aN + bN)

256-bit version:

In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.

uint16x8 simdpp::adds	(	uint16x8	a,
		uint16x8	b
	)

inline

Adds and saturates unsigned 16-bit integer values.

r0 = unsigned_saturate(a0 + b0)
...
rN = unsigned_saturate(aN + bN)

256-bit version:

In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.

uint16x16 simdpp::adds	(	uint16x16	a,
		uint16x16	b
	)

inline

Adds and saturates unsigned 16-bit integer values.

r0 = unsigned_saturate(a0 + b0)
...
rN = unsigned_saturate(aN + bN)

256-bit version:

In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.

int8x16 simdpp::min	(	int8x16	a,
		int8x16	b
	)

inline

Computes minimum of signed 8-bit values.

r0 = min(a0, b0)
...
rN = min(aN, bN)

128-bit version:

In SSE2-SSSE3 this intrinsic results in at least 4 instructions.

256-bit version:

In SSE2-SSSE3 this intrinsic results in at least 8 instructions.
In SSE4.1-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
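
The higher instruction count on SSE2-SSSE3 is consistent with the fact that a signed 8-bit minimum instruction only became available with SSE4.1. One plausible emulation is a greater-than compare followed by a bitwise select (and, and-not, or). Whether the library uses exactly this sequence is an assumption; the scalar sketch below only illustrates the idea for a single lane, and min_i8_select_model is a hypothetical name.

```cpp
#include <cstdint>

// One lane of a signed 8-bit minimum built from a compare and a bitwise
// select (hypothetical model of a compare/and/and-not/or sequence).
std::int8_t min_i8_select_model(std::int8_t a, std::int8_t b)
{
    // A lane-wise "a > b" compare yields all ones when true, all zeros otherwise.
    const std::uint8_t mask = (a > b) ? 0xFF : 0x00;
    const std::uint8_t ua = static_cast<std::uint8_t>(a);
    const std::uint8_t ub = static_cast<std::uint8_t>(b);
    // Take b where the mask is set, a elsewhere.
    const std::uint8_t r = static_cast<std::uint8_t>(
        (ub & mask) | (ua & static_cast<std::uint8_t>(~mask)));
    // Conversion back to int8_t assumes two's complement (guaranteed since C++20).
    return static_cast<std::int8_t>(r);
}
```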

int8x32 simdpp::min	(	int8x32	a,
		int8x32	b
	)

inline

Computes minimum of signed 8-bit values.

r0 = min(a0, b0)
...
rN = min(aN, bN)

128-bit version:

In SSE2-SSSE3 this intrinsic results in at least 4 instructions.

256-bit version:

In SSE2-SSSE3 this intrinsic results in at least 8 instructions.
In SSE4.1-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.

int16x8 simdpp::mul_hi	(	int16x8	a,
		int16x8	b
	)

inline

Multiplies signed 16-bit values and returns the higher half of the result.

r0 = high(a0 * b0)
...
rN = high(aN * bN)

128-bit version:

In NEON and ALTIVEC this intrinsic results in at least 3 instructions.

256-bit version:

In SSE2-AVX this intrinsic results in at least 2 instructions.
In NEON and ALTIVEC this intrinsic results in at least 6 instructions.
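
A scalar sketch of one lane, assuming high(x) denotes the upper 16 bits of the exact 32-bit signed product (the page does not define it further); mul_hi_i16_model is a hypothetical name:

```cpp
#include <cstdint>

// One lane of a signed 16-bit "multiply, keep high half" (hypothetical model).
std::int16_t mul_hi_i16_model(std::int16_t a, std::int16_t b)
{
    // Exact product; it always fits in 32 bits.
    const std::int32_t wide = std::int32_t{a} * std::int32_t{b};
    // Shift in unsigned arithmetic to avoid relying on signed right-shift rules.
    const std::uint32_t bits = static_cast<std::uint32_t>(wide);
    // Conversion back to int16_t assumes two's complement (guaranteed since C++20).
    return static_cast<std::int16_t>(static_cast<std::uint16_t>(bits >> 16));
}
```

For example, 1000 * 1000 = 1000000 = 0x000F4240, so the high half is 0x000F, that is 15.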

int16x16 simdpp::mul_hi	(	int16x16	a,
		int16x16	b
	)

inline

Multiplies signed 16-bit values and returns the higher half of the result.

r0 = high(a0 * b0)
...
rN = high(aN * bN)

128-bit version:

In NEON and ALTIVEC this intrinsic results in at least 3 instructions.

256-bit version:

In SSE2-AVX this intrinsic results in at least 2 instructions.
In NEON and ALTIVEC this intrinsic results in at least 6 instructions.

uint16x8 simdpp::mul_hi	(	uint16x8	a,
		uint16x8	b
	)

inline

Multiplies unsigned 16-bit values and returns the higher half of the result.

r0 = high(a0 * b0)
...
rN = high(aN * bN)

128-bit version:

In NEON and ALTIVEC this intrinsic results in at least 3 instructions.

256-bit version:

In SSE2-AVX this intrinsic results in at least 2 instructions.
In NEON and ALTIVEC this intrinsic results in at least 6 instructions.
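
The unsigned overload differs from the signed one only in how the inputs are widened before multiplying. A scalar sketch of one lane under the same reading of high(x) as the upper 16 bits of the 32-bit product; mul_hi_u16_model is a hypothetical name:

```cpp
#include <cstdint>

// One lane of an unsigned 16-bit "multiply, keep high half" (hypothetical model).
std::uint16_t mul_hi_u16_model(std::uint16_t a, std::uint16_t b)
{
    // Zero-extend both inputs; the product is at most 0xFFFE0001.
    const std::uint32_t wide = std::uint32_t{a} * std::uint32_t{b};
    return static_cast<std::uint16_t>(wide >> 16);
}
```

The signedness matters here: the bit pattern 0xFFFF multiplied by 1 has a high half of 0x0000 when treated as unsigned (65535), but 0xFFFF when treated as signed (-1), which is why separate signed and unsigned overloads exist.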

uint16x16 simdpp::mul_hi	(	uint16x16	a,
		uint16x16	b
	)

inline

Multiplies unsigned 16-bit values and returns the higher half of the result.

r0 = high(a0 * b0)
...
rN = high(aN * bN)

128-bit version:

In NEON and ALTIVEC this intrinsic results in at least 3 instructions.

256-bit version:

In SSE2-AVX this intrinsic results in at least 2 instructions.
In NEON and ALTIVEC this intrinsic results in at least 6 instructions.

basic_int16x8 simdpp::mul_lo	(	basic_int16x8	a,
		basic_int16x8	b
	)

inline

Multiplies 16-bit values and returns the lower part of the multiplication.

r0 = low(a0 * b0)
...
rN = low(aN * bN)

256-bit version:

In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
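
A scalar sketch of one lane, assuming low(x) denotes the lower 16 bits of the 32-bit product; mul_lo_16_model is a hypothetical name:

```cpp
#include <cstdint>

// One lane of a 16-bit "multiply, keep low half" (hypothetical model).
std::uint16_t mul_lo_16_model(std::uint16_t a, std::uint16_t b)
{
    const std::uint32_t wide = std::uint32_t{a} * std::uint32_t{b};
    // Keep only the low 16 bits of the product.
    return static_cast<std::uint16_t>(wide & 0xFFFFu);
}
```

The low half of a product does not depend on whether the inputs are interpreted as signed or unsigned: 0xFFFF * 2 gives 0xFFFE, which reads as 65534 unsigned or as -2 signed, matching -1 * 2. This is consistent with the function taking the sign-agnostic basic_int16x8 type.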

basic_int16x16 simdpp::mul_lo	(	basic_int16x16	a,
		basic_int16x16	b
	)

inline

Multiplies 16-bit values and returns the lower part of the multiplication.

r0 = low(a0 * b0)
...
rN = low(aN * bN)

256-bit version:

In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.

int128 simdpp::mul_lo	(	basic_int32x4	a,
		basic_int32x4	b
	)

inline

Multiplies 32-bit values and returns the lower half of the result.

r0 = low(a0 * b0)
...
rN = low(aN * bN)

128-bit version:

In SSE2-SSSE3 this intrinsic results in at least 6 instructions.
In ALTIVEC this intrinsic results in at least 8 instructions.

256-bit version:

In SSE2-SSSE3 this intrinsic results in at least 12 instructions.
In SSE4.1, AVX and NEON this intrinsic results in at least 2 instructions.
In ALTIVEC this intrinsic results in at least 16 instructions.

basic_int32x8 simdpp::mul_lo	(	basic_int32x8	a,
		basic_int32x8	b
	)

inline

Multiplies 32-bit values and returns the lower half of the result.

r0 = low(a0 * b0)
...
rN = low(aN * bN)

128-bit version:

In SSE2-SSSE3 this intrinsic results in at least 6 instructions.
In ALTIVEC this intrinsic results in at least 8 instructions.

256-bit version:

In SSE2-SSSE3 this intrinsic results in at least 12 instructions.
In SSE4.1, AVX and NEON this intrinsic results in at least 2 instructions.
In ALTIVEC this intrinsic results in at least 16 instructions.

int32x4 simdpp::mull_hi	(	int16x8	a,
		int16x8	b
	)

inline

Multiplies signed 16-bit values in the higher halves of the vectors and expands the results to 32 bits.

128-bit version:

r0 = a4 * b4
...
r3 = a7 * b7

In SSE2-AVX2 and ALTIVEC this intrinsic results in at least 2-3 instructions.

256-bit version:

The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.

In SSE2-AVX and ALTIVEC this intrinsic results in at least 4-6 instructions.
In AVX2 this intrinsic results in at least 2-3 instructions.
In NEON this intrinsic results in at least 2 instructions.
Note
Use with mull_lo on the same arguments to save instructions.
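
The 128-bit lane mapping (r0 = a4 * b4 through r3 = a7 * b7) can be written as a scalar loop. This is a sketch of the documented semantics only; mull_hi_i16x8_model is a hypothetical helper, not part of the library.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar model of the 128-bit signed widening multiply of the upper lanes
// (hypothetical helper): four 32-bit results from lanes 4..7 of the inputs.
std::array<std::int32_t, 4> mull_hi_i16x8_model(const std::array<std::int16_t, 8>& a,
                                                const std::array<std::int16_t, 8>& b)
{
    std::array<std::int32_t, 4> r{};
    for (std::size_t i = 0; i < 4; ++i) {
        // Widen before multiplying so the full 32-bit product is kept.
        r[i] = std::int32_t{a[i + 4]} * std::int32_t{b[i + 4]};
    }
    return r;
}
```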

int32x8 simdpp::mull_hi	(	int16x16	a,
		int16x16	b
	)

inline

Multiplies signed 16-bit values in the higher halves of the vectors and expands the results to 32 bits.

128-bit version:

r0 = a4 * b4
...
r3 = a7 * b7

In SSE2-AVX2 and ALTIVEC this intrinsic results in at least 2-3 instructions.

256-bit version:

The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.

In SSE2-AVX and ALTIVEC this intrinsic results in at least 4-6 instructions.
In AVX2 this intrinsic results in at least 2-3 instructions.
In NEON this intrinsic results in at least 2 instructions.
Note
Use with mull_lo on the same arguments to save instructions.

uint32x4 simdpp::mull_hi	(	uint16x8	a,
		uint16x8	b
	)

inline

Multiplies unsigned 16-bit values in the higher halves of the vectors and expands the results to 32 bits.

128-bit version:

r0 = a4 * b4
...
r3 = a7 * b7

In SSE2-AVX2 and ALTIVEC this intrinsic results in at least 2-3 instructions.

256-bit version:

The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.

In SSE2-AVX and ALTIVEC this intrinsic results in at least 4-6 instructions.
In AVX2 this intrinsic results in at least 2-3 instructions.
In NEON this intrinsic results in at least 2 instructions.
Note
Use with mull_lo on the same arguments to save instructions.

uint32x8 simdpp::mull_hi	(	uint16x16	a,
		uint16x16	b
	)

inline

Multiplies unsigned 16-bit values in the higher halves of the vectors and expands the results to 32 bits.

128-bit version:

r0 = a4 * b4
...
r3 = a7 * b7

In SSE2-AVX2 and ALTIVEC this intrinsic results in at least 2-3 instructions.

256-bit version:

The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.

In SSE2-AVX and ALTIVEC this intrinsic results in at least 4-6 instructions.
In AVX2 this intrinsic results in at least 2-3 instructions.
In NEON this intrinsic results in at least 2 instructions.
Note
Use with mull_lo on the same arguments to save instructions.

int64x2 simdpp::mull_hi	(	int32x4	a,
		int32x4	b
	)

inline

Multiplies signed 32-bit values in the higher halves of the vectors and expands the results to 64 bits.

128-bit version:

r0 = a2 * b2

r1 = a3 * b3

In SSE4.1-AVX2 this intrinsic results in at least 3 instructions.
Not implemented for SSE2-SSSE3 and ALTIVEC.

256-bit version:

The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.

In SSE4.1-AVX this intrinsic results in at least 6 instructions.
In AVX2 this intrinsic results in at least 3 instructions.
In NEON this intrinsic results in at least 2 instructions.
Not implemented for SSE2-SSSE3 and ALTIVEC.
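
For the 32-bit overloads only two lanes are multiplied, and each product needs the full 64 bits. A scalar sketch of the documented 128-bit mapping (r0 = a2 * b2, r1 = a3 * b3); mull_hi_i32x4_model is a hypothetical helper:

```cpp
#include <array>
#include <cstdint>

// Scalar model of the 128-bit signed 32-bit widening multiply of the upper
// lanes (hypothetical helper): two 64-bit results from lanes 2 and 3.
std::array<std::int64_t, 2> mull_hi_i32x4_model(const std::array<std::int32_t, 4>& a,
                                                const std::array<std::int32_t, 4>& b)
{
    std::array<std::int64_t, 2> r{};
    // Widen to 64 bits before multiplying; a 32-bit product could overflow.
    r[0] = std::int64_t{a[2]} * std::int64_t{b[2]};
    r[1] = std::int64_t{a[3]} * std::int64_t{b[3]};
    return r;
}
```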

int64x4 simdpp::mull_hi	(	int32x8	a,
		int32x8	b
	)

inline

Multiplies signed 32-bit values in the higher halves of the vectors and expands the results to 64 bits.

128-bit version:

r0 = a2 * b2

r1 = a3 * b3

In SSE4.1-AVX2 this intrinsic results in at least 3 instructions.
Not implemented for SSE2-SSSE3 and ALTIVEC.

256-bit version:

The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.

In SSE4.1-AVX this intrinsic results in at least 6 instructions.
In AVX2 this intrinsic results in at least 3 instructions.
In NEON this intrinsic results in at least 2 instructions.
Not implemented for SSE2-SSSE3 and ALTIVEC.

uint64x2 simdpp::mull_hi	(	uint32x4	a,
		uint32x4	b
	)

inline

Multiplies unsigned 32-bit values in the higher halves of the vectors and expands the results to 64 bits.

128-bit version:

r0 = a2 * b2

r1 = a3 * b3

In SSE2-AVX this intrinsic results in at least 3 instructions.
Not vectorized in ALTIVEC.

256-bit version:

The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.

In SSE2-AVX this intrinsic results in at least 6 instructions.
In AVX2 this intrinsic results in at least 3 instructions.
In NEON this intrinsic results in at least 2 instructions.
Not vectorized in ALTIVEC.

uint64x4 simdpp::mull_hi	(	uint32x8	a,
		uint32x8	b
	)

inline

Multiplies unsigned 32-bit values in the higher halves of the vectors and expands the results to 64 bits.

128-bit version:

r0 = a2 * b2

r1 = a3 * b3

In SSE2-AVX this intrinsic results in at least 3 instructions.
Not vectorized in ALTIVEC.

256-bit version:

The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.

In SSE2-AVX this intrinsic results in at least 6 instructions.
In AVX2 this intrinsic results in at least 3 instructions.
In NEON this intrinsic results in at least 2 instructions.
Not vectorized in ALTIVEC.

int32x4 simdpp::mull_lo	(	int16x8	a,
		int16x8	b
	)

inline

Multiplies signed 16-bit values in the lower halves of the vectors and expands the results to 32 bits.

128-bit version:

r0 = a0 * b0
...
r3 = a3 * b3

In SSE2-AVX and ALTIVEC this intrinsic results in at least 2-3 instructions.

256-bit version:

The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.

In SSE2-AVX and ALTIVEC this intrinsic results in at least 4-6 instructions.
In AVX2 and NEON this intrinsic results in at least 2-3 instructions.
Note
Use with mull_hi on the same arguments to save instructions.
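
The note suggests that the two halves are cheaper when requested together. A likely reason, stated here as an assumption rather than something this page confirms, is that on targets without a direct widening multiply the full products are assembled from a low-half multiply and a high-half multiply, and those two multiplies can serve both results. The scalar sketch below only models what the pair of calls returns; WideProductsModel and mull_both_i16x8_model are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Both widened halves of an 8-lane signed 16-bit multiply (hypothetical model).
struct WideProductsModel {
    std::array<std::int32_t, 4> lo;  // products of lanes 0..3
    std::array<std::int32_t, 4> hi;  // products of lanes 4..7
};

WideProductsModel mull_both_i16x8_model(const std::array<std::int16_t, 8>& a,
                                        const std::array<std::int16_t, 8>& b)
{
    WideProductsModel r{};
    for (std::size_t i = 0; i < 4; ++i) {
        r.lo[i] = std::int32_t{a[i]} * std::int32_t{b[i]};
        r.hi[i] = std::int32_t{a[i + 4]} * std::int32_t{b[i + 4]};
    }
    return r;
}
```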

int32x8 simdpp::mull_lo	(	int16x16	a,
		int16x16	b
	)

inline

Multiplies signed 16-bit values in the lower halves of the vectors and expands the results to 32 bits.

128-bit version:

r0 = a0 * b0
...
r3 = a3 * b3

In SSE2-AVX and ALTIVEC this intrinsic results in at least 2-3 instructions.

256-bit version:

The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.

In SSE2-AVX and ALTIVEC this intrinsic results in at least 4-6 instructions.
In AVX2 and NEON this intrinsic results in at least 2-3 instructions.
Note
Use with mull_hi on the same arguments to save instructions.
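
For the 256-bit overload, the statement that each 128-bit half is processed separately implies a lane mapping that is not contiguous. A scalar sketch under that reading (an interpretation of the sentence above, not a confirmed description of the implementation): results 0..3 come from input lanes 0..3 and results 4..7 from input lanes 8..11. mull_lo_i16x16_model is a hypothetical helper.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar model of the 256-bit signed widening multiply of the lower lanes,
// applied independently to each 128-bit half (hypothetical helper).
std::array<std::int32_t, 8> mull_lo_i16x16_model(const std::array<std::int16_t, 16>& a,
                                                 const std::array<std::int16_t, 16>& b)
{
    std::array<std::int32_t, 8> r{};
    for (std::size_t half = 0; half < 2; ++half) {
        for (std::size_t i = 0; i < 4; ++i) {
            // Each half holds eight 16-bit lanes; take the lower four of them.
            const std::size_t src = half * 8 + i;
            r[half * 4 + i] = std::int32_t{a[src]} * std::int32_t{b[src]};
        }
    }
    return r;
}
```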

uint32x4 simdpp::mull_lo	(	uint16x8	a,
		uint16x8	b
	)

inline

Multiplies unsigned 16-bit values in the lower halves of the vectors and expands the results to 32 bits.

128-bit version:

r0 = a0 * b0
...
r3 = a3 * b3

In SSE2-AVX2 and ALTIVEC this intrinsic results in at least 2-3 instructions.

256-bit version:

The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.

In SSE2-AVX and ALTIVEC this intrinsic results in at least 4-6 instructions.
In AVX2 this intrinsic results in at least 2-3 instructions.
In NEON this intrinsic results in at least 2 instructions.
Note
Use with mull_hi on the same arguments to save instructions.

uint32x8 simdpp::mull_lo	(	uint16x16	a,
		uint16x16	b
	)

inline

Multiplies unsigned 16-bit values in the lower halves of the vectors and expands the results to 32 bits.

128-bit version:

r0 = a0 * b0
...
r3 = a3 * b3

In SSE2-AVX2 and ALTIVEC this intrinsic results in at least 2-3 instructions.

256-bit version:

The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.

In SSE2-AVX and ALTIVEC this intrinsic results in at least 4-6 instructions.
In AVX2 this intrinsic results in at least 2-3 instructions.
In NEON this intrinsic results in at least 2 instructions.
Note
Use with mull_hi on the same arguments to save instructions.

int64x2 simdpp::mull_lo	(	int32x4	a,
		int32x4	b
	)

inline

Multiplies signed 32-bit values in the lower halves of the vectors and expands the results to 64 bits.

128-bit version:

r0 = a0 * b0

r1 = a1 * b1

In SSE4.1-AVX this intrinsic results in at least 3 instructions.
Not implemented for SSE2-SSSE3 and ALTIVEC.

256-bit version:

The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.

In SSE4.1-AVX this intrinsic results in at least 6 instructions.
In AVX2 this intrinsic results in at least 3 instructions.
In NEON this intrinsic results in at least 2 instructions.
Not implemented for SSE2-SSSE3 and ALTIVEC.

int64x4 simdpp::mull_lo	(	int32x8	a,
		int32x8	b
	)

inline

Multiplies signed 32-bit values in the lower halves of the vectors and expands the results to 64 bits.

128-bit version:

r0 = a0 * b0

r1 = a1 * b1

In SSE4.1-AVX this intrinsic results in at least 3 instructions.
Not implemented for SSE2-SSSE3 and ALTIVEC.

256-bit version:

The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.

In SSE4.1-AVX this intrinsic results in at least 6 instructions.
In AVX2 this intrinsic results in at least 3 instructions.
In NEON this intrinsic results in at least 2 instructions.
Not implemented for SSE2-SSSE3 and ALTIVEC.

uint64x2 simdpp::mull_lo	(	uint32x4	a,
		uint32x4	b
	)

inline

Multiplies unsigned 32-bit values in the lower halves of the vectors and expands the results to 64 bits.

128-bit version:

r0 = a0 * b0

r1 = a1 * b1

In SSE2-AVX this intrinsic results in at least 3 instructions.
Not implemented for ALTIVEC.

256-bit version:

The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.

In SSE2-AVX this intrinsic results in at least 6 instructions.
In AVX2 this intrinsic results in at least 3 instructions.
In NEON this intrinsic results in at least 2 instructions.
Not implemented for ALTIVEC.

uint64x4 simdpp::mull_lo	(	uint32x8	a,
		uint32x8	b
	)

inline

Multiplies unsigned 32-bit values in the lower halves of the vectors and expands the results to 64 bits.

128-bit version:

r0 = a0 * b0

r1 = a1 * b1

In SSE2-AVX this intrinsic results in at least 3 instructions.
Not implemented for ALTIVEC.

256-bit version:

The lower and higher 128-bit halves are processed as if a 128-bit instruction were applied to each of them separately.

In SSE2-AVX this intrinsic results in at least 6 instructions.
In AVX2 this intrinsic results in at least 3 instructions.
In NEON this intrinsic results in at least 2 instructions.
Not implemented for ALTIVEC.

int8x16 simdpp::neg ( int8x16 a )

inline

Negates signed 8-bit values.

r0 = -a0
...
rN = -aN

256-bit version:

In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
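
The formula r = -a leaves one edge case open: the most negative value has no positive counterpart in the same width. A scalar sketch of one lane, assuming the operation behaves like a wrapping subtraction from zero (an assumption, since this page does not state the overflow behaviour); neg_i8_model is a hypothetical name.

```cpp
#include <cstdint>

// One lane of an 8-bit negation modelled as a wrapping "0 - a"
// (hypothetical model; the wrap on -128 is an assumption).
std::int8_t neg_i8_model(std::int8_t a)
{
    // Subtract in unsigned arithmetic so that -128 wraps instead of overflowing.
    const std::uint8_t r = static_cast<std::uint8_t>(0u - static_cast<std::uint8_t>(a));
    // Conversion back to int8_t assumes two's complement (guaranteed since C++20).
    return static_cast<std::int8_t>(r);
}
```

Under this model negating -128 yields -128 again, while every other value maps to its arithmetic negative.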

int8x32 simdpp::neg ( int8x32 a )

inline

Negates signed 8-bit values.

r0 = -a0
...
rN = -aN

256-bit version:

In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.

int16x8 simdpp::neg ( int16x8 a )

inline

Negates signed 16-bit values.

r0 = -a0
...
rN = -aN

256-bit version:

In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.

int16x16 simdpp::neg ( int16x16 a )

inline

Negates signed 16-bit values.

r0 = -a0
...
rN = -aN

256-bit version:

In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.

int32x4 simdpp::neg ( int32x4 a )

inline

Negates signed 32-bit values.

r0 = -a0
...
rN = -aN

256-bit version:

In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.

int32x8 simdpp::neg ( int32x8 a )

inline

Negates signed 32-bit values.

r0 = -a0
...
rN = -aN

256-bit version:

In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.

int64x2 simdpp::neg ( int64x2 a )

inline

Negates signed 64-bit values.

r0 = -a0
...
rN = -aN

128-bit version:

In ALTIVEC this intrinsic results in at least 4-5 instructions.

256-bit version:

In SSE2-AVX and NEON this intrinsic results in at least 2 instructions.
In ALTIVEC this intrinsic results in at least 8-9 instructions.

int64x4 simdpp::neg ( int64x4 a )

inline

Negates signed 64-bit values.

r0 = -a0
...
rN = -aN

128-bit version:

In ALTIVEC this intrinsic results in at least 4-5 instructions.

256-bit version:

In SSE2-AVX and NEON this intrinsic results in at least 2 instructions.
In ALTIVEC this intrinsic results in at least 8-9 instructions.

basic_int8x16 simdpp::sub	(	basic_int8x16	a,
		basic_int8x16	b
	)

inline

Subtracts 8-bit integer values.

r0 = a0 - b0
...
rN = aN - bN

256-bit version:

In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.

basic_int8x32 simdpp::sub	(	basic_int8x32	a,
		basic_int8x32	b
	)

inline

Subtracts 8-bit integer values.

r0 = a0 - b0
...
rN = aN - bN

256-bit version:

In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.

basic_int16x8 simdpp::sub	(	basic_int16x8	a,
		basic_int16x8	b
	)

inline

Subtracts 16-bit integer values.

r0 = a0 - b0
...
rN = aN - bN

256-bit version:

In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.

basic_int16x16 simdpp::sub	(	basic_int16x16	a,
		basic_int16x16	b
	)

inline

Subtracts 16-bit integer values.

r0 = a0 - b0
...
rN = aN - bN

256-bit version:

In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.

basic_int32x4 simdpp::sub	(	basic_int32x4	a,
		basic_int32x4	b
	)

inline

Subtracts 32-bit integer values.

r0 = a0 - b0
...
rN = aN - bN

256-bit version:

In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.

basic_int32x8 simdpp::sub	(	basic_int32x8	a,
		basic_int32x8	b
	)

inline

Subtracts 32-bit integer values.

r0 = a0 - b0
...
rN = aN - bN

256-bit version:

In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.

basic_int64x2 simdpp::sub	(	basic_int64x2	a,
		basic_int64x2	b
	)

inline

Subtracts 64-bit integer values.

r0 = a0 - b0
...
rN = aN - bN

128-bit version:

In ALTIVEC this intrinsic results in at least 5-6 instructions.

256-bit version:

In SSE2-AVX and NEON this intrinsic results in at least 2 instructions.
In ALTIVEC this intrinsic results in at least 10-11 instructions.

basic_int64x4 simdpp::sub	(	basic_int64x4	a,
		basic_int64x4	b
	)

inline

Subtracts 64-bit integer values.

r0 = a0 - b0
...
rN = aN - bN

128-bit version:

In ALTIVEC this intrinsic results in at least 5-6 instructions.

256-bit version:

In SSE2-AVX and NEON this intrinsic results in at least 2 instructions.
In ALTIVEC this intrinsic results in at least 10-11 instructions.

int8x16 simdpp::subs	(	int8x16	a,
		int8x16	b
	)

inline

Subtracts and saturates signed 8-bit integer values.

r0 = saturated(a0 - b0)
...
rN = saturated(aN - bN)

256-bit version:

In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
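
The saturated() notation is not defined on this page. A scalar sketch, assuming that for the signed overloads it clamps the exact difference to the range of the lane type. The template covers both the 8-bit and 16-bit lane widths; subs_signed_model is a hypothetical name.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <type_traits>

// One lane of a signed saturating subtract for 8-bit or 16-bit lanes
// (hypothetical model).
template <typename T>
T subs_signed_model(T a, T b)
{
    static_assert(std::is_signed<T>::value && sizeof(T) <= 2,
                  "model covers int8_t and int16_t lanes only");
    // The exact difference always fits in 32 bits for these lane widths.
    const std::int32_t wide = std::int32_t{a} - std::int32_t{b};
    const std::int32_t lo = std::numeric_limits<T>::min();
    const std::int32_t hi = std::numeric_limits<T>::max();
    return static_cast<T>(std::max(lo, std::min(hi, wide)));
}
```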

int8x32 simdpp::subs	(	int8x32	a,
		int8x32	b
	)

inline

Subtracts and saturates signed 8-bit integer values.

r0 = saturated(a0 - b0)
...
rN = saturated(aN - bN)

256-bit version:

In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.

int16x8 simdpp::subs	(	int16x8	a,
		int16x8	b
	)

inline

Subtracts and saturates signed 16-bit integer values.

r0 = saturated(a0 - b0)
...
rN = saturated(aN - bN)

256-bit version:

In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.

int16x16 simdpp::subs	(	int16x16	a,
		int16x16	b
	)

inline

Subtracts and saturates signed 16-bit integer values.

r0 = saturated(a0 - b0)
...
rN = saturated(aN - bN)

256-bit version:

In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.

uint8x16 simdpp::subs	(	uint8x16	a,
		uint8x16	b
	)

inline

Subtracts and saturates unsigned 8-bit integer values.

r0 = saturated(a0 - b0)
...
rN = saturated(aN - bN)

256-bit version:

In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
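
For unsigned lanes the difference can only leave the representable range on the low side, so saturation amounts to clamping at zero. A scalar sketch of one lane under that assumption; subs_u8_model is a hypothetical name.

```cpp
#include <cstdint>

// One lane of an unsigned saturating 8-bit subtract (hypothetical model).
std::uint8_t subs_u8_model(std::uint8_t a, std::uint8_t b)
{
    // If b is at least as large as a, the true difference is <= 0: clamp to 0.
    return (a > b) ? static_cast<std::uint8_t>(a - b) : std::uint8_t{0};
}
```

A common use of this behaviour is computing max(a - b, 0) per lane without a separate compare.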

uint8x32 simdpp::subs	(	uint8x32	a,
		uint8x32	b
	)

inline

Subtracts and saturates unsigned 8-bit integer values.

r0 = saturated(a0 - b0)
...
rN = saturated(aN - bN)

256-bit version:

In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.

uint16x8 simdpp::subs	(	uint16x8	a,
		uint16x8	b
	)

inline

Subtracts and saturates unsigned 16-bit integer values.

r0 = saturated(a0 - b0)
...
rN = saturated(aN - bN)

256-bit version:

In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.

uint16x16 simdpp::subs	(	uint16x16	a,
		uint16x16	b
	)

inline

Subtracts and saturates unsigned 16-bit integer values.

r0 = saturated(a0 - b0)
...
rN = saturated(aN - bN)

256-bit version:

In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
