Functions
template<unsigned id>
basic_int8x16	simdpp::insert (basic_int8x16 a, uint8_t x)
	Inserts an element into int8x16 vector at the position identified by id.

template<unsigned id>
basic_int16x8	simdpp::insert (basic_int16x8 a, uint16_t x)
	Inserts an element into int16x8 vector at the position identified by id.

template<unsigned id>
basic_int32x4	simdpp::insert (basic_int32x4 a, uint32_t x)
	Inserts an element into int32x4 vector at the position identified by id.

template<unsigned id>
basic_int64x2	simdpp::insert (basic_int64x2 a, uint64_t x)
	Inserts an element into int64x2 vector at the position identified by id.

template<unsigned id>
float32x4	simdpp::insert (float32x4 a, float x)
	Inserts an element into float32x4 vector at the position identified by id.

template<unsigned id>
float64x2	simdpp::insert (float64x2 a, double x)
	Inserts an element into float64x2 vector at the position identified by id.

template<unsigned id>
uint8_t	simdpp::extract (basic_int8x16 a)
	Extracts the id-th element from int8x16 vector.

template<unsigned id>
int8_t	simdpp::extract (int8x16 a)
	Extracts the id-th element from int8x16 vector.

int256	simdpp::combine (int128 a, int128 b)
	Combines two 128-bit vectors into a 256-bit vector.

float32x8	simdpp::combine (float32x4 a, float32x4 b)
	Combines two 128-bit vectors into a 256-bit vector.

float64x4	simdpp::combine (float64x2 a, float64x2 b)
	Combines two 128-bit vectors into a 256-bit vector.

Detailed Description

Function Documentation

int256 simdpp::combine	(	int128	a,
		int128	b
	)

inline

Combines two 128-bit vectors into a 256-bit vector.

r = [ a, b ]

In AVX2 this intrinsic results in at least 1 instruction.
In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 0 instructions.

float32x8 simdpp::combine	(	float32x4	a,
		float32x4	b
	)

inline

Combines two 128-bit vectors into a 256-bit vector.

r = [ a, b ]

In AVX2 this intrinsic results in at least 1 instruction.
In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 0 instructions.

float64x4 simdpp::combine	(	float64x2	a,
		float64x2	b
	)

inline

Combines two 128-bit vectors into a 256-bit vector.

r = [ a, b ]

In AVX2 this intrinsic results in at least 1 instruction.
In SSE2-AVX, NEON and ALTIVEC this intrinsic results in at least 0 instructions.
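
The r = [ a, b ] notation used by the combine overloads can be illustrated with a scalar model. The sketch below is not the library's implementation: the aliases model_float32x4 and model_float32x8 and the function combine_model are hypothetical names built on std::array, and it assumes that a supplies the lower-indexed elements and b the higher-indexed ones.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar stand-ins for float32x4 and float32x8.
using model_float32x4 = std::array<float, 4>;
using model_float32x8 = std::array<float, 8>;

// Models r = [ a, b ].
// Assumption: a supplies elements 0..3 and b supplies elements 4..7.
inline model_float32x8 combine_model(const model_float32x4& a,
                                     const model_float32x4& b)
{
    model_float32x8 r{};
    for (std::size_t i = 0; i < 4; ++i) {
        r[i] = a[i];
        r[i + 4] = b[i];
    }
    return r;
}

// Small self-check of the model.
inline void combine_model_example()
{
    const model_float32x4 lo{{1.0f, 2.0f, 3.0f, 4.0f}};
    const model_float32x4 hi{{5.0f, 6.0f, 7.0f, 8.0f}};
    const model_float32x8 r = combine_model(lo, hi);
    assert(r[0] == 1.0f && r[3] == 4.0f); // lower half comes from lo
    assert(r[4] == 5.0f && r[7] == 8.0f); // upper half comes from hi
}
```

The model only describes which input each output element comes from. It says nothing about the instruction counts listed above.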

template<unsigned id>

uint8_t simdpp::extract ( basic_int8x16 a )

Extracts the id-th element from int8x16 vector.

r = a[id]

This function may have very high latency.

In SSE2-SSSE3 this intrinsic results in at least 1-2 instructions.
In SSE4.1-AVX this intrinsic results in at least 1 instruction.
In ALTIVEC this intrinsic results in at least 2 instructions.

template<unsigned id>

int8_t simdpp::extract ( int8x16 a )

Extracts the id-th element from int8x16 vector.

r = a[id]

This function may have very high latency.

In SSE2-SSSE3 this intrinsic results in at least 1-2 instructions.
In SSE4.1-AVX this intrinsic results in at least 1 instruction.
In ALTIVEC this intrinsic results in at least 2 instructions.
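
The r = a[id] semantics of the two extract overloads can be sketched with a scalar model. Everything named here is hypothetical and is not the library's implementation: model_basic_int8x16 and model_int8x16 are std::array stand-ins, with the basic_int8x16 form assumed to use unsigned storage, and extract_model mirrors only the compile-time index and the return types shown in the signatures.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical scalar stand-ins: unsigned storage for basic_int8x16,
// signed storage for int8x16.
using model_basic_int8x16 = std::array<std::uint8_t, 16>;
using model_int8x16 = std::array<std::int8_t, 16>;

// Models r = a[id] for the overload returning uint8_t.
template<unsigned id>
std::uint8_t extract_model(const model_basic_int8x16& a)
{
    static_assert(id < 16, "id must be in the range [0, 15]");
    return a[id];
}

// Models r = a[id] for the overload returning int8_t.
template<unsigned id>
std::int8_t extract_model(const model_int8x16& a)
{
    static_assert(id < 16, "id must be in the range [0, 15]");
    return a[id];
}

// Small self-check of the model.
inline void extract_model_example()
{
    model_basic_int8x16 u{};
    u[3] = 200;
    model_int8x16 s{};
    s[3] = -56;
    assert(extract_model<3>(u) == 200);
    assert(extract_model<3>(s) == -56);
}
```

Because id is a template parameter, an out-of-range index in this model is rejected at compile time by the static_assert.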

template<unsigned id>

basic_int8x16 simdpp::insert	(	basic_int8x16	a,
		uint8_t	x
	)

Inserts an element into int8x16 vector at the position identified by id.

r0 = (id == 0) ? x : a0
...
r15 = (id == 15) ? x : a15

This function may have very high latency.

In SSE2-SSSE3 this intrinsic results in at least 4-5 instructions.
In ALTIVEC this intrinsic results in at least 3 instructions.

template<unsigned id>

basic_int16x8 simdpp::insert	(	basic_int16x8	a,
		uint16_t	x
	)

Inserts an element into int16x8 vector at the position identified by id.

r0 = (id == 0) ? x : a0
...
r7 = (id == 7) ? x : a7

This function may have very high latency.

In ALTIVEC this intrinsic results in at least 3 instructions.

template<unsigned id>

basic_int32x4 simdpp::insert	(	basic_int32x4	a,
		uint32_t	x
	)

Inserts an element into int32x4 vector at the position identified by id.

r0 = (id == 0) ? x : a0
r1 = (id == 1) ? x : a1
r2 = (id == 2) ? x : a2
r3 = (id == 3) ? x : a3

This function may have very high latency.

In SSE2-SSSE3 this intrinsic results in at least 4 instructions.
In ALTIVEC this intrinsic results in at least 3 instructions.
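
The per-element formula r_i = (id == i) ? x : a_i can be written out as a scalar model. This is a sketch only, not the library's implementation: model_int32x4 and insert_model are hypothetical names, and the vector is represented as a std::array of four uint32_t values.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical scalar stand-in for basic_int32x4.
using model_int32x4 = std::array<std::uint32_t, 4>;

// Models r_i = (id == i) ? x : a_i for i = 0..3.
template<unsigned id>
model_int32x4 insert_model(const model_int32x4& a, std::uint32_t x)
{
    static_assert(id < 4, "id must be in the range [0, 3]");
    model_int32x4 r{};
    for (unsigned i = 0; i < 4; ++i) {
        r[i] = (i == id) ? x : a[i];
    }
    return r;
}

// Small self-check of the model.
inline void insert_model_example()
{
    const model_int32x4 a{{10u, 20u, 30u, 40u}};
    const model_int32x4 r = insert_model<2>(a, 99u);
    assert(r[0] == 10u && r[1] == 20u && r[2] == 99u && r[3] == 40u);
    assert(a[2] == 30u); // the input is left unchanged
}
```

The model returns a new value and leaves its input untouched, matching the by-value signature shown above.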

template<unsigned id>

basic_int64x2 simdpp::insert	(	basic_int64x2	a,
		uint64_t	x
	)

Inserts an element into int64x2 vector at the position identified by id.

r0 = (id == 0) ? x : a0
r1 = (id == 1) ? x : a1

This function may have very high latency.

In SSE2, SSE3 and SSSE3 this intrinsic results in at least 2 instructions.
In SSE4_1 this intrinsic results in at least 1 instruction.
In SSE2_32bit, SSE3_32bit and SSSE3_32bit this intrinsic results in at least 4 instructions.
In SSE4_1_32bit this intrinsic results in at least 2 instructions.
In ALTIVEC this intrinsic results in at least 3 instructions.

template<unsigned id>

float32x4 simdpp::insert	(	float32x4	a,
		float	x
	)

Inserts an element into float32x4 vector at the position identified by id.

r0 = (id == 0) ? x : a0
r1 = (id == 1) ? x : a1
r2 = (id == 2) ? x : a2
r3 = (id == 3) ? x : a3

This function may have very high latency.

In SSE2-SSSE3 this intrinsic results in at least 4 instructions.
In ALTIVEC this intrinsic results in at least 3 instructions.

template<unsigned id>

float64x2 simdpp::insert	(	float64x2	a,
		double	x
	)

Inserts an element into float64x2 vector at the position identified by id.

This function potentially

r0 = (id == 0) ? x : a0
r1 = (id == 1) ? x : a1

This function may have very high latency.

In SSE2-SSSE3 this intrinsic results in at least 2 instructions.
In ALTIVEC this intrinsic results in at least 3 instructions.
