Functions
int128	simdpp::load (int128 &a, const void *p)
	Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an aligned memory location.

int256	simdpp::load (int256 &a, const void *p)

float32x4	simdpp::load (float32x4 &a, const float *p)

float32x8	simdpp::load (float32x8 &a, const float *p)

float64x2	simdpp::load (float64x2 &a, const double *p)

float64x4	simdpp::load (float64x4 &a, const double *p)

basic_int8x16	simdpp::load_u (basic_int8x16 &a, const void *p)
	Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.

basic_int16x8	simdpp::load_u (basic_int16x8 &a, const void *p)
	Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.

basic_int32x4	simdpp::load_u (basic_int32x4 &a, const void *p)
	Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.

basic_int64x2	simdpp::load_u (basic_int64x2 &a, const void *p)
	Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.

float32x4	simdpp::load_u (float32x4 &a, const float *p)
	Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.

float64x2	simdpp::load_u (float64x2 &a, const double *p)
	Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.

basic_int8x32	simdpp::load_u (basic_int8x32 &a, const void *p)
	Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.

basic_int16x16	simdpp::load_u (basic_int16x16 &a, const void *p)
	Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.

basic_int32x8	simdpp::load_u (basic_int32x8 &a, const void *p)
	Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.

basic_int64x4	simdpp::load_u (basic_int64x4 &a, const void *p)
	Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.

float32x8	simdpp::load_u (float32x8 &a, const float *p)
	Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.

float64x4	simdpp::load_u (float64x4 &a, const double *p)
	Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.

void	simdpp::load_packed2 (basic_int8x16 &a, basic_int8x16 &b, const void *p)
	Loads 8-bit values packed in pairs, de-interleaves them and stores the result into two vectors.

void	simdpp::load_packed2 (basic_int8x32 &a, basic_int8x32 &b, const void *p)
	Loads 8-bit values packed in pairs, de-interleaves them and stores the result into two vectors.

void	simdpp::load_packed2 (basic_int16x8 &a, basic_int16x8 &b, const void *p)
	Loads 16-bit values packed in pairs, de-interleaves them and stores the result into two vectors.

void	simdpp::load_packed2 (basic_int16x16 &a, basic_int16x16 &b, const void *p)
	Loads 16-bit values packed in pairs, de-interleaves them and stores the result into two vectors.

void	simdpp::load_packed2 (basic_int32x4 &a, basic_int32x4 &b, const void *p)
	Loads 32-bit values packed in pairs, de-interleaves them and stores the result into two vectors.

void	simdpp::load_packed2 (basic_int32x8 &a, basic_int32x8 &b, const void *p)
	Loads 32-bit values packed in pairs, de-interleaves them and stores the result into two vectors.

void	simdpp::load_packed2 (basic_int64x2 &a, basic_int64x2 &b, const void *p)
	Loads 64-bit values packed in pairs, de-interleaves them and stores the result into two vectors.

void	simdpp::load_packed2 (basic_int64x4 &a, basic_int64x4 &b, const void *p)
	Loads 64-bit values packed in pairs, de-interleaves them and stores the result into two vectors.

Detailed Description

Function Documentation

int128 simdpp::load	(	int128 &	a,
		const void *	p
	)

inline

Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an aligned memory location.

128-bit version:

a[0..127] = *(p)

p must be aligned to 16 bytes.

256-bit version:

a[0..255] = *(p)

p must be aligned to 32 bytes.

In SSE2-SSE4.1, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
In AVX (integer vectors) this intrinsic results in at least 2 instructions.
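
The alignment requirements above can be checked before an aligned load is issued. The following is a minimal sketch in plain C++ that only models the documented semantics (copying 16 or 32 bytes from a suitably aligned address). It is not the library's implementation, and `is_aligned` and `load_aligned_model` are hypothetical helper names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true if p is a multiple of `alignment` bytes.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Scalar model of an aligned vector load: copies N bytes (16 or 32)
// from p, which must itself be aligned to N bytes.
template <std::size_t N>
void load_aligned_model(unsigned char (&dst)[N], const void* p) {
    static_assert(N == 16 || N == 32, "models 128-bit and 256-bit vectors only");
    assert(is_aligned(p, N));  // mirrors "p must be aligned to N bytes"
    std::memcpy(dst, p, N);
}
```

A buffer declared with `alignas(32)` satisfies both the 16-byte and the 32-byte requirement at its start, while an address 16 bytes into it satisfies only the 128-bit one.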

int256 simdpp::load	(	int256 &	a,
		const void *	p
	)

inline

float32x4 simdpp::load	(	float32x4 &	a,
		const float *	p
	)

inline

float32x8 simdpp::load	(	float32x8 &	a,
		const float *	p
	)

inline

float64x2 simdpp::load	(	float64x2 &	a,
		const double *	p
	)

inline

float64x4 simdpp::load	(	float64x4 &	a,
		const double *	p
	)

inline

void simdpp::load_packed2	(	basic_int8x16 &	a,
		basic_int8x16 &	b,
		const void *	p
	)

inline

Loads 8-bit values packed in pairs, de-interleaves them and stores the result into two vectors.

128-bit version:

a = [ *(p), *(p+2), *(p+4), ... , *(p+30) ]
b = [ *(p+1), *(p+3), *(p+5), ... , *(p+31) ]

p must be aligned to 16 bytes.

256-bit version:

a = [ *(p), *(p+2), *(p+4), ... , *(p+62) ]
b = [ *(p+1), *(p+3), *(p+5), ... , *(p+63) ]

p must be aligned to 32 bytes.
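
The de-interleaving described above can be written as a scalar reference model. This is a sketch for checking expectations, not the library's code: `load_packed2_model` is a hypothetical name, it uses `std::array` in place of the vector types, and it assumes the offsets in the formulas count elements rather than bytes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar reference model of a two-way de-interleaving load.
// T is the element type, N the number of elements per output vector.
// Reads 2*N consecutive elements starting at p.
template <typename T, std::size_t N>
void load_packed2_model(std::array<T, N>& a, std::array<T, N>& b, const T* p) {
    assert(p != nullptr);
    for (std::size_t i = 0; i < N; ++i) {
        a[i] = p[2 * i];      // even-indexed elements go to a
        b[i] = p[2 * i + 1];  // odd-indexed elements go to b
    }
}
```

With `T = std::uint8_t` and `N = 16` this corresponds to the 128-bit 8-bit case: `a` receives elements 0, 2, ..., 30 and `b` receives elements 1, 3, ..., 31.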

void simdpp::load_packed2	(	basic_int8x32 &	a,
		basic_int8x32 &	b,
		const void *	p
	)

inline

Loads 8-bit values packed in pairs, de-interleaves them and stores the result into two vectors.

128-bit version:

a = [ *(p), *(p+2), *(p+4), ... , *(p+30) ]
b = [ *(p+1), *(p+3), *(p+5), ... , *(p+31) ]

p must be aligned to 16 bytes.

256-bit version:

a = [ *(p), *(p+2), *(p+4), ... , *(p+62) ]
b = [ *(p+1), *(p+3), *(p+5), ... , *(p+63) ]

p must be aligned to 32 bytes.

void simdpp::load_packed2	(	basic_int16x8 &	a,
		basic_int16x8 &	b,
		const void *	p
	)

inline

Loads 16-bit values packed in pairs, de-interleaves them and stores the result into two vectors.

128-bit version:

a = [ *(p), *(p+2), *(p+4), ... , *(p+14) ]
b = [ *(p+1), *(p+3), *(p+5), ... , *(p+15) ]

p must be aligned to 16 bytes.

256-bit version:

a = [ *(p), *(p+2), *(p+4), ... , *(p+30) ]
b = [ *(p+1), *(p+3), *(p+5), ... , *(p+31) ]

p must be aligned to 32 bytes.

void simdpp::load_packed2	(	basic_int16x16 &	a,
		basic_int16x16 &	b,
		const void *	p
	)

inline

Loads 16-bit values packed in pairs, de-interleaves them and stores the result into two vectors.

128-bit version:

a = [ *(p), *(p+2), *(p+4), ... , *(p+14) ]
b = [ *(p+1), *(p+3), *(p+5), ... , *(p+15) ]

p must be aligned to 16 bytes.

256-bit version:

a = [ *(p), *(p+2), *(p+4), ... , *(p+30) ]
b = [ *(p+1), *(p+3), *(p+5), ... , *(p+31) ]

p must be aligned to 32 bytes.

void simdpp::load_packed2	(	basic_int32x4 &	a,
		basic_int32x4 &	b,
		const void *	p
	)

inline

Loads 32-bit values packed in pairs, de-interleaves them and stores the result into two vectors.

128-bit version:

a = [ *(p), *(p+2), *(p+4), *(p+6) ]
b = [ *(p+1), *(p+3), *(p+5), *(p+7) ]

p must be aligned to 16 bytes.

256-bit version:

a = [ *(p), *(p+2), *(p+4), ... , *(p+14) ]
b = [ *(p+1), *(p+3), *(p+5), ... , *(p+15) ]

p must be aligned to 32 bytes.

void simdpp::load_packed2	(	basic_int32x8 &	a,
		basic_int32x8 &	b,
		const void *	p
	)

inline

Loads 32-bit values packed in pairs, de-interleaves them and stores the result into two vectors.

128-bit version:

a = [ *(p), *(p+2), *(p+4), *(p+6) ]
b = [ *(p+1), *(p+3), *(p+5), *(p+7) ]

p must be aligned to 16 bytes.

256-bit version:

a = [ *(p), *(p+2), *(p+4), ... , *(p+14) ]
b = [ *(p+1), *(p+3), *(p+5), ... , *(p+15) ]

p must be aligned to 32 bytes.

void simdpp::load_packed2	(	basic_int64x2 &	a,
		basic_int64x2 &	b,
		const void *	p
	)

inline

Loads 64-bit values packed in pairs, de-interleaves them and stores the result into two vectors.

128-bit version:

a = [ *(p), *(p+2) ]
b = [ *(p+1), *(p+3) ]

p must be aligned to 16 bytes.

256-bit version:

a = [ *(p), *(p+2), *(p+4), *(p+6) ]
b = [ *(p+1), *(p+3), *(p+5), *(p+7) ]

p must be aligned to 32 bytes.

void simdpp::load_packed2	(	basic_int64x4 &	a,
		basic_int64x4 &	b,
		const void *	p
	)

inline

Loads 64-bit values packed in pairs, de-interleaves them and stores the result into two vectors.

128-bit version:

a = [ *(p), *(p+2) ]
b = [ *(p+1), *(p+3) ]

p must be aligned to 16 bytes.

256-bit version:

a = [ *(p), *(p+2), *(p+4), *(p+6) ]
b = [ *(p+1), *(p+3), *(p+5), *(p+7) ]

p must be aligned to 32 bytes.

basic_int8x16 simdpp::load_u	(	basic_int8x16 &	a,
		const void *	p
	)

inline

Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.

128-bit version:

a[0..127] = *(p)

p must be aligned to the element size. If p is aligned to 16 bytes, only the referenced 16-byte block is accessed. Otherwise, memory within the smallest 16-byte aligned 32-byte block may be accessed.

In ALTIVEC this intrinsic results in at least 4 instructions.

256-bit version:

a[0..255] = *(p)

p must be aligned to the element size. If p is aligned to 32 bytes, only the referenced 32-byte block is accessed. Otherwise, memory within the smallest 32-byte aligned 64-byte block may be accessed.

In SSE2-SSE4.1 and NEON this intrinsic results in at least 2 instructions.
In ALTIVEC this intrinsic results in at least 6 instructions.
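
The range of memory that an unaligned load may touch follows from the rule above by simple address arithmetic. The sketch below is an illustration under stated assumptions, not library code: `AccessRange` and `load_u_access_range` are hypothetical names, and the 256-bit case is assumed to mirror the 128-bit one with 32-byte blocks.

```cpp
#include <cassert>
#include <cstdint>

// Half-open byte range [begin, end) that a load may read.
struct AccessRange {
    std::uintptr_t begin;
    std::uintptr_t end;
};

// Computes the range that may be accessed when loading a vector of
// vec_bytes (16 or 32) bytes from address addr.
inline AccessRange load_u_access_range(std::uintptr_t addr, std::uintptr_t vec_bytes) {
    assert(vec_bytes == 16 || vec_bytes == 32);
    std::uintptr_t base = addr - addr % vec_bytes;  // round down to block start
    if (base == addr) {
        // Aligned pointer: only the referenced block is accessed.
        return {addr, addr + vec_bytes};
    }
    // Unaligned pointer: the enclosing aligned block of twice the vector size.
    return {base, base + 2 * vec_bytes};
}
```

For example, a 128-bit load from address 0x1004 may read anywhere in [0x1000, 0x1020), so a buffer that ends before 0x1020 should not be assumed safe to load from near its end.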

basic_int16x8 simdpp::load_u	(	basic_int16x8 &	a,
		const void *	p
	)

inline

Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.

128-bit version:

a[0..127] = *(p)

p must be aligned to the element size. If p is aligned to 16 bytes, only the referenced 16-byte block is accessed. Otherwise, memory within the smallest 16-byte aligned 32-byte block may be accessed.

In ALTIVEC this intrinsic results in at least 4 instructions.

256-bit version:

a[0..255] = *(p)

p must be aligned to the element size. If p is aligned to 32 bytes, only the referenced 32-byte block is accessed. Otherwise, memory within the smallest 32-byte aligned 64-byte block may be accessed.

In SSE2-SSE4.1 and NEON this intrinsic results in at least 2 instructions.
In ALTIVEC this intrinsic results in at least 6 instructions.

basic_int32x4 simdpp::load_u	(	basic_int32x4 &	a,
		const void *	p
	)

inline

Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.

128-bit version:

a[0..127] = *(p)

p must be aligned to the element size. If p is aligned to 16 bytes, only the referenced 16-byte block is accessed. Otherwise, memory within the smallest 16-byte aligned 32-byte block may be accessed.

In ALTIVEC this intrinsic results in at least 4 instructions.

256-bit version:

a[0..255] = *(p)

p must be aligned to the element size. If p is aligned to 32 bytes, only the referenced 32-byte block is accessed. Otherwise, memory within the smallest 32-byte aligned 64-byte block may be accessed.

In SSE2-SSE4.1 and NEON this intrinsic results in at least 2 instructions.
In ALTIVEC this intrinsic results in at least 6 instructions.

basic_int64x2 simdpp::load_u	(	basic_int64x2 &	a,
		const void *	p
	)

inline

Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.

128-bit version:

a[0..127] = *(p)

p must be aligned to the element size. If p is aligned to 16 bytes, only the referenced 16-byte block is accessed. Otherwise, memory within the smallest 16-byte aligned 32-byte block may be accessed.

In ALTIVEC this intrinsic results in at least 4 instructions.

256-bit version:

a[0..255] = *(p)

p must be aligned to the element size. If p is aligned to 32 bytes, only the referenced 32-byte block is accessed. Otherwise, memory within the smallest 32-byte aligned 64-byte block may be accessed.

In SSE2-SSE4.1 and NEON this intrinsic results in at least 2 instructions.
In ALTIVEC this intrinsic results in at least 6 instructions.

float32x4 simdpp::load_u	(	float32x4 &	a,
		const float *	p
	)

inline

Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.

128-bit version:

a[0..127] = *(p)

p must be aligned to the element size. If p is aligned to 16 bytes, only the referenced 16-byte block is accessed. Otherwise, memory within the smallest 16-byte aligned 32-byte block may be accessed.

In ALTIVEC this intrinsic results in at least 4 instructions.

256-bit version:

a[0..255] = *(p)

p must be aligned to the element size. If p is aligned to 32 bytes, only the referenced 32-byte block is accessed. Otherwise, memory within the smallest 32-byte aligned 64-byte block may be accessed.

In SSE2-SSE4.1 and NEON this intrinsic results in at least 2 instructions.
In ALTIVEC this intrinsic results in at least 6 instructions.
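
For the float overload, the unaligned load can be modelled in scalar code as a copy of four consecutive floats from a pointer that is aligned only to the element size. This is a sketch of the documented result, not the library's implementation; `load_u_model` is a hypothetical name and `std::array<float, 4>` stands in for `float32x4`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Scalar model of a 128-bit unaligned float load: reads p[0..3].
inline std::array<float, 4> load_u_model(const float* p) {
    // The pointer only needs element (float) alignment, not 16-byte alignment.
    assert(reinterpret_cast<std::uintptr_t>(p) % alignof(float) == 0);
    std::array<float, 4> a;
    std::memcpy(a.data(), p, sizeof(float) * 4);
    return a;
}
```

Loading from one float past the start of a 16-byte aligned array is the typical case: the address is a multiple of 4 but not of 16, which an aligned load would not accept.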

float64x2 simdpp::load_u	(	float64x2 &	a,
		const double *	p
	)

inline

Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.

128-bit version:

a[0..127] = *(p)

p must be aligned to the element size. If p is aligned to 16 bytes, only the referenced 16-byte block is accessed. Otherwise, memory within the smallest 16-byte aligned 32-byte block may be accessed.

In ALTIVEC this intrinsic results in at least 4 instructions.

256-bit version:

a[0..255] = *(p)

p must be aligned to the element size. If p is aligned to 32 bytes, only the referenced 32-byte block is accessed. Otherwise, memory within the smallest 32-byte aligned 64-byte block may be accessed.

In SSE2-SSE4.1 and NEON this intrinsic results in at least 2 instructions.
In ALTIVEC this intrinsic results in at least 6 instructions.

basic_int8x32 simdpp::load_u	(	basic_int8x32 &	a,
		const void *	p
	)

inline

Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.

128-bit version:

a[0..127] = *(p)

p must be aligned to the element size. If p is aligned to 16 bytes, only the referenced 16-byte block is accessed. Otherwise, memory within the smallest 16-byte aligned 32-byte block may be accessed.

In ALTIVEC this intrinsic results in at least 4 instructions.

256-bit version:

a[0..255] = *(p)

p must be aligned to the element size. If p is aligned to 32 bytes, only the referenced 32-byte block is accessed. Otherwise, memory within the smallest 32-byte aligned 64-byte block may be accessed.

In SSE2-SSE4.1 and NEON this intrinsic results in at least 2 instructions.
In ALTIVEC this intrinsic results in at least 6 instructions.

basic_int16x16 simdpp::load_u	(	basic_int16x16 &	a,
		const void *	p
	)

inline

Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.

128-bit version:

a[0..127] = *(p)

p must be aligned to the element size. If p is aligned to 16 bytes, only the referenced 16-byte block is accessed. Otherwise, memory within the smallest 16-byte aligned 32-byte block may be accessed.

In ALTIVEC this intrinsic results in at least 4 instructions.

256-bit version:

a[0..255] = *(p)

p must be aligned to the element size. If p is aligned to 32 bytes, only the referenced 32-byte block is accessed. Otherwise, memory within the smallest 32-byte aligned 64-byte block may be accessed.

In SSE2-SSE4.1 and NEON this intrinsic results in at least 2 instructions.
In ALTIVEC this intrinsic results in at least 6 instructions.

basic_int32x8 simdpp::load_u	(	basic_int32x8 &	a,
		const void *	p
	)

inline

Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.

128-bit version:

a[0..127] = *(p)

p must be aligned to the element size. If p is aligned to 16 bytes, only the referenced 16-byte block is accessed. Otherwise, memory within the smallest 16-byte aligned 32-byte block may be accessed.

In ALTIVEC this intrinsic results in at least 4 instructions.

256-bit version:

a[0..255] = *(p)

p must be aligned to the element size. If p is aligned to 32 bytes, only the referenced 32-byte block is accessed. Otherwise, memory within the smallest 32-byte aligned 64-byte block may be accessed.

In SSE2-SSE4.1 and NEON this intrinsic results in at least 2 instructions.
In ALTIVEC this intrinsic results in at least 6 instructions.

basic_int64x4 simdpp::load_u	(	basic_int64x4 &	a,
		const void *	p
	)

inline

Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.

128-bit version:

a[0..127] = *(p)

p must be aligned to the element size. If p is aligned to 16 bytes, only the referenced 16-byte block is accessed. Otherwise, memory within the smallest 16-byte aligned 32-byte block may be accessed.

In ALTIVEC this intrinsic results in at least 4 instructions.

256-bit version:

a[0..255] = *(p)

p must be aligned to the element size. If p is aligned to 32 bytes, only the referenced 32-byte block is accessed. Otherwise, memory within the smallest 32-byte aligned 64-byte block may be accessed.

In SSE2-SSE4.1 and NEON this intrinsic results in at least 2 instructions.
In ALTIVEC this intrinsic results in at least 6 instructions.

float32x8 simdpp::load_u	(	float32x8 &	a,
		const float *	p
	)

inline

Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.

128-bit version:

a[0..127] = *(p)

p must be aligned to the element size. If p is aligned to 16 bytes, only the referenced 16-byte block is accessed. Otherwise, memory within the smallest 16-byte aligned 32-byte block may be accessed.

In ALTIVEC this intrinsic results in at least 4 instructions.

256-bit version:

a[0..255] = *(p)

p must be aligned to the element size. If p is aligned to 32 bytes, only the referenced 32-byte block is accessed. Otherwise, memory within the smallest 32-byte aligned 64-byte block may be accessed.

In SSE2-SSE4.1 and NEON this intrinsic results in at least 2 instructions.
In ALTIVEC this intrinsic results in at least 6 instructions.

float64x4 simdpp::load_u	(	float64x4 &	a,
		const double *	p
	)

inline

Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.

128-bit version:

a[0..127] = *(p)

p must be aligned to the element size. If p is aligned to 16 bytes, only the referenced 16-byte block is accessed. Otherwise, memory within the smallest 16-byte aligned 32-byte block may be accessed.

In ALTIVEC this intrinsic results in at least 4 instructions.

256-bit version:

a[0..255] = *(p)

p must be aligned to the element size. If p is aligned to 32 bytes, only the referenced 32-byte block is accessed. Otherwise, memory within the smallest 32-byte aligned 64-byte block may be accessed.

In SSE2-SSE4.1 and NEON this intrinsic results in at least 2 instructions.
In ALTIVEC this intrinsic results in at least 6 instructions.
