libsimdpp  0.9.3
Operations: load from memory to register

Functions

int128 simdpp::load (int128 &a, const void *p)
 Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an aligned memory location.
 
int256 simdpp::load (int256 &a, const void *p)
 
float32x4 simdpp::load (float32x4 &a, const float *p)
 
float32x8 simdpp::load (float32x8 &a, const float *p)
 
float64x2 simdpp::load (float64x2 &a, const double *p)
 
float64x4 simdpp::load (float64x4 &a, const double *p)
 
basic_int8x16 simdpp::load_u (basic_int8x16 &a, const void *p)
 Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.
 
basic_int16x8 simdpp::load_u (basic_int16x8 &a, const void *p)
 Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.
 
basic_int32x4 simdpp::load_u (basic_int32x4 &a, const void *p)
 Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.
 
basic_int64x2 simdpp::load_u (basic_int64x2 &a, const void *p)
 Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.
 
float32x4 simdpp::load_u (float32x4 &a, const float *p)
 Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.
 
float64x2 simdpp::load_u (float64x2 &a, const double *p)
 Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.
 
basic_int8x32 simdpp::load_u (basic_int8x32 &a, const void *p)
 Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.
 
basic_int16x16 simdpp::load_u (basic_int16x16 &a, const void *p)
 Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.
 
basic_int32x8 simdpp::load_u (basic_int32x8 &a, const void *p)
 Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.
 
basic_int64x4 simdpp::load_u (basic_int64x4 &a, const void *p)
 Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.
 
float32x8 simdpp::load_u (float32x8 &a, const float *p)
 Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.
 
float64x4 simdpp::load_u (float64x4 &a, const double *p)
 Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.
 
void simdpp::load_packed2 (basic_int8x16 &a, basic_int8x16 &b, const void *p)
 Loads 8-bit values packed in pairs, de-interleaves them and stores the result into two vectors.
 
void simdpp::load_packed2 (basic_int8x32 &a, basic_int8x32 &b, const void *p)
 Loads 8-bit values packed in pairs, de-interleaves them and stores the result into two vectors.
 
void simdpp::load_packed2 (basic_int16x8 &a, basic_int16x8 &b, const void *p)
 Loads 16-bit values packed in pairs, de-interleaves them and stores the result into two vectors.
 
void simdpp::load_packed2 (basic_int16x16 &a, basic_int16x16 &b, const void *p)
 Loads 16-bit values packed in pairs, de-interleaves them and stores the result into two vectors.
 
void simdpp::load_packed2 (basic_int32x4 &a, basic_int32x4 &b, const void *p)
 Loads 32-bit values packed in pairs, de-interleaves them and stores the result into two vectors.
 
void simdpp::load_packed2 (basic_int32x8 &a, basic_int32x8 &b, const void *p)
 Loads 32-bit values packed in pairs, de-interleaves them and stores the result into two vectors.
 
void simdpp::load_packed2 (basic_int64x2 &a, basic_int64x2 &b, const void *p)
 Loads 64-bit values packed in pairs, de-interleaves them and stores the result into two vectors.
 
void simdpp::load_packed2 (basic_int64x4 &a, basic_int64x4 &b, const void *p)
 Loads 64-bit values packed in pairs, de-interleaves them and stores the result into two vectors.
 

Function Documentation

int128 simdpp::load ( int128 &  a,
const void *  p 
)
inline

Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an aligned memory location.

128-bit version:
a[0..127] = *(p)

p must be aligned to 16 bytes.

256-bit version:
a[0..255] = *(p)

p must be aligned to 32 bytes.

  • In SSE2-SSE4.1, NEON and ALTIVEC this intrinsic results in at least 2 instructions.
  • In AVX (integer vectors) this intrinsic results in at least 2 instructions.
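
The alignment requirements above can be checked before an aligned load is issued. The following is a minimal sketch in standard C++ that does not call libsimdpp itself. is_aligned is a hypothetical helper introduced here for illustration, and the two arrays only show one way (alignas) to obtain storage that meets the 16-byte and 32-byte requirements.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of libsimdpp): returns true if p is a
// multiple of `alignment` bytes. `alignment` is assumed to be a power of two.
inline bool is_aligned(const void* p, std::size_t alignment)
{
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Storage that satisfies the 128-bit (16-byte) requirement.
alignas(16) static float data128[4] = {1.0f, 2.0f, 3.0f, 4.0f};

// Storage that satisfies the 256-bit (32-byte) requirement.
alignas(32) static float data256[8] = {1.0f, 2.0f, 3.0f, 4.0f,
                                       5.0f, 6.0f, 7.0f, 8.0f};
```

A pointer such as data256 + 1 is still aligned to the element size but no longer to 32 bytes, so it would not be a valid argument for the aligned 256-bit load.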
int256 simdpp::load ( int256 &  a,
const void *  p 
)
inline
float32x4 simdpp::load ( float32x4 &  a,
const float *  p 
)
inline
float32x8 simdpp::load ( float32x8 &  a,
const float *  p 
)
inline
float64x2 simdpp::load ( float64x2 &  a,
const double *  p 
)
inline
float64x4 simdpp::load ( float64x4 &  a,
const double *  p 
)
inline
void simdpp::load_packed2 ( basic_int8x16 &  a,
basic_int8x16 &  b,
const void *  p 
)
inline

Loads 8-bit values packed in pairs, de-interleaves them and stores the result into two vectors.

128-bit version:
a = [ *(p), *(p+2), *(p+4), ... , *(p+30) ]
b = [ *(p+1), *(p+3), *(p+5), ... , *(p+31) ]
p must be aligned to 16 bytes.
256-bit version:
a = [ *(p), *(p+2), *(p+4), ... , *(p+62) ]
b = [ *(p+1), *(p+3), *(p+5), ... , *(p+63) ]
p must be aligned to 32 bytes.
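
The de-interleaving described above can be modelled with a scalar loop. This is an illustrative sketch in plain C++, not the way libsimdpp implements load_packed2; deinterleave2_model is a hypothetical name and std::array stands in for the vector types.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar reference model of the documented semantics (illustration only).
// N is the number of elements in one vector, e.g. 16 for 8-bit elements
// in a 128-bit vector.
template<class T, std::size_t N>
void deinterleave2_model(std::array<T, N>& a, std::array<T, N>& b, const T* p)
{
    for (std::size_t i = 0; i < N; ++i) {
        a[i] = p[2 * i];       // even-indexed elements: p[0], p[2], ...
        b[i] = p[2 * i + 1];   // odd-indexed elements:  p[1], p[3], ...
    }
}
```

With N = 16 and an input holding the bytes 0 to 31, a receives 0, 2, ..., 30 and b receives 1, 3, ..., 31, matching the 128-bit formulas above.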
void simdpp::load_packed2 ( basic_int8x32 &  a,
basic_int8x32 &  b,
const void *  p 
)
inline

Loads 8-bit values packed in pairs, de-interleaves them and stores the result into two vectors.

128-bit version:
a = [ *(p), *(p+2), *(p+4), ... , *(p+30) ]
b = [ *(p+1), *(p+3), *(p+5), ... , *(p+31) ]
p must be aligned to 16 bytes.
256-bit version:
a = [ *(p), *(p+2), *(p+4), ... , *(p+62) ]
b = [ *(p+1), *(p+3), *(p+5), ... , *(p+63) ]
p must be aligned to 32 bytes.
void simdpp::load_packed2 ( basic_int16x8 &  a,
basic_int16x8 &  b,
const void *  p 
)
inline

Loads 16-bit values packed in pairs, de-interleaves them and stores the result into two vectors.

128-bit version:
a = [ *(p), *(p+2), *(p+4), ... , *(p+14) ]
b = [ *(p+1), *(p+3), *(p+5), ... , *(p+15) ]
p must be aligned to 16 bytes.
256-bit version:
a = [ *(p), *(p+2), *(p+4), ... , *(p+30) ]
b = [ *(p+1), *(p+3), *(p+5), ... , *(p+31) ]
p must be aligned to 32 bytes.
void simdpp::load_packed2 ( basic_int16x16 &  a,
basic_int16x16 &  b,
const void *  p 
)
inline

Loads 16-bit values packed in pairs, de-interleaves them and stores the result into two vectors.

128-bit version:
a = [ *(p), *(p+2), *(p+4), ... , *(p+14) ]
b = [ *(p+1), *(p+3), *(p+5), ... , *(p+15) ]
p must be aligned to 16 bytes.
256-bit version:
a = [ *(p), *(p+2), *(p+4), ... , *(p+30) ]
b = [ *(p+1), *(p+3), *(p+5), ... , *(p+31) ]
p must be aligned to 32 bytes.
void simdpp::load_packed2 ( basic_int32x4 &  a,
basic_int32x4 &  b,
const void *  p 
)
inline

Loads 32-bit values packed in pairs, de-interleaves them and stores the result into two vectors.

128-bit version:
a = [ *(p), *(p+2), *(p+4), *(p+6) ]
b = [ *(p+1), *(p+3), *(p+5), *(p+7) ]
p must be aligned to 16 bytes.
256-bit version:
a = [ *(p), *(p+2), *(p+4), ... , *(p+14) ]
b = [ *(p+1), *(p+3), *(p+5), ... , *(p+15) ]
p must be aligned to 32 bytes.
void simdpp::load_packed2 ( basic_int32x8 &  a,
basic_int32x8 &  b,
const void *  p 
)
inline

Loads 32-bit values packed in pairs, de-interleaves them and stores the result into two vectors.

128-bit version:
a = [ *(p), *(p+2), *(p+4), *(p+6) ]
b = [ *(p+1), *(p+3), *(p+5), *(p+7) ]
p must be aligned to 16 bytes.
256-bit version:
a = [ *(p), *(p+2), *(p+4), ... , *(p+14) ]
b = [ *(p+1), *(p+3), *(p+5), ... , *(p+15) ]
p must be aligned to 32 bytes.
void simdpp::load_packed2 ( basic_int64x2 &  a,
basic_int64x2 &  b,
const void *  p 
)
inline

Loads 64-bit values packed in pairs, de-interleaves them and stores the result into two vectors.

128-bit version:
a = [ *(p), *(p+2) ]
b = [ *(p+1), *(p+3) ]
p must be aligned to 16 bytes.
256-bit version:
a = [ *(p), *(p+2), *(p+4), *(p+6) ]
b = [ *(p+1), *(p+3), *(p+5), *(p+7) ]
p must be aligned to 32 bytes.
void simdpp::load_packed2 ( basic_int64x4 &  a,
basic_int64x4 &  b,
const void *  p 
)
inline

Loads 64-bit values packed in pairs, de-interleaves them and stores the result into two vectors.

128-bit version:
a = [ *(p), *(p+2) ]
b = [ *(p+1), *(p+3) ]
p must be aligned to 16 bytes.
256-bit version:
a = [ *(p), *(p+2), *(p+4), *(p+6) ]
b = [ *(p+1), *(p+3), *(p+5), *(p+7) ]
p must be aligned to 32 bytes.
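
Because both output vectors are filled completely, each load_packed2 call reads two vector widths from p: 32 bytes for the 128-bit versions and 64 bytes for the 256-bit versions. The sketch below derives the lane count, the largest element offset and the number of bytes read from the vector and element widths. All three helper names are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>

// Hypothetical helpers, for illustration only (not part of libsimdpp).

// Number of elements in one vector of `vector_bits` bits.
constexpr std::size_t lanes(std::size_t vector_bits, std::size_t element_bits)
{
    return vector_bits / element_bits;
}

// Largest element offset read from p (the last element stored into b).
constexpr std::size_t last_offset(std::size_t vector_bits, std::size_t element_bits)
{
    return 2 * lanes(vector_bits, element_bits) - 1;
}

// Total number of bytes read from p: two full vectors.
constexpr std::size_t bytes_read(std::size_t vector_bits)
{
    return 2 * vector_bits / 8;
}

static_assert(last_offset(128, 8) == 31, "8-bit elements, 128-bit vectors");
static_assert(last_offset(256, 8) == 63, "8-bit elements, 256-bit vectors");
static_assert(last_offset(128, 64) == 3, "64-bit elements, 128-bit vectors");
static_assert(last_offset(256, 64) == 7, "64-bit elements, 256-bit vectors");
static_assert(bytes_read(128) == 32, "two 16-byte vectors");
static_assert(bytes_read(256) == 64, "two 32-byte vectors");
```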
basic_int8x16 simdpp::load_u ( basic_int8x16 &  a,
const void *  p 
)
inline

Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.

128-bit version:
a[0..127] = *(p)

p must be aligned to the element size. If p is aligned to 16 bytes, only the referenced 16-byte block is accessed. Otherwise, memory within the smallest 16-byte aligned 32-byte block may be accessed.

  • In ALTIVEC this intrinsic results in at least 4 instructions.

256-bit version:
a[0..255] = *(p)

p must be aligned to the element size. If p is aligned to 32 bytes, only the referenced 32-byte block is accessed. Otherwise, memory within the smallest 32-byte aligned 64-byte block may be accessed.

  • In SSE2-SSE4.1 and NEON this intrinsic results in at least 2 instructions.
  • In ALTIVEC this intrinsic results in at least 6 instructions.
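
The over-read bound can be made concrete with a small address calculation. The sketch below assumes that "the smallest 16-byte aligned 32-byte block" means the 32-byte range that starts at p rounded down to a multiple of 16 (and likewise 32 and 64 for the 256-bit version). That reading is an interpretation of the text, and AccessRange and may_access are hypothetical names.

```cpp
#include <cstdint>

// Address range [begin, end) that an unaligned load may touch.
struct AccessRange {
    std::uintptr_t begin;
    std::uintptr_t end;
};

// Hypothetical helper (illustration only). `vector_bytes` is 16 for the
// 128-bit version and 32 for the 256-bit version.
inline AccessRange may_access(std::uintptr_t addr, std::uintptr_t vector_bytes)
{
    // Round the address down to a multiple of the vector size.
    const std::uintptr_t base = addr & ~(vector_bytes - 1);
    if (base == addr) {
        // Aligned pointer: only the referenced block is accessed.
        return {addr, addr + vector_bytes};
    }
    // Unaligned pointer: anything inside the enclosing double-width block.
    return {base, base + 2 * vector_bytes};
}
```

For example, a 128-bit load from address 0x1004 reads the bytes 0x1004 to 0x1013 and, under this interpretation, may touch anything in 0x1000 to 0x101F. A buffer that ends shortly after the loaded data therefore needs padding up to the end of that range.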

basic_int16x8 simdpp::load_u ( basic_int16x8 &  a,
const void *  p 
)
inline

Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.

128-bit version:
a[0..127] = *(p)

p must be aligned to the element size. If p is aligned to 16 bytes, only the referenced 16-byte block is accessed. Otherwise, memory within the smallest 16-byte aligned 32-byte block may be accessed.

  • In ALTIVEC this intrinsic results in at least 4 instructions.

256-bit version:
a[0..255] = *(p)

p must be aligned to the element size. If p is aligned to 32 bytes, only the referenced 32-byte block is accessed. Otherwise, memory within the smallest 32-byte aligned 64-byte block may be accessed.

  • In SSE2-SSE4.1 and NEON this intrinsic results in at least 2 instructions.
  • In ALTIVEC this intrinsic results in at least 6 instructions.

basic_int32x4 simdpp::load_u ( basic_int32x4 &  a,
const void *  p 
)
inline

Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.

128-bit version:
a[0..127] = *(p)

p must be aligned to the element size. If p is aligned to 16 bytes, only the referenced 16-byte block is accessed. Otherwise, memory within the smallest 16-byte aligned 32-byte block may be accessed.

  • In ALTIVEC this intrinsic results in at least 4 instructions.

256-bit version:
a[0..255] = *(p)

p must be aligned to the element size. If p is aligned to 32 bytes, only the referenced 32-byte block is accessed. Otherwise, memory within the smallest 32-byte aligned 64-byte block may be accessed.

  • In SSE2-SSE4.1 and NEON this intrinsic results in at least 2 instructions.
  • In ALTIVEC this intrinsic results in at least 6 instructions.

basic_int64x2 simdpp::load_u ( basic_int64x2 &  a,
const void *  p 
)
inline

Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.

128-bit version:
a[0..127] = *(p)

p must be aligned to the element size. If p is aligned to 16 bytes, only the referenced 16-byte block is accessed. Otherwise, memory within the smallest 16-byte aligned 32-byte block may be accessed.

  • In ALTIVEC this intrinsic results in at least 4 instructions.

256-bit version:
a[0..255] = *(p)

p must be aligned to the element size. If p is aligned to 32 bytes, only the referenced 32-byte block is accessed. Otherwise, memory within the smallest 32-byte aligned 64-byte block may be accessed.

  • In SSE2-SSE4.1 and NEON this intrinsic results in at least 2 instructions.
  • In ALTIVEC this intrinsic results in at least 6 instructions.

float32x4 simdpp::load_u ( float32x4 &  a,
const float *  p 
)
inline

Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.

128-bit version:
a[0..127] = *(p)

p must be aligned to the element size. If p is aligned to 16 bytes, only the referenced 16-byte block is accessed. Otherwise, memory within the smallest 16-byte aligned 32-byte block may be accessed.

  • In ALTIVEC this intrinsic results in at least 4 instructions.

256-bit version:
a[0..255] = *(p)

p must be aligned to the element size. If p is aligned to 32 bytes, only the referenced 32-byte block is accessed. Otherwise, memory within the smallest 32-byte aligned 64-byte block may be accessed.

  • In SSE2-SSE4.1 and NEON this intrinsic results in at least 2 instructions.
  • In ALTIVEC this intrinsic results in at least 6 instructions.
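
A typical case for the unaligned float load is reading from the middle of a float array: the pointer is aligned to the element size (4 bytes) but generally not to 16 bytes. The following scalar model shows the value semantics of the 128-bit version only. It is a sketch built on std::memcpy, not libsimdpp's implementation, and load_u_model is a hypothetical name.

```cpp
#include <array>
#include <cstring>

// Scalar model of a 128-bit unaligned float load (illustration only):
// copies four consecutive floats starting at p, whatever its alignment
// relative to 16 bytes.
inline std::array<float, 4> load_u_model(const float* p)
{
    std::array<float, 4> v;
    std::memcpy(v.data(), p, sizeof(float) * v.size());
    return v;
}
```

Calling load_u_model(src + 1) on an array src holding 0, 1, ..., 7 yields the four values 1, 2, 3, 4.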

float64x2 simdpp::load_u ( float64x2 &  a,
const double *  p 
)
inline

Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.

128-bit version:
a[0..127] = *(p)

p must be aligned to the element size. If p is aligned to 16 bytes, only the referenced 16-byte block is accessed. Otherwise, memory within the smallest 16-byte aligned 32-byte block may be accessed.

  • In ALTIVEC this intrinsic results in at least 4 instructions.

256-bit version:
a[0..255] = *(p)

p must be aligned to the element size. If p is aligned to 32 bytes, only the referenced 32-byte block is accessed. Otherwise, memory within the smallest 32-byte aligned 64-byte block may be accessed.

  • In SSE2-SSE4.1 and NEON this intrinsic results in at least 2 instructions.
  • In ALTIVEC this intrinsic results in at least 6 instructions.

basic_int8x32 simdpp::load_u ( basic_int8x32 &  a,
const void *  p 
)
inline

Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.

128-bit version:
a[0..127] = *(p)

p must be aligned to the element size. If p is aligned to 16 bytes, only the referenced 16-byte block is accessed. Otherwise, memory within the smallest 16-byte aligned 32-byte block may be accessed.

  • In ALTIVEC this intrinsic results in at least 4 instructions.

256-bit version:
a[0..255] = *(p)

p must be aligned to the element size. If p is aligned to 32 bytes, only the referenced 32-byte block is accessed. Otherwise, memory within the smallest 32-byte aligned 64-byte block may be accessed.

  • In SSE2-SSE4.1 and NEON this intrinsic results in at least 2 instructions.
  • In ALTIVEC this intrinsic results in at least 6 instructions.

basic_int16x16 simdpp::load_u ( basic_int16x16 &  a,
const void *  p 
)
inline

Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.

128-bit version:
a[0..127] = *(p)

p must be aligned to the element size. If p is aligned to 16 bytes, only the referenced 16-byte block is accessed. Otherwise, memory within the smallest 16-byte aligned 32-byte block may be accessed.

  • In ALTIVEC this intrinsic results in at least 4 instructions.

256-bit version:
a[0..255] = *(p)

p must be aligned to the element size. If p is aligned to 32 bytes, only the referenced 32-byte block is accessed. Otherwise, memory within the smallest 32-byte aligned 64-byte block may be accessed.

  • In SSE2-SSE4.1 and NEON this intrinsic results in at least 2 instructions.
  • In ALTIVEC this intrinsic results in at least 6 instructions.

basic_int32x8 simdpp::load_u ( basic_int32x8 &  a,
const void *  p 
)
inline

Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.

128-bit version:
a[0..127] = *(p)

p must be aligned to the element size. If p is aligned to 16 bytes, only the referenced 16-byte block is accessed. Otherwise, memory within the smallest 16-byte aligned 32-byte block may be accessed.

  • In ALTIVEC this intrinsic results in at least 4 instructions.

256-bit version:
a[0..255] = *(p)

p must be aligned to the element size. If p is aligned to 32 bytes, only the referenced 32-byte block is accessed. Otherwise, memory within the smallest 32-byte aligned 64-byte block may be accessed.

  • In SSE2-SSE4.1 and NEON this intrinsic results in at least 2 instructions.
  • In ALTIVEC this intrinsic results in at least 6 instructions.

basic_int64x4 simdpp::load_u ( basic_int64x4 &  a,
const void *  p 
)
inline

Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.

128-bit version:
a[0..127] = *(p)

p must be aligned to the element size. If p is aligned to 16 bytes, only the referenced 16-byte block is accessed. Otherwise, memory within the smallest 16-byte aligned 32-byte block may be accessed.

  • In ALTIVEC this intrinsic results in at least 4 instructions.

256-bit version:
a[0..255] = *(p)

p must be aligned to the element size. If p is aligned to 32 bytes, only the referenced 32-byte block is accessed. Otherwise, memory within the smallest 32-byte aligned 64-byte block may be accessed.

  • In SSE2-SSE4.1 and NEON this intrinsic results in at least 2 instructions.
  • In ALTIVEC this intrinsic results in at least 6 instructions.

float32x8 simdpp::load_u ( float32x8 &  a,
const float *  p 
)
inline

Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.

128-bit version:
a[0..127] = *(p)

p must be aligned to the element size. If p is aligned to 16 bytes, only the referenced 16-byte block is accessed. Otherwise, memory within the smallest 16-byte aligned 32-byte block may be accessed.

  • In ALTIVEC this intrinsic results in at least 4 instructions.

256-bit version:
a[0..255] = *(p)

p must be aligned to the element size. If p is aligned to 32 bytes, only the referenced 32-byte block is accessed. Otherwise, memory within the smallest 32-byte aligned 64-byte block may be accessed.

  • In SSE2-SSE4.1 and NEON this intrinsic results in at least 2 instructions.
  • In ALTIVEC this intrinsic results in at least 6 instructions.

float64x4 simdpp::load_u ( float64x4 &  a,
const double *  p 
)
inline

Loads a 128-bit or 256-bit integer, 32-bit or 64-bit float vector from an unaligned memory location.

128-bit version:
a[0..127] = *(p)

p must be aligned to the element size. If p is aligned to 16 bytes, only the referenced 16-byte block is accessed. Otherwise, memory within the smallest 16-byte aligned 32-byte block may be accessed.

  • In ALTIVEC this intrinsic results in at least 4 instructions.

256-bit version:
a[0..255] = *(p)

p must be aligned to the element size. If p is aligned to 32 bytes, only the referenced 32-byte block is accessed. Otherwise, memory within the smallest 32-byte aligned 64-byte block may be accessed.

  • In SSE2-SSE4.1 and NEON this intrinsic results in at least 2 instructions.
  • In ALTIVEC this intrinsic results in at least 6 instructions.