libsimdpp 1.0

several vectors

Functions
void simdpp::transpose2 (uint16x8 &a0, uint16x8 &a1)
  Transposes four 2x2 16-bit matrices within two int16x8 vectors.
void simdpp::transpose2 (int16x8 &a0, int16x8 &a1)
void simdpp::transpose2 (uint16x16 &a0, uint16x16 &a1)
void simdpp::transpose2 (int16x16 &a0, int16x16 &a1)
void simdpp::transpose8 (uint8x16 &a0, uint8x16 &a1, uint8x16 &a2, uint8x16 &a3, uint8x16 &a4, uint8x16 &a5, uint8x16 &a6, uint8x16 &a7)
  Transposes two 8x8 8-bit matrices within eight int8x16 vectors.
void simdpp::transpose8 (int8x16 &a0, int8x16 &a1, int8x16 &a2, int8x16 &a3, int8x16 &a4, int8x16 &a5, int8x16 &a6, int8x16 &a7)
void simdpp::transpose8 (uint8x32 &a0, uint8x32 &a1, uint8x32 &a2, uint8x32 &a3, uint8x32 &a4, uint8x32 &a5, uint8x32 &a6, uint8x32 &a7)
void simdpp::transpose8 (int8x32 &a0, int8x32 &a1, int8x32 &a2, int8x32 &a3, int8x32 &a4, int8x32 &a5, int8x32 &a6, int8x32 &a7)
void simdpp::transpose2 (uint32x4 &a0, uint32x4 &a1)
  Transposes two 2x2 32-bit matrices within two int32x4 vectors.
void simdpp::transpose2 (int32x4 &a0, int32x4 &a1)
  Transposes two 2x2 32-bit matrices within two int32x4 vectors.
void simdpp::transpose2 (uint32x8 &a0, uint32x8 &a1)
  Transposes two 2x2 32-bit matrices within two int32x4 vectors.
void simdpp::transpose2 (int32x8 &a0, int32x8 &a1)
  Transposes two 2x2 32-bit matrices within two int32x4 vectors.
void simdpp::transpose2 (uint64x2 &a0, uint64x2 &a1)
  Transposes a 2x2 64-bit matrix within two int64x2 vectors.
void simdpp::transpose2 (int64x2 &a0, int64x2 &a1)
  Transposes a 2x2 64-bit matrix within two int64x2 vectors.
void simdpp::transpose2 (uint64x4 &a0, uint64x4 &a1)
  Transposes a 2x2 64-bit matrix within two int64x2 vectors.
void simdpp::transpose2 (int64x4 &a0, int64x4 &a1)
  Transposes a 2x2 64-bit matrix within two int64x2 vectors.
void simdpp::transpose2 (float32x4 &a0, float32x4 &a1)
  Transposes two 2x2 32-bit matrices within two float32x4 vectors.
void simdpp::transpose2 (float32x8 &a0, float32x8 &a1)
  Transposes two 2x2 32-bit matrices within two float32x4 vectors.
void simdpp::transpose2 (float64x2 &a0, float64x2 &a1)
  Transposes a 2x2 64-bit matrix within two float64x2 vectors.
void simdpp::transpose2 (float64x4 &a0, float64x4 &a1)
  Transposes a 2x2 64-bit matrix within two float64x2 vectors.
void simdpp::transpose4 (uint8x16 &a0, uint8x16 &a1, uint8x16 &a2, uint8x16 &a3)
  Transposes four 4x4 8-bit matrices within four int8x16 vectors.
void simdpp::transpose4 (int8x16 &a0, int8x16 &a1, int8x16 &a2, int8x16 &a3)
  Transposes four 4x4 8-bit matrices within four int8x16 vectors.
void simdpp::transpose4 (uint32x8 &a0, uint32x8 &a1, uint32x8 &a2, uint32x8 &a3)
  Transposes a 4x4 32-bit matrix within four int32x4 vectors.
void simdpp::transpose4 (uint8x32 &a0, uint8x32 &a1, uint8x32 &a2, uint8x32 &a3)
  Transposes four 4x4 8-bit matrices within four int8x16 vectors.
void simdpp::transpose4 (int8x32 &a0, int8x32 &a1, int8x32 &a2, int8x32 &a3)
  Transposes four 4x4 8-bit matrices within four int8x16 vectors.
void simdpp::transpose4 (uint16x8 &a0, uint16x8 &a1, uint16x8 &a2, uint16x8 &a3)
  Transposes two 4x4 16-bit matrices within four int16x8 vectors.
void simdpp::transpose4 (int16x8 &a0, int16x8 &a1, int16x8 &a2, int16x8 &a3)
  Transposes two 4x4 16-bit matrices within four int16x8 vectors.
void simdpp::transpose4 (uint16x16 &a0, uint16x16 &a1, uint16x16 &a2, uint16x16 &a3)
  Transposes two 4x4 16-bit matrices within four int16x8 vectors.
void simdpp::transpose4 (int16x16 &a0, int16x16 &a1, int16x16 &a2, int16x16 &a3)
  Transposes two 4x4 16-bit matrices within four int16x8 vectors.
void simdpp::transpose4 (uint32x4 &a0, uint32x4 &a1, uint32x4 &a2, uint32x4 &a3)
  Transposes a 4x4 32-bit matrix within four int32x4 vectors.
void simdpp::transpose4 (int32x4 &a0, int32x4 &a1, int32x4 &a2, int32x4 &a3)
  Transposes a 4x4 32-bit matrix within four int32x4 vectors.
void simdpp::transpose4 (int32x8 &a0, int32x8 &a1, int32x8 &a2, int32x8 &a3)
  Transposes a 4x4 32-bit matrix within four int32x4 vectors.
void simdpp::transpose4 (float32x4 &a0, float32x4 &a1, float32x4 &a2, float32x4 &a3)
  Transposes a 4x4 32-bit matrix within four float32x4 vectors.
void simdpp::transpose4 (float32x8 &a0, float32x8 &a1, float32x8 &a2, float32x8 &a3)
  Transposes a 4x4 32-bit matrix within four float32x4 vectors.
Detailed Description
several vectors
Function Documentation

simdpp::transpose2 [inline]
Transposes four 2x2 16-bit matrices within two int16x8 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 4 instructions.
  - In ALTIVEC this intrinsic results in at least 2-4 instructions.
- 256-bit version:
  - The lower and higher 128-bit halves are processed as if the 128-bit version were applied to each of them separately.
  - In SSE2-AVX this intrinsic results in at least 8 instructions.
  - In AVX2 this intrinsic results in at least 4 instructions.
  - In NEON this intrinsic results in at least 2 instructions.
  - In ALTIVEC this intrinsic results in at least 4-6 instructions.

The int16x8, uint16x16 and int16x16 overloads of simdpp::transpose2 are also inline and are described by the entry above.
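To make the element layout concrete, here is a scalar C++ model of transpose2 on 16-bit lanes. It assumes that 2x2 matrix k occupies lanes 2k and 2k+1 of both vectors, with row 0 in a0 and row 1 in a1. The u16x8 alias and transpose2_model function are hypothetical stand-ins for illustration, not part of libsimdpp.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for a 128-bit vector of eight 16-bit lanes.
using u16x8 = std::array<std::uint16_t, 8>;

// Scalar model of transpose2 for 16-bit lanes. Lanes {2k, 2k+1} of a0 are
// row 0 and the same lanes of a1 are row 1 of the 2x2 matrix k (k = 0..3).
// Afterwards a0 holds the first column of every matrix, a1 the second.
inline void transpose2_model(u16x8& a0, u16x8& a1)
{
    for (std::size_t k = 0; k < a0.size(); k += 2) {
        // Diagonal elements stay put; only the off-diagonal pair is exchanged.
        std::swap(a0[k + 1], a1[k]);
    }
}
```

For a0 = {0, 1, ..., 7} and a1 = {10, 11, ..., 17} this leaves a0 = {0, 10, 2, 12, 4, 14, 6, 16} and a1 = {1, 11, 3, 13, 5, 15, 7, 17}.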

simdpp::transpose2 [inline]
Transposes two 2x2 32-bit matrices within two int32x4 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 4 instructions.
  - In ALTIVEC this intrinsic results in at least 2-4 instructions.
- 256-bit version:
  - The lower and higher 128-bit halves are processed as if the 128-bit version were applied to each of them separately.
  - In SSE2-AVX this intrinsic results in at least 8 instructions.
  - In AVX2 this intrinsic results in at least 4 instructions.
  - In NEON this intrinsic results in at least 2 instructions.
  - In ALTIVEC this intrinsic results in at least 4-6 instructions.

simdpp::transpose2 [inline]
Transposes two 2x2 32-bit matrices within two int32x4 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 4 instructions.
  - In ALTIVEC this intrinsic results in at least 2-4 instructions.
- 256-bit version:
  - The lower and higher 128-bit halves are processed as if the 128-bit version were applied to each of them separately.
  - In SSE2-AVX this intrinsic results in at least 8 instructions.
  - In AVX2 this intrinsic results in at least 4 instructions.
  - In NEON this intrinsic results in at least 2 instructions.
  - In ALTIVEC this intrinsic results in at least 4-6 instructions.

simdpp::transpose2 [inline]
Transposes two 2x2 32-bit matrices within two int32x4 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 4 instructions.
  - In ALTIVEC this intrinsic results in at least 2-4 instructions.
- 256-bit version:
  - The lower and higher 128-bit halves are processed as if the 128-bit version were applied to each of them separately.
  - In SSE2-AVX this intrinsic results in at least 8 instructions.
  - In AVX2 this intrinsic results in at least 4 instructions.
  - In NEON this intrinsic results in at least 2 instructions.
  - In ALTIVEC this intrinsic results in at least 4-6 instructions.

simdpp::transpose2 [inline]
Transposes two 2x2 32-bit matrices within two int32x4 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 4 instructions.
  - In ALTIVEC this intrinsic results in at least 2-4 instructions.
- 256-bit version:
  - The lower and higher 128-bit halves are processed as if the 128-bit version were applied to each of them separately.
  - In SSE2-AVX this intrinsic results in at least 8 instructions.
  - In AVX2 this intrinsic results in at least 4 instructions.
  - In NEON this intrinsic results in at least 2 instructions.
  - In ALTIVEC this intrinsic results in at least 4-6 instructions.
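The SSE2 cost of at least 4 instructions for the 128-bit 32-bit version is consistent with the usual unpack sequence: two 32-bit interleaves followed by two 64-bit regroupings. The sketch below writes that sequence with raw SSE2 intrinsics; it is one plausible implementation, not necessarily the one libsimdpp uses. The u32x4 alias and transpose2_u32x4 are hypothetical, the loads and stores exist only because the sketch works on std::array, and without SSE2 it falls back to swapping the off-diagonal elements.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <utility>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Hypothetical stand-in for a 128-bit vector of four 32-bit lanes.
using u32x4 = std::array<std::uint32_t, 4>;

// Transposes the two 2x2 matrices held in lanes {0,1} and {2,3} of a0/a1.
inline void transpose2_u32x4(u32x4& a0, u32x4& a1)
{
#if defined(__SSE2__)
    __m128i x = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a0.data()));
    __m128i y = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a1.data()));
    __m128i t0 = _mm_unpacklo_epi32(x, y);   // [x0 y0 x1 y1]
    __m128i t1 = _mm_unpackhi_epi32(x, y);   // [x2 y2 x3 y3]
    __m128i r0 = _mm_unpacklo_epi64(t0, t1); // [x0 y0 x2 y2]
    __m128i r1 = _mm_unpackhi_epi64(t0, t1); // [x1 y1 x3 y3]
    _mm_storeu_si128(reinterpret_cast<__m128i*>(a0.data()), r0);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(a1.data()), r1);
#else
    // Portable fallback: swap the off-diagonal element of each matrix.
    std::swap(a0[1], a1[0]);
    std::swap(a0[3], a1[2]);
#endif
}
```

Both paths turn a0 = {1, 2, 3, 4}, a1 = {5, 6, 7, 8} into a0 = {1, 5, 3, 7}, a1 = {2, 6, 4, 8}.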

simdpp::transpose2 [inline]
Transposes a 2x2 64-bit matrix within two int64x2 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 2 instructions.
  - In ALTIVEC this intrinsic results in at least 2-4 instructions.
- 256-bit version:
  - The lower and higher 128-bit halves are processed as if the 128-bit version were applied to each of them separately.
  - In SSE2-AVX this intrinsic results in at least 4 instructions.
  - In AVX2 this intrinsic results in at least 2 instructions.
  - In NEON this intrinsic results in at least 2 instructions.
  - In ALTIVEC this intrinsic results in at least 4-6 instructions.

simdpp::transpose2 [inline]
Transposes a 2x2 64-bit matrix within two int64x2 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 2 instructions.
  - In ALTIVEC this intrinsic results in at least 2-4 instructions.
- 256-bit version:
  - The lower and higher 128-bit halves are processed as if the 128-bit version were applied to each of them separately.
  - In SSE2-AVX this intrinsic results in at least 4 instructions.
  - In AVX2 this intrinsic results in at least 2 instructions.
  - In NEON this intrinsic results in at least 2 instructions.
  - In ALTIVEC this intrinsic results in at least 4-6 instructions.

simdpp::transpose2 [inline]
Transposes a 2x2 64-bit matrix within two int64x2 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 2 instructions.
  - In ALTIVEC this intrinsic results in at least 2-4 instructions.
- 256-bit version:
  - The lower and higher 128-bit halves are processed as if the 128-bit version were applied to each of them separately.
  - In SSE2-AVX this intrinsic results in at least 4 instructions.
  - In AVX2 this intrinsic results in at least 2 instructions.
  - In NEON this intrinsic results in at least 2 instructions.
  - In ALTIVEC this intrinsic results in at least 4-6 instructions.

simdpp::transpose2 [inline]
Transposes a 2x2 64-bit matrix within two int64x2 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 2 instructions.
  - In ALTIVEC this intrinsic results in at least 2-4 instructions.
- 256-bit version:
  - The lower and higher 128-bit halves are processed as if the 128-bit version were applied to each of them separately.
  - In SSE2-AVX this intrinsic results in at least 4 instructions.
  - In AVX2 this intrinsic results in at least 2 instructions.
  - In NEON this intrinsic results in at least 2 instructions.
  - In ALTIVEC this intrinsic results in at least 4-6 instructions.
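With 64-bit elements each 128-bit vector is one row of a single 2x2 matrix, so the transpose reduces to exchanging the two off-diagonal elements; in the 256-bit version each 128-bit half does this independently. A scalar sketch, with hypothetical u64x2/u64x4 aliases standing in for the simdpp vector types:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical scalar stand-ins for 128-bit and 256-bit 64-bit vectors.
using u64x2 = std::array<std::uint64_t, 2>;
using u64x4 = std::array<std::uint64_t, 4>;

// 128-bit case: a0/a1 are the rows of one 2x2 matrix, so only the
// off-diagonal elements a0[1] and a1[0] change places.
inline void transpose2_u64x2(u64x2& a0, u64x2& a1)
{
    std::swap(a0[1], a1[0]);
}

// 256-bit case: each 128-bit half holds its own 2x2 matrix.
inline void transpose2_u64x4(u64x4& a0, u64x4& a1)
{
    std::swap(a0[1], a1[0]);  // lower half: lanes 0-1
    std::swap(a0[3], a1[2]);  // upper half: lanes 2-3
}
```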

simdpp::transpose2 [inline]
Transposes two 2x2 32-bit matrices within two float32x4 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 4 instructions.
  - In ALTIVEC this intrinsic results in at least 2-4 instructions.
- 256-bit version:
  - The lower and higher 128-bit halves are processed as if the 128-bit version were applied to each of them separately.
  - In SSE2-SSE4.1 this intrinsic results in at least 8 instructions.
  - In AVX-AVX2 this intrinsic results in at least 4 instructions.
  - In ALTIVEC this intrinsic results in at least 4-6 instructions.
  - In NEON this intrinsic results in at least 2 instructions.

simdpp::transpose2 [inline]
Transposes two 2x2 32-bit matrices within two float32x4 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 4 instructions.
  - In ALTIVEC this intrinsic results in at least 2-4 instructions.
- 256-bit version:
  - The lower and higher 128-bit halves are processed as if the 128-bit version were applied to each of them separately.
  - In SSE2-SSE4.1 this intrinsic results in at least 8 instructions.
  - In AVX-AVX2 this intrinsic results in at least 4 instructions.
  - In ALTIVEC this intrinsic results in at least 4-6 instructions.
  - In NEON this intrinsic results in at least 2 instructions.

simdpp::transpose2 [inline]
Transposes a 2x2 64-bit matrix within two float64x2 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 2 instructions.
  - Not vectorized in NEON and .
- 256-bit version:
  - The lower and higher 128-bit halves are processed as if the 128-bit version were applied to each of them separately.
  - In SSE2-SSE4.1 this intrinsic results in at least 4 instructions.
  - In AVX-AVX2 this intrinsic results in at least 2 instructions.
  - Not vectorized in NEON and .

simdpp::transpose2 [inline]
Transposes a 2x2 64-bit matrix within two float64x2 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 2 instructions.
  - Not vectorized in NEON and .
- 256-bit version:
  - The lower and higher 128-bit halves are processed as if the 128-bit version were applied to each of them separately.
  - In SSE2-SSE4.1 this intrinsic results in at least 4 instructions.
  - In AVX-AVX2 this intrinsic results in at least 2 instructions.
  - Not vectorized in NEON and .

simdpp::transpose4 [inline]
Transposes a 4x4 32-bit matrix within four int32x4 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 12 instructions.
  - In NEON this intrinsic results in at least 4 instructions.
  - In ALTIVEC this intrinsic results in at least 8-12 instructions.
- 256-bit version:
  - In SSE2-AVX this intrinsic results in at least 24 instructions.
  - In AVX2 this intrinsic results in at least 12 instructions.
  - In NEON this intrinsic results in at least 8 instructions.
  - In ALTIVEC this intrinsic results in at least 16-20 instructions.
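As a reference for the semantics (not for performance), here is a scalar sketch of a 4x4 32-bit transpose, assuming row i of the matrix is held in vector ai so that afterwards element j of ai is the original element i of aj. The u32x4 alias and transpose4_u32x4 are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for a 128-bit vector of four 32-bit lanes.
using u32x4 = std::array<std::uint32_t, 4>;

// Scalar 4x4 transpose: row i of the matrix lives in vector ai.
inline void transpose4_u32x4(u32x4& a0, u32x4& a1, u32x4& a2, u32x4& a3)
{
    std::array<u32x4*, 4> rows = {&a0, &a1, &a2, &a3};
    // Swap each element above the diagonal with its mirror below it.
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t j = i + 1; j < 4; ++j)
            std::swap((*rows[i])[j], (*rows[j])[i]);
}
```

Rows {0, 1, 2, 3}, {4, 5, 6, 7}, {8, 9, 10, 11}, {12, 13, 14, 15} become columns: a0 = {0, 4, 8, 12}, a1 = {1, 5, 9, 13}, and so on.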

simdpp::transpose4 [inline]
Transposes four 4x4 8-bit matrices within four int8x16 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 16 instructions.
  - In NEON this intrinsic results in at least 4 instructions.
  - In ALTIVEC this intrinsic results in at least 8-12 instructions.
- 256-bit version:
  - The lower and higher 128-bit halves are processed as if the 128-bit version were applied to each of them separately.
  - In SSE2-AVX this intrinsic results in at least 32 instructions.
  - In AVX2 this intrinsic results in at least 16 instructions.
  - In NEON this intrinsic results in at least 8 instructions.
  - In ALTIVEC this intrinsic results in at least 16-20 instructions.

simdpp::transpose4 [inline]
Transposes four 4x4 8-bit matrices within four int8x16 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 16 instructions.
  - In NEON this intrinsic results in at least 4 instructions.
  - In ALTIVEC this intrinsic results in at least 8-12 instructions.
- 256-bit version:
  - The lower and higher 128-bit halves are processed as if the 128-bit version were applied to each of them separately.
  - In SSE2-AVX this intrinsic results in at least 32 instructions.
  - In AVX2 this intrinsic results in at least 16 instructions.
  - In NEON this intrinsic results in at least 8 instructions.
  - In ALTIVEC this intrinsic results in at least 16-20 instructions.

simdpp::transpose4 [inline]
Transposes four 4x4 8-bit matrices within four int8x16 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 16 instructions.
  - In NEON this intrinsic results in at least 4 instructions.
  - In ALTIVEC this intrinsic results in at least 8-12 instructions.
- 256-bit version:
  - The lower and higher 128-bit halves are processed as if the 128-bit version were applied to each of them separately.
  - In SSE2-AVX this intrinsic results in at least 32 instructions.
  - In AVX2 this intrinsic results in at least 16 instructions.
  - In NEON this intrinsic results in at least 8 instructions.
  - In ALTIVEC this intrinsic results in at least 16-20 instructions.

simdpp::transpose4 [inline]
Transposes four 4x4 8-bit matrices within four int8x16 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 16 instructions.
  - In NEON this intrinsic results in at least 4 instructions.
  - In ALTIVEC this intrinsic results in at least 8-12 instructions.
- 256-bit version:
  - The lower and higher 128-bit halves are processed as if the 128-bit version were applied to each of them separately.
  - In SSE2-AVX this intrinsic results in at least 32 instructions.
  - In AVX2 this intrinsic results in at least 16 instructions.
  - In NEON this intrinsic results in at least 8 instructions.
  - In ALTIVEC this intrinsic results in at least 16-20 instructions.

simdpp::transpose4 [inline]
Transposes four 4x4 8-bit matrices within four int8x16 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 16 instructions.
  - In NEON this intrinsic results in at least 4 instructions.
  - In ALTIVEC this intrinsic results in at least 8-12 instructions.
- 256-bit version:
  - The lower and higher 128-bit halves are processed as if the 128-bit version were applied to each of them separately.
  - In SSE2-AVX this intrinsic results in at least 32 instructions.
  - In AVX2 this intrinsic results in at least 16 instructions.
  - In NEON this intrinsic results in at least 8 instructions.
  - In ALTIVEC this intrinsic results in at least 16-20 instructions.
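For 8-bit and 16-bit elements, transpose4 packs several independent 4x4 matrices side by side in each vector. A generic scalar model of that layout is sketched below, assuming matrix b occupies N consecutive lanes starting at lane b*N in every vector, with row r in vector r. The transpose_blocks template and the u8x16 alias are hypothetical helpers; with N = 4 and 16 byte lanes the model corresponds to transpose4 on int8x16.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Generic scalar model: N vectors of L lanes hold L/N square NxN matrices.
// The matrix starting at lane b spans lanes [b, b + N) of every vector,
// with row r stored in v[r]. Each matrix is transposed in place.
template <std::size_t N, typename T, std::size_t L>
void transpose_blocks(std::array<std::array<T, L>, N>& v)
{
    static_assert(L % N == 0, "lane count must be a multiple of N");
    for (std::size_t b = 0; b < L; b += N)  // first lane of each matrix
        for (std::size_t r = 0; r < N; ++r)
            for (std::size_t c = r + 1; c < N; ++c)
                std::swap(v[r][b + c], v[c][b + r]);
}

// Hypothetical stand-in for a 16-lane byte vector (int8x16-sized).
using u8x16 = std::array<std::uint8_t, 16>;
```

If lane i of vector r initially holds 16*r + i, then after transpose_blocks the lanes 4-7 of vector 0 hold {4, 20, 36, 52}, the first column of the second matrix. Applying the transform twice restores the input, since a transpose is its own inverse.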

simdpp::transpose4 [inline]
Transposes two 4x4 16-bit matrices within four int16x8 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 12 instructions.
  - In NEON this intrinsic results in at least 4 instructions.
  - In ALTIVEC this intrinsic results in at least 8-12 instructions.
- 256-bit version:
  - The lower and higher 128-bit halves are processed as if the 128-bit version were applied to each of them separately.
  - In SSE2-AVX this intrinsic results in at least 24 instructions.
  - In AVX2 this intrinsic results in at least 12 instructions.
  - In NEON this intrinsic results in at least 8 instructions.
  - In ALTIVEC this intrinsic results in at least 16-20 instructions.

simdpp::transpose4 [inline]
Transposes two 4x4 16-bit matrices within four int16x8 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 12 instructions.
  - In NEON this intrinsic results in at least 4 instructions.
  - In ALTIVEC this intrinsic results in at least 8-12 instructions.
- 256-bit version:
  - The lower and higher 128-bit halves are processed as if the 128-bit version were applied to each of them separately.
  - In SSE2-AVX this intrinsic results in at least 24 instructions.
  - In AVX2 this intrinsic results in at least 12 instructions.
  - In NEON this intrinsic results in at least 8 instructions.
  - In ALTIVEC this intrinsic results in at least 16-20 instructions.

simdpp::transpose4 [inline]
Transposes two 4x4 16-bit matrices within four int16x8 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 12 instructions.
  - In NEON this intrinsic results in at least 4 instructions.
  - In ALTIVEC this intrinsic results in at least 8-12 instructions.
- 256-bit version:
  - The lower and higher 128-bit halves are processed as if the 128-bit version were applied to each of them separately.
  - In SSE2-AVX this intrinsic results in at least 24 instructions.
  - In AVX2 this intrinsic results in at least 12 instructions.
  - In NEON this intrinsic results in at least 8 instructions.
  - In ALTIVEC this intrinsic results in at least 16-20 instructions.

simdpp::transpose4 [inline]
Transposes two 4x4 16-bit matrices within four int16x8 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 12 instructions.
  - In NEON this intrinsic results in at least 4 instructions.
  - In ALTIVEC this intrinsic results in at least 8-12 instructions.
- 256-bit version:
  - The lower and higher 128-bit halves are processed as if the 128-bit version were applied to each of them separately.
  - In SSE2-AVX this intrinsic results in at least 24 instructions.
  - In AVX2 this intrinsic results in at least 12 instructions.
  - In NEON this intrinsic results in at least 8 instructions.
  - In ALTIVEC this intrinsic results in at least 16-20 instructions.

simdpp::transpose4 [inline]
Transposes a 4x4 32-bit matrix within four int32x4 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 12 instructions.
  - In NEON this intrinsic results in at least 4 instructions.
  - In ALTIVEC this intrinsic results in at least 8-12 instructions.
- 256-bit version:
  - In SSE2-AVX this intrinsic results in at least 24 instructions.
  - In AVX2 this intrinsic results in at least 12 instructions.
  - In NEON this intrinsic results in at least 8 instructions.
  - In ALTIVEC this intrinsic results in at least 16-20 instructions.

simdpp::transpose4 [inline]
Transposes a 4x4 32-bit matrix within four int32x4 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 12 instructions.
  - In NEON this intrinsic results in at least 4 instructions.
  - In ALTIVEC this intrinsic results in at least 8-12 instructions.
- 256-bit version:
  - In SSE2-AVX this intrinsic results in at least 24 instructions.
  - In AVX2 this intrinsic results in at least 12 instructions.
  - In NEON this intrinsic results in at least 8 instructions.
  - In ALTIVEC this intrinsic results in at least 16-20 instructions.

simdpp::transpose4 [inline]
Transposes a 4x4 32-bit matrix within four float32x4 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 12 instructions.
  - In NEON this intrinsic results in at least 4 instructions.
  - In ALTIVEC this intrinsic results in at least 8-12 instructions.
- 256-bit version:
  - In SSE2-SSE4.1 this intrinsic results in at least 24 instructions.
  - In AVX-AVX2 this intrinsic results in at least 12 instructions.
  - In NEON this intrinsic results in at least 8 instructions.
  - In ALTIVEC this intrinsic results in at least 16-20 instructions.

simdpp::transpose4 [inline]
Transposes a 4x4 32-bit matrix within four float32x4 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 12 instructions.
  - In NEON this intrinsic results in at least 4 instructions.
  - In ALTIVEC this intrinsic results in at least 8-12 instructions.
- 256-bit version:
  - In SSE2-SSE4.1 this intrinsic results in at least 24 instructions.
  - In AVX-AVX2 this intrinsic results in at least 12 instructions.
  - In NEON this intrinsic results in at least 8 instructions.
  - In ALTIVEC this intrinsic results in at least 16-20 instructions.
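On x86, a 4x4 float transpose can be written with the _MM_TRANSPOSE4_PS macro from <xmmintrin.h>. The sketch below uses that macro when SSE is available and a scalar swap loop otherwise. It is an illustration, not libsimdpp's implementation; f32x4 and transpose4_f32x4 are hypothetical names, and the macro's instruction sequence depends on the header that defines it, so the sketch says nothing definitive about the instruction counts listed above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Hypothetical stand-in for a 128-bit vector of four floats.
using f32x4 = std::array<float, 4>;

// Transposes the 4x4 matrix whose row i is stored in vector ai.
inline void transpose4_f32x4(f32x4& a0, f32x4& a1, f32x4& a2, f32x4& a3)
{
#if defined(__SSE__)
    __m128 r0 = _mm_loadu_ps(a0.data());
    __m128 r1 = _mm_loadu_ps(a1.data());
    __m128 r2 = _mm_loadu_ps(a2.data());
    __m128 r3 = _mm_loadu_ps(a3.data());
    // Standard macro from <xmmintrin.h>; it rewrites r0..r3 in place.
    _MM_TRANSPOSE4_PS(r0, r1, r2, r3);
    _mm_storeu_ps(a0.data(), r0);
    _mm_storeu_ps(a1.data(), r1);
    _mm_storeu_ps(a2.data(), r2);
    _mm_storeu_ps(a3.data(), r3);
#else
    // Portable fallback: swap each element above the diagonal with its mirror.
    f32x4* rows[4] = {&a0, &a1, &a2, &a3};
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t j = i + 1; j < 4; ++j)
            std::swap((*rows[i])[j], (*rows[j])[i]);
#endif
}
```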

simdpp::transpose8 [inline]
Transposes two 8x8 8-bit matrices within eight int8x16 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 32 instructions.
  - In NEON this intrinsic results in at least 12 instructions.
  - In ALTIVEC this intrinsic results in at least 24-30 instructions.
- 256-bit version:
  - In SSE2-AVX this intrinsic results in at least 64 instructions.
  - In AVX2 this intrinsic results in at least 32 instructions.
  - In NEON this intrinsic results in at least 24 instructions.
  - In ALTIVEC this intrinsic results in at least 48-54 instructions.

The int8x16, uint8x32 and int8x32 overloads of simdpp::transpose8 are also inline and are described by the entry above.
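A scalar model of transpose8 for 16-lane byte vectors, assuming lanes 0-7 of the eight vectors form the first 8x8 matrix and lanes 8-15 the second, with row r in vector r. The real function takes eight separate references; this hypothetical transpose8_u8x16 takes an array of eight vectors for brevity, and u8x16 is likewise an assumed stand-in type.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for a 16-lane byte vector (int8x16-sized).
using u8x16 = std::array<std::uint8_t, 16>;

// Scalar model of transpose8: two independent 8x8 byte matrices, one in
// lanes 0-7 and one in lanes 8-15; row r of each lives in a[r].
inline void transpose8_u8x16(std::array<u8x16, 8>& a)
{
    for (std::size_t base = 0; base < 16; base += 8)
        for (std::size_t r = 0; r < 8; ++r)
            for (std::size_t c = r + 1; c < 8; ++c)
                std::swap(a[r][base + c], a[c][base + r]);
}
```

If lane i of vector r initially holds 16*r + i, then afterwards a[3][13] holds 91 (row 5, lane 11 of the input), and no byte ever moves between the two 8-lane halves.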
Generated on Tue Apr 8 2014 03:14:34 for libsimdpp
