libsimdpp
0.9.3
several vectors
Functions
void simdpp::transpose2 (basic_int16x8 &a0, basic_int16x8 &a1)
  Transposes four 2x2 16-bit matrices within two int16x8 vectors.
void simdpp::transpose2 (basic_int16x16 &a0, basic_int16x16 &a1)
void simdpp::transpose8 (basic_int8x16 &a0, basic_int8x16 &a1, basic_int8x16 &a2, basic_int8x16 &a3, basic_int8x16 &a4, basic_int8x16 &a5, basic_int8x16 &a6, basic_int8x16 &a7)
  Transposes two 8x8 8-bit matrices within eight int8x16 vectors.
void simdpp::transpose8 (basic_int8x32 &a0, basic_int8x32 &a1, basic_int8x32 &a2, basic_int8x32 &a3, basic_int8x32 &a4, basic_int8x32 &a5, basic_int8x32 &a6, basic_int8x32 &a7)
void simdpp::transpose2 (basic_int32x4 &a0, basic_int32x4 &a1)
  Transposes two 2x2 32-bit matrices within two int32x4 vectors.
void simdpp::transpose2 (basic_int32x8 &a0, basic_int32x8 &a1)
  Transposes two 2x2 32-bit matrices within two int32x4 vectors.
void simdpp::transpose2 (basic_int64x2 &a0, basic_int64x2 &a1)
  Transposes a 2x2 64-bit matrix within two int64x2 vectors.
void simdpp::transpose2 (basic_int64x4 &a0, basic_int64x4 &a1)
  Transposes a 2x2 64-bit matrix within two int64x2 vectors.
void simdpp::transpose2 (float32x4 &a0, float32x4 &a1)
  Transposes two 2x2 32-bit matrices within two float32x4 vectors.
void simdpp::transpose2 (float32x8 &a0, float32x8 &a1)
  Transposes two 2x2 32-bit matrices within two float32x4 vectors.
void simdpp::transpose2 (float64x2 &a0, float64x2 &a1)
  Transposes a 2x2 64-bit matrix within two float64x2 vectors.
void simdpp::transpose2 (float64x4 &a0, float64x4 &a1)
  Transposes a 2x2 64-bit matrix within two float64x2 vectors.
void simdpp::transpose4 (basic_int8x16 &a0, basic_int8x16 &a1, basic_int8x16 &a2, basic_int8x16 &a3)
  Transposes four 4x4 8-bit matrices within four int8x16 vectors.
void simdpp::transpose4 (basic_int32x8 &a0, basic_int32x8 &a1, basic_int32x8 &a2, basic_int32x8 &a3)
  Transposes a 4x4 32-bit matrix within four int32x4 vectors.
void simdpp::transpose4 (basic_int8x32 &a0, basic_int8x32 &a1, basic_int8x32 &a2, basic_int8x32 &a3)
  Transposes four 4x4 8-bit matrices within four int8x16 vectors.
void simdpp::transpose4 (basic_int16x8 &a0, basic_int16x8 &a1, basic_int16x8 &a2, basic_int16x8 &a3)
  Transposes two 4x4 16-bit matrices within four int16x8 vectors.
void simdpp::transpose4 (basic_int16x16 &a0, basic_int16x16 &a1, basic_int16x16 &a2, basic_int16x16 &a3)
  Transposes two 4x4 16-bit matrices within four int16x8 vectors.
void simdpp::transpose4 (basic_int32x4 &a0, basic_int32x4 &a1, basic_int32x4 &a2, basic_int32x4 &a3)
  Transposes a 4x4 32-bit matrix within four int32x4 vectors.
void simdpp::transpose4 (float32x4 &a0, float32x4 &a1, float32x4 &a2, float32x4 &a3)
  Transposes a 4x4 32-bit matrix within four float32x4 vectors.
void simdpp::transpose4 (float32x8 &a0, float32x8 &a1, float32x8 &a2, float32x8 &a3)
  Transposes a 4x4 32-bit matrix within four float32x4 vectors.
Detailed Description
several vectors
Function Documentation
inline void simdpp::transpose2 (basic_int16x8 &a0, basic_int16x8 &a1)
inline void simdpp::transpose2 (basic_int16x16 &a0, basic_int16x16 &a1)
Transposes four 2x2 16-bit matrices within two int16x8 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 4 instructions.
  - In ALTIVEC this intrinsic results in at least 2-4 instructions.
- 256-bit version:
  - The lower and higher 128-bit halves are processed as if the 128-bit instruction were applied to each of them separately.
  - In SSE2-AVX this intrinsic results in at least 8 instructions.
  - In AVX2 this intrinsic results in at least 4 instructions.
  - In NEON this intrinsic results in at least 2 instructions.
  - In ALTIVEC this intrinsic results in at least 4-6 instructions.
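The lane movement described above can be illustrated with a scalar model. This is a minimal sketch and not the library's implementation: `Lanes16x8` is a hypothetical stand-in for an int16x8 vector, and the layout (lanes 2k and 2k+1 of `a0` and `a1` form the rows of the k-th 2x2 matrix) is an assumption that this page does not state explicitly.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for one 128-bit vector of eight 16-bit lanes.
using Lanes16x8 = std::array<std::int16_t, 8>;

// Scalar model (assumed layout): lanes (2k, 2k+1) of a0 and a1 are the two
// rows of the k-th 2x2 matrix. Each matrix is transposed in place.
void transpose2_model(Lanes16x8 &a0, Lanes16x8 &a1)
{
    for (std::size_t k = 0; k < a0.size(); k += 2) {
        // Rows are [a0[k], a0[k+1]] and [a1[k], a1[k+1]]; only the two
        // off-diagonal elements change places.
        std::swap(a0[k + 1], a1[k]);
    }
}

// Usage sketch: four matrices, one row of each per vector.
void transpose2_model_demo()
{
    Lanes16x8 a0 = {0, 1, 2, 3, 4, 5, 6, 7};
    Lanes16x8 a1 = {10, 11, 12, 13, 14, 15, 16, 17};
    transpose2_model(a0, a1);
    assert((a0 == Lanes16x8{0, 10, 2, 12, 4, 14, 6, 16}));
    assert((a1 == Lanes16x8{1, 11, 3, 13, 5, 15, 7, 17}));
}
```

Under this layout the operation is its own inverse, so applying the model twice restores the inputs.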
inline void simdpp::transpose2 (basic_int32x4 &a0, basic_int32x4 &a1)
inline void simdpp::transpose2 (basic_int32x8 &a0, basic_int32x8 &a1)
Transposes two 2x2 32-bit matrices within two int32x4 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 4 instructions.
  - In ALTIVEC this intrinsic results in at least 2-4 instructions.
- 256-bit version:
  - The lower and higher 128-bit halves are processed as if the 128-bit instruction were applied to each of them separately.
  - In SSE2-AVX this intrinsic results in at least 8 instructions.
  - In AVX2 this intrinsic results in at least 4 instructions.
  - In NEON this intrinsic results in at least 2 instructions.
  - In ALTIVEC this intrinsic results in at least 4-6 instructions.
inline void simdpp::transpose2 (basic_int64x2 &a0, basic_int64x2 &a1)
inline void simdpp::transpose2 (basic_int64x4 &a0, basic_int64x4 &a1)
Transposes a 2x2 64-bit matrix within two int64x2 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 2 instructions.
  - In ALTIVEC this intrinsic results in at least 2-4 instructions.
- 256-bit version:
  - The lower and higher 128-bit halves are processed as if the 128-bit instruction were applied to each of them separately.
  - In SSE2-AVX this intrinsic results in at least 4 instructions.
  - In AVX2 this intrinsic results in at least 2 instructions.
  - In NEON this intrinsic results in at least 2 instructions.
  - In ALTIVEC this intrinsic results in at least 4-6 instructions.
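The rule that the 256-bit version processes the lower and higher 128-bit halves separately can be sketched with a scalar model. The type `Lanes64x4` and both function names are hypothetical stand-ins, not libsimdpp identifiers; the sketch only assumes that each half holds one row of a 2x2 matrix per vector.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for one 256-bit vector of four 64-bit lanes.
using Lanes64x4 = std::array<std::int64_t, 4>;

// Scalar model of the 128-bit operation applied to the half that starts at
// lane `base`: that half holds one row of a 2x2 matrix in each vector.
void transpose2_half(Lanes64x4 &a0, Lanes64x4 &a1, std::size_t base)
{
    std::swap(a0[base + 1], a1[base]);
}

// 256-bit model: the lower half (lanes 0-1) and the higher half (lanes 2-3)
// are transposed independently; no element crosses the 128-bit boundary.
void transpose2_model_256(Lanes64x4 &a0, Lanes64x4 &a1)
{
    transpose2_half(a0, a1, 0);
    transpose2_half(a0, a1, 2);
}

// Usage sketch: one 2x2 matrix per 128-bit half.
void transpose2_model_256_demo()
{
    Lanes64x4 a0 = {0, 1, 2, 3};
    Lanes64x4 a1 = {10, 11, 12, 13};
    transpose2_model_256(a0, a1);
    assert((a0 == Lanes64x4{0, 10, 2, 12}));
    assert((a1 == Lanes64x4{1, 11, 3, 13}));
}
```

Because the halves never exchange elements, the result is two independent 2x2 transposes rather than one transpose of a larger matrix.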
inline void simdpp::transpose2 (float32x4 &a0, float32x4 &a1)
inline void simdpp::transpose2 (float32x8 &a0, float32x8 &a1)
Transposes two 2x2 32-bit matrices within two float32x4 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 4 instructions.
  - In ALTIVEC this intrinsic results in at least 2-4 instructions.
- 256-bit version:
  - The lower and higher 128-bit halves are processed as if the 128-bit instruction were applied to each of them separately.
  - In SSE2-SSE4.1 this intrinsic results in at least 8 instructions.
  - In AVX-AVX2 this intrinsic results in at least 4 instructions.
  - In ALTIVEC this intrinsic results in at least 4-6 instructions.
  - In NEON this intrinsic results in at least 2 instructions.
inline void simdpp::transpose2 (float64x2 &a0, float64x2 &a1)
inline void simdpp::transpose2 (float64x4 &a0, float64x4 &a1)
Transposes a 2x2 64-bit matrix within two float64x2 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 2 instructions.
  - Not vectorized in NEON and ALTIVEC.
- 256-bit version:
  - The lower and higher 128-bit halves are processed as if the 128-bit instruction were applied to each of them separately.
  - In SSE2-SSE4.1 this intrinsic results in at least 4 instructions.
  - In AVX-AVX2 this intrinsic results in at least 2 instructions.
  - Not vectorized in NEON and ALTIVEC.
inline void simdpp::transpose4 (basic_int32x4 &a0, basic_int32x4 &a1, basic_int32x4 &a2, basic_int32x4 &a3)
inline void simdpp::transpose4 (basic_int32x8 &a0, basic_int32x8 &a1, basic_int32x8 &a2, basic_int32x8 &a3)
Transposes a 4x4 32-bit matrix within four int32x4 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 12 instructions.
  - In NEON this intrinsic results in at least 4 instructions.
  - In ALTIVEC this intrinsic results in at least 8-12 instructions.
- 256-bit version:
  - In SSE2-AVX this intrinsic results in at least 24 instructions.
  - In AVX2 this intrinsic results in at least 12 instructions.
  - In NEON this intrinsic results in at least 8 instructions.
  - In ALTIVEC this intrinsic results in at least 16-20 instructions.
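A scalar model makes the 4x4 case concrete: each of the four vectors is one row of the matrix, and after the call each vector holds one column of the original. This is a sketch only; `Lanes32x4` is a hypothetical stand-in for an int32x4 vector, and the loop shows the expected result, not the instruction sequence the library emits.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for one 128-bit vector of four 32-bit lanes.
using Lanes32x4 = std::array<std::int32_t, 4>;

// Scalar model: a0..a3 are rows 0..3 of a 4x4 matrix. Swapping every element
// above the diagonal with its mirror below it transposes the matrix in place.
void transpose4_model(Lanes32x4 &a0, Lanes32x4 &a1, Lanes32x4 &a2, Lanes32x4 &a3)
{
    Lanes32x4 *rows[4] = {&a0, &a1, &a2, &a3};
    for (std::size_t i = 0; i < 4; ++i) {
        for (std::size_t j = i + 1; j < 4; ++j) {
            std::swap((*rows[i])[j], (*rows[j])[i]);
        }
    }
}

// Usage sketch: row r starts at 10 * r, so columns are easy to recognise.
void transpose4_model_demo()
{
    Lanes32x4 a0 = {0, 1, 2, 3};
    Lanes32x4 a1 = {10, 11, 12, 13};
    Lanes32x4 a2 = {20, 21, 22, 23};
    Lanes32x4 a3 = {30, 31, 32, 33};
    transpose4_model(a0, a1, a2, a3);
    assert((a0 == Lanes32x4{0, 10, 20, 30}));
    assert((a1 == Lanes32x4{1, 11, 21, 31}));
    assert((a2 == Lanes32x4{2, 12, 22, 32}));
    assert((a3 == Lanes32x4{3, 13, 23, 33}));
}
```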
inline void simdpp::transpose4 (basic_int8x16 &a0, basic_int8x16 &a1, basic_int8x16 &a2, basic_int8x16 &a3)
inline void simdpp::transpose4 (basic_int8x32 &a0, basic_int8x32 &a1, basic_int8x32 &a2, basic_int8x32 &a3)
Transposes four 4x4 8-bit matrices within four int8x16 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 16 instructions.
  - In NEON this intrinsic results in at least 4 instructions.
  - In ALTIVEC this intrinsic results in at least 8-12 instructions.
- 256-bit version:
  - The lower and higher 128-bit halves are processed as if the 128-bit instruction were applied to each of them separately.
  - In SSE2-AVX this intrinsic results in at least 32 instructions.
  - In AVX2 this intrinsic results in at least 16 instructions.
  - In NEON this intrinsic results in at least 8 instructions.
  - In ALTIVEC this intrinsic results in at least 16-20 instructions.
inline void simdpp::transpose4 (basic_int16x8 &a0, basic_int16x8 &a1, basic_int16x8 &a2, basic_int16x8 &a3)
inline void simdpp::transpose4 (basic_int16x16 &a0, basic_int16x16 &a1, basic_int16x16 &a2, basic_int16x16 &a3)
Transposes two 4x4 16-bit matrices within four int16x8 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 12 instructions.
  - In NEON this intrinsic results in at least 4 instructions.
  - In ALTIVEC this intrinsic results in at least 8-12 instructions.
- 256-bit version:
  - The lower and higher 128-bit halves are processed as if the 128-bit instruction were applied to each of them separately.
  - In SSE2-AVX this intrinsic results in at least 24 instructions.
  - In AVX2 this intrinsic results in at least 12 instructions.
  - In NEON this intrinsic results in at least 8 instructions.
  - In ALTIVEC this intrinsic results in at least 16-20 instructions.
inline void simdpp::transpose4 (float32x4 &a0, float32x4 &a1, float32x4 &a2, float32x4 &a3)
inline void simdpp::transpose4 (float32x8 &a0, float32x8 &a1, float32x8 &a2, float32x8 &a3)
Transposes a 4x4 32-bit matrix within four float32x4 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 12 instructions.
  - In NEON this intrinsic results in at least 4 instructions.
  - In ALTIVEC this intrinsic results in at least 8-12 instructions.
- 256-bit version:
  - In SSE2-SSE4.1 this intrinsic results in at least 24 instructions.
  - In AVX-AVX2 this intrinsic results in at least 12 instructions.
  - In NEON this intrinsic results in at least 8 instructions.
  - In ALTIVEC this intrinsic results in at least 16-20 instructions.
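inline void simdpp::transpose8 (basic_int8x16 &a0, basic_int8x16 &a1, basic_int8x16 &a2, basic_int8x16 &a3, basic_int8x16 &a4, basic_int8x16 &a5, basic_int8x16 &a6, basic_int8x16 &a7)
inline void simdpp::transpose8 (basic_int8x32 &a0, basic_int8x32 &a1, basic_int8x32 &a2, basic_int8x32 &a3, basic_int8x32 &a4, basic_int8x32 &a5, basic_int8x32 &a6, basic_int8x32 &a7)
Transposes two 8x8 8-bit matrices within eight int8x16 vectors.
- 128-bit version:
  - In SSE2-AVX2 this intrinsic results in at least 32 instructions.
  - In NEON this intrinsic results in at least 12 instructions.
  - In ALTIVEC this intrinsic results in at least 24-30 instructions.
- 256-bit version:
  - In SSE2-AVX this intrinsic results in at least 64 instructions.
  - In AVX2 this intrinsic results in at least 32 instructions.
  - In NEON this intrinsic results in at least 24 instructions.
  - In ALTIVEC this intrinsic results in at least 48-54 instructions.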
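How two 8x8 matrices fit into eight 16-lane vectors can be sketched with a scalar model. Everything here is illustrative: `Lanes8x16` and `Rows8` are hypothetical stand-ins, the model takes one array of eight rows instead of eight separate references for brevity, and the layout (matrix m occupies lanes 8m to 8m+7 of every vector) is an assumption that this page does not spell out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical stand-ins: one 128-bit vector of sixteen 8-bit lanes, and the
// eight vectors a0..a7 gathered into one array.
using Lanes8x16 = std::array<std::uint8_t, 16>;
using Rows8 = std::array<Lanes8x16, 8>;

// Scalar model (assumed layout): matrix m lives in lanes [8m, 8m + 8) of all
// eight vectors, and vector i is row i of both matrices.
void transpose8_model(Rows8 &a)
{
    for (std::size_t m = 0; m < 2; ++m) {
        const std::size_t base = 8 * m;
        for (std::size_t i = 0; i < 8; ++i) {
            for (std::size_t j = i + 1; j < 8; ++j) {
                std::swap(a[i][base + j], a[j][base + i]);
            }
        }
    }
}

// Usage sketch: lane l of vector i starts as 16 * i + l (at most 127).
void transpose8_model_demo()
{
    Rows8 a{};
    for (std::size_t i = 0; i < 8; ++i) {
        for (std::size_t l = 0; l < 16; ++l) {
            a[i][l] = static_cast<std::uint8_t>(16 * i + l);
        }
    }
    transpose8_model(a);
    // Element (i, j) of matrix m now holds what was element (j, i).
    for (std::size_t m = 0; m < 2; ++m) {
        for (std::size_t i = 0; i < 8; ++i) {
            for (std::size_t j = 0; j < 8; ++j) {
                assert(static_cast<std::size_t>(a[i][8 * m + j]) == 16 * j + 8 * m + i);
            }
        }
    }
}
```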
Generated on Thu Oct 31 2013 04:08:51 for libsimdpp by Doxygen 1.8.3.1