Details about Intrinsics

The MMX(TM) technology and Streaming SIMD Extensions (SSE) instructions use the following features:

Registers--Enable packed data of up to 128 bits in length for optimal SIMD processing
Data Types--Enable packing of up to 16 elements of data in one register

Registers

Intel processors provide special register sets.

The MMX instructions use eight 64-bit registers (mm0 to mm7), which are aliased onto the floating-point stack registers.

The Streaming SIMD Extensions use eight 128-bit registers (xmm0 to xmm7).

Because each of these registers can hold more than one data element, the processor can process more than one data element simultaneously. This processing capability is known as single-instruction, multiple-data (SIMD) processing.

For each computational and data manipulation instruction in the new extension sets, there is a corresponding C intrinsic that implements that instruction directly. This frees you from managing registers and writing assembly code. In addition, the compiler optimizes the instruction scheduling so that your executable runs faster.
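As an illustration of the one-intrinsic-per-instruction idea described above, the following minimal sketch adds four pairs of single-precision values with a single SSE addition. The helper name add4_floats is hypothetical and is not part of any intrinsics header; the unaligned load and store intrinsics are used so that the sketch makes no assumption about how the caller's arrays are aligned.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */

/* Hypothetical helper: adds four pairs of floats with one SIMD addition.
   a, b, and out must each point to at least four floats.
   No particular alignment is assumed. */
static void add4_floats(const float *a, const float *b, float *out)
{
    __m128 va   = _mm_loadu_ps(a);     /* load a[0..3] into one 128-bit value */
    __m128 vb   = _mm_loadu_ps(b);     /* load b[0..3] */
    __m128 vsum = _mm_add_ps(va, vb);  /* four additions in one operation */
    _mm_storeu_ps(out, vsum);          /* write the four sums back to memory */
}
```

The code never names a register. The compiler decides which XMM registers hold va, vb, and vsum.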

Note

The MM and XMM registers are the SIMD registers used by the IA-32 platforms to implement MMX technology and SSE or SSE2 intrinsics. On the IA-64 architecture, the MMX and SSE intrinsics use the 64-bit general registers and the 64-bit significand of the 80-bit floating-point register.

Data Types

Intrinsic functions use four new C data types as operands. These types represent the new registers on which the intrinsic functions operate.

New Data Types Available

The following table shows which instruction sets support each of the new data types.

New Data Type	MMX(TM) Technology	Streaming SIMD Extensions	Streaming SIMD Extensions 2	Streaming SIMD Extensions 3
__m64	Available	Available	Available	Available
__m128	Not available	Available	Available	Available
__m128d	Not available	Not available	Available	Available
__m128i	Not available	Not available	Available	Available

__m64 Data Type

The __m64 data type is used to represent the contents of an MMX register, which is the register that is used by the MMX technology intrinsics. The __m64 data type can hold eight 8-bit values, four 16-bit values, two 32-bit values, or one 64-bit value.

__m128 Data Types

The __m128 data type is used to represent the contents of a Streaming SIMD Extensions register used by the Streaming SIMD Extensions intrinsics. The __m128 data type can hold four 32-bit floating-point values.

The __m128d data type can hold two 64-bit floating-point values.

The __m128i data type can hold sixteen 8-bit, eight 16-bit, four 32-bit, or two 64-bit integer values.
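The element counts above can be seen in a short sketch that uses the SSE2 types: one __m128d operation processes two doubles, and one __m128i operation processes four 32-bit integers. The helper names add2_doubles and add4_ints are hypothetical, and the unaligned load and store intrinsics are used so that no alignment of the caller's arrays is assumed.

```c
#include <emmintrin.h>  /* SSE2: __m128d, __m128i and their intrinsics */
#include <stdint.h>     /* int32_t */

/* Hypothetical helper: one __m128d holds two 64-bit doubles. */
static void add2_doubles(const double *a, const double *b, double *out)
{
    __m128d va = _mm_loadu_pd(a);           /* a[0], a[1] */
    __m128d vb = _mm_loadu_pd(b);           /* b[0], b[1] */
    _mm_storeu_pd(out, _mm_add_pd(va, vb)); /* two additions at once */
}

/* Hypothetical helper: one __m128i holds four 32-bit integers. */
static void add4_ints(const int32_t *a, const int32_t *b, int32_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* _mm_add_epi32 treats the 128 bits as four independent 32-bit lanes. */
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
}
```

The same __m128i value could instead be passed to _mm_add_epi8 or _mm_add_epi16. The intrinsic, not the type, determines how the 128 bits are divided into elements.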

The compiler aligns __m128d and __m128i local and global data to 16-byte boundaries. To align integer, float, or double arrays, you can use the __declspec(align(n)) specifier.
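The following sketch shows one way to request 16-byte alignment for an ordinary float array so that the aligned load and store intrinsics can be used on it. The macro ALIGN16 and the names g_weights, is_aligned16, and sum_weights_scaled are hypothetical. The macro expands to __declspec(align(16)) on compilers that define _MSC_VER and, as an assumption for other compilers, to the GCC-style aligned attribute.

```c
#include <stdint.h>     /* uintptr_t */
#include <xmmintrin.h>  /* SSE: _mm_load_ps, _mm_store_ps, _mm_mul_ps, _mm_set1_ps */

/* Hypothetical portability macro for 16-byte alignment. */
#if defined(_MSC_VER)
#define ALIGN16 __declspec(align(16))
#else
#define ALIGN16 __attribute__((aligned(16)))  /* assumption: GCC-compatible compiler */
#endif

/* A global float array explicitly aligned to a 16-byte boundary. */
static ALIGN16 float g_weights[4] = {0.5f, 1.5f, 2.5f, 3.5f};

/* Returns nonzero if p lies on a 16-byte boundary. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Multiplies each weight by scale and returns the sum of the four products. */
static float sum_weights_scaled(float scale)
{
    ALIGN16 float tmp[4];                          /* aligned local buffer */
    __m128 w = _mm_load_ps(g_weights);             /* aligned load: needs 16-byte alignment */
    __m128 s = _mm_mul_ps(w, _mm_set1_ps(scale));  /* four multiplications at once */
    _mm_store_ps(tmp, s);                          /* aligned store */
    return tmp[0] + tmp[1] + tmp[2] + tmp[3];
}
```

Without the alignment specifier, _mm_load_ps and _mm_store_ps may fault on an array that happens to start at an address that is not a multiple of 16. The unaligned variants _mm_loadu_ps and _mm_storeu_ps avoid that requirement.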

Data Types Usage Guidelines

These data types are not basic ANSI C data types. You must observe the following usage restrictions:

Use these data types only on either side of an assignment, as a return value, or as a parameter. You cannot use them with arithmetic operators (+, -, and so on).
Use these data types as objects in aggregates, such as unions, to access the byte elements and structures.
Use these data types only with the respective intrinsics described in this documentation.
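The guidelines above can be sketched as follows: arithmetic goes through an intrinsic rather than the + operator, and a union gives access to the individual elements of the result. The union type vec128i_view and the function add_and_read_lane are hypothetical names chosen for this illustration.

```c
#include <emmintrin.h>  /* SSE2: __m128i, _mm_set_epi32, _mm_set1_epi32, _mm_add_epi32 */
#include <stdint.h>     /* fixed-width integer types */

/* Hypothetical union that overlays a __m128i with element arrays. */
typedef union {
    __m128i  v;       /* the SIMD value, used with intrinsics */
    uint8_t  u8[16];  /* the same 128 bits viewed as sixteen bytes */
    uint16_t u16[8];  /* ... as eight 16-bit elements */
    uint32_t u32[4];  /* ... as four 32-bit elements */
} vec128i_view;

/* Adds 100 to the elements {1, 2, 3, 4} and returns the chosen 32-bit element. */
static uint32_t add_and_read_lane(int lane)
{
    vec128i_view a, b, r;
    a.v = _mm_set_epi32(4, 3, 2, 1);  /* arguments run from element 3 down to element 0 */
    b.v = _mm_set1_epi32(100);        /* all four elements set to 100 */
    r.v = _mm_add_epi32(a.v, b.v);    /* use the intrinsic, not a.v + b.v */
    return r.u32[lane & 3];           /* read one element through the union */
}
```

Reading through the union forces the value out to memory, so it suits debugging and occasional element access better than inner loops.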

Accessing __m128i Data

To access 8-bit data:

#define _mm_extract_epi8(x, imm) \
((((imm) & 0x1) == 0) ? \
_mm_extract_epi16((x), (imm) >> 1) & 0xff : \
_mm_extract_epi16(_mm_srli_epi16((x), 8), (imm) >> 1))

For 16-bit data, use the following intrinsic:

int _mm_extract_epi16(__m128i a, int imm)

To access 32-bit data:

#define _mm_extract_epi32(x, imm) \
_mm_cvtsi128_si32(_mm_srli_si128((x), 4 * (imm)))

To access 64-bit data (Intel® 64 architecture only):

#define _mm_extract_epi64(x, imm) \
_mm_cvtsi128_si64(_mm_srli_si128((x), 8 * (imm)))
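The following sketch exercises the 8-bit, 16-bit, and 32-bit access patterns shown above. To avoid clashing with the identically named intrinsics that SSE4.1 headers declare, the macros are given the hypothetical names EXTRACT_EPI8 and EXTRACT_EPI32, and the helper functions are likewise illustrative only. The index passed to each macro must be a compile-time constant because the underlying intrinsics take immediate operands. The 64-bit variant follows the same pattern and is omitted here so that the sketch also builds for IA-32.

```c
#include <emmintrin.h>  /* SSE2: _mm_extract_epi16, _mm_srli_epi16, _mm_srli_si128, ... */

/* Same logic as the 8-bit macro above, under a hypothetical name. */
#define EXTRACT_EPI8(x, imm) \
    ((((imm) & 0x1) == 0) ? \
        (_mm_extract_epi16((x), (imm) >> 1) & 0xff) : \
        _mm_extract_epi16(_mm_srli_epi16((x), 8), (imm) >> 1))

/* Same logic as the 32-bit macro above, under a hypothetical name. */
#define EXTRACT_EPI32(x, imm) \
    _mm_cvtsi128_si32(_mm_srli_si128((x), 4 * (imm)))

/* Builds a vector whose byte i holds the value 16 + i. */
static __m128i make_test_vector(void)
{
    unsigned char bytes[16];
    int i;
    for (i = 0; i < 16; ++i)
        bytes[i] = (unsigned char)(16 + i);
    return _mm_loadu_si128((const __m128i *)bytes);
}

/* Each wrapper passes a literal index, as the intrinsics require. */
static int byte4(__m128i v)  { return EXTRACT_EPI8(v, 4); }        /* even byte index */
static int byte5(__m128i v)  { return EXTRACT_EPI8(v, 5); }        /* odd byte index */
static int word3(__m128i v)  { return _mm_extract_epi16(v, 3); }   /* bytes 6 and 7 */
static int dword1(__m128i v) { return EXTRACT_EPI32(v, 1); }       /* bytes 4 to 7 */
```

For an even byte index the macro masks off the low byte of the containing 16-bit element. For an odd index it first shifts every 16-bit element right by 8 bits so that the high byte moves into the low position.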