Floating-point Load Operations for Streaming SIMD Extensions 2

The following load intrinsics and their corresponding instructions are available in Streaming SIMD Extensions 2 (SSE2).

The load and set operations are similar in that both initialize __m128d data. However, the set operations take a double argument and are intended for initialization with constants, while the load operations take a double pointer argument and are intended to mimic the instructions for loading data from memory.
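
The difference between the two families can be sketched as follows. This is a minimal illustration, not code from the Double Complex sample: the function name set_matches_load is hypothetical, and _mm_set_pd and _mm_storeu_pd are SSE2 intrinsics from emmintrin.h that are not described in this section.

```c
#include <emmintrin.h>

/* Hypothetical helper: returns 1 if a vector built from constants with
   _mm_set_pd holds the same elements as one loaded from memory. */
int set_matches_load(void)
{
    /* Set: arguments are doubles, upper element first (R1 = 2.0, R0 = 1.0). */
    __m128d from_set = _mm_set_pd(2.0, 1.0);

    /* Load: the argument is a pointer to doubles in memory. */
    const double values[2] = { 1.0, 2.0 };
    __m128d from_load = _mm_loadu_pd(values);

    /* Copy both vectors back to memory so the elements can be compared. */
    double a[2], b[2];
    _mm_storeu_pd(a, from_set);
    _mm_storeu_pd(b, from_load);
    return a[0] == b[0] && a[1] == b[1];
}
```

Note that _mm_set_pd takes the upper element as its first argument, whereas a load places the value at the lower address in the lower element.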

Detailed information about each intrinsic follows the summary table below.

The result of each intrinsic operation is placed in a 128-bit register. The tables in the detailed explanation of each intrinsic show what is placed in each element of the result: R0 represents the lower double-precision floating-point (DP FP) value and R1 represents the upper DP FP value.

The prototypes for SSE2 intrinsics are in the emmintrin.h header file.

The Double Complex code sample contains examples of how to use several of these intrinsics.

Intrinsic Name	Operation	Corresponding SSE2 Instruction
_mm_load_pd	Loads two DP FP values from a 16-byte aligned address	MOVAPD
_mm_load1_pd	Loads a single DP FP value, copying to both elements	MOVSD + shuffling
_mm_loadr_pd	Loads two DP FP values in reverse order from a 16-byte aligned address	MOVAPD + shuffling
_mm_loadu_pd	Loads two DP FP values from an address that need not be 16-byte aligned	MOVUPD
_mm_load_sd	Loads a DP FP value, sets the upper DP FP value to zero	MOVSD
_mm_loadh_pd	Loads a DP FP value as the upper DP FP value of the result	MOVHPD
_mm_loadl_pd	Loads a DP FP value as the lower DP FP value of the result	MOVLPD

__m128d _mm_load_pd(double const *p)

Loads two DP FP values. The address p must be 16-byte aligned.

R0	R1
p[0]	p[1]
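
A minimal sketch of an aligned load, assuming a C11 compiler so that _Alignas is available. The function name demo_load_pd is hypothetical, and _mm_storeu_pd (also declared in emmintrin.h) is used only to copy the result back to memory for inspection.

```c
#include <emmintrin.h>

/* Hypothetical helper: loads an aligned pair and copies it to out. */
void demo_load_pd(double out[2])
{
    /* _Alignas(16) provides the 16-byte alignment _mm_load_pd requires. */
    _Alignas(16) const double p[2] = { 1.5, 2.5 };

    __m128d r = _mm_load_pd(p);   /* R0 = p[0], R1 = p[1] */
    _mm_storeu_pd(out, r);        /* out[0] = R0, out[1] = R1 */
}
```

Passing an address that is not 16-byte aligned typically raises a general-protection fault, so memory of unknown alignment is better handled with _mm_loadu_pd.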

__m128d _mm_load1_pd(double const *p)

Loads a single DP FP value, copying to both elements. The address p need not be 16-byte aligned.

R0	R1
*p	*p
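
As an illustration, the following sketch copies one scalar into both elements. The function name demo_load1_pd is hypothetical, and _mm_storeu_pd is used only to make the result visible.

```c
#include <emmintrin.h>

/* Hypothetical helper: copies a single double into both elements. */
void demo_load1_pd(double out[2])
{
    const double x = 3.0;           /* ordinary double, no special alignment */

    __m128d r = _mm_load1_pd(&x);   /* R0 = x, R1 = x */
    _mm_storeu_pd(out, r);
}
```

A vector filled this way is commonly used as a scale factor that applies to both elements of another __m128d value.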

__m128d _mm_loadr_pd(double const *p)

Loads two DP FP values in reverse order. The address p must be 16-byte aligned.

R0	R1
p[1]	p[0]
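
The reversed element order can be checked with a short sketch such as the one below, again assuming C11 for _Alignas. The function name demo_loadr_pd is hypothetical.

```c
#include <emmintrin.h>

/* Hypothetical helper: loads an aligned pair with its elements swapped. */
void demo_loadr_pd(double out[2])
{
    _Alignas(16) const double p[2] = { 1.5, 2.5 };

    __m128d r = _mm_loadr_pd(p);   /* R0 = p[1], R1 = p[0] */
    _mm_storeu_pd(out, r);         /* out[0] = 2.5, out[1] = 1.5 */
}
```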

__m128d _mm_loadu_pd(double const *p)

Loads two DP FP values. The address p need not be 16-byte aligned.

R0	R1
p[0]	p[1]
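
An unaligned load is useful when reading from the middle of an array, where the address is not guaranteed to fall on a 16-byte boundary. The sketch below assumes nothing about the alignment of the array; the function name demo_loadu_pd is hypothetical.

```c
#include <emmintrin.h>

/* Hypothetical helper: loads two doubles starting at the second array element. */
void demo_loadu_pd(double out[2])
{
    const double data[4] = { 10.0, 20.0, 30.0, 40.0 };

    /* data + 1 is not guaranteed to be 16-byte aligned, so use the
       unaligned load. */
    __m128d r = _mm_loadu_pd(data + 1);   /* R0 = data[1], R1 = data[2] */
    _mm_storeu_pd(out, r);
}
```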

__m128d _mm_load_sd(double const *p)

Loads a DP FP value. The upper DP FP value is set to zero. The address p need not be 16-byte aligned.

R0	R1
*p	0.0
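
A brief sketch showing that the upper element is cleared rather than left undefined. The function name demo_load_sd is hypothetical.

```c
#include <emmintrin.h>

/* Hypothetical helper: loads one double into the lower element. */
void demo_load_sd(double out[2])
{
    const double x = 7.0;

    __m128d r = _mm_load_sd(&x);   /* R0 = x, R1 = 0.0 */
    _mm_storeu_pd(out, r);
}
```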

__m128d _mm_loadh_pd(__m128d a, double const *p)

Loads a DP FP value as the upper DP FP value of the result. The lower DP FP value is passed through from a. The address p need not be 16-byte aligned.

R0	R1
a0	*p
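
The pass-through behavior can be illustrated as follows. This is a sketch only: the function name demo_loadh_pd is hypothetical, and the starting vector is built with _mm_set_pd, which takes the upper element as its first argument.

```c
#include <emmintrin.h>

/* Hypothetical helper: replaces only the upper element of a vector. */
void demo_loadh_pd(double out[2])
{
    __m128d a = _mm_set_pd(2.0, 1.0);   /* a0 = 1.0, a1 = 2.0 */
    const double x = 9.0;

    __m128d r = _mm_loadh_pd(a, &x);    /* R0 = a0, R1 = x */
    _mm_storeu_pd(out, r);
}
```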

__m128d _mm_loadl_pd(__m128d a, double const *p)

Loads a DP FP value as the lower DP FP value of the result. The upper DP FP value is passed through from a. The address p need not be 16-byte aligned.

R0	R1
*p	a1
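
The mirror-image case, replacing only the lower element, might look like this. The function name demo_loadl_pd is hypothetical, and _mm_set_pd takes the upper element as its first argument.

```c
#include <emmintrin.h>

/* Hypothetical helper: replaces only the lower element of a vector. */
void demo_loadl_pd(double out[2])
{
    __m128d a = _mm_set_pd(2.0, 1.0);   /* a0 = 1.0, a1 = 2.0 */
    const double x = 9.0;

    __m128d r = _mm_loadl_pd(a, &x);    /* R0 = x, R1 = a1 */
    _mm_storeu_pd(out, r);
}
```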