Floating-point Load Operations for Streaming SIMD Extensions 2

The following load intrinsics and their corresponding instructions are available in the Streaming SIMD Extensions 2 (SSE2).

The load and set operations are similar in that both initialize __m128d data. However, the set operations take a double argument and are intended for initialization with constants, while the load operations take a double pointer argument and are intended to mimic the instructions for loading data from memory.
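The difference between the two families can be sketched as follows. This is a minimal illustration, not part of the original reference: the helper names `init_from_constants` and `init_from_memory` are hypothetical, and `_mm_set_pd` and `_mm_loadu_pd` are used as representative set and load intrinsics.

```c
#include <emmintrin.h>  /* SSE2 intrinsic prototypes */

/* Hypothetical helper: initialize a __m128d from constants (set operation). */
static __m128d init_from_constants(void)
{
    /* _mm_set_pd takes the upper element first, so the result is
       lower = 1.0, upper = 2.0. */
    return _mm_set_pd(2.0, 1.0);
}

/* Hypothetical helper: initialize a __m128d from memory (load operation). */
static __m128d init_from_memory(const double *p)
{
    /* _mm_loadu_pd places no alignment requirement on p. */
    return _mm_loadu_pd(p);
}
```

Both helpers produce the same vector when `p` points at the values 1.0 and 2.0; only the source of the data differs.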

For detailed information about an intrinsic, see its description following the table.

The result of each intrinsic operation is placed in a 128-bit register. The table in each detailed description below shows what is placed in each element of the result: R0 represents the lower double-precision element and R1 the upper.

The prototypes for SSE2 intrinsics are in the emmintrin.h header file.

The Double Complex code sample contains examples of how to use several of these intrinsics.

Intrinsic Name   Operation                                                    Corresponding SSE2 Instruction
_mm_load_pd      Loads two DP FP values                                       MOVAPD
_mm_load1_pd     Loads a single DP FP value, copying to both elements         MOVSD + shuffling
_mm_loadr_pd     Loads two DP FP values in reverse order                      MOVAPD + shuffling
_mm_loadu_pd     Loads two DP FP values                                       MOVUPD
_mm_load_sd      Loads a DP FP value, sets upper DP FP to zero                MOVSD
_mm_loadh_pd     Loads a DP FP value as the upper DP FP value of the result   MOVHPD
_mm_loadl_pd     Loads a DP FP value as the lower DP FP value of the result   MOVLPD

 

__m128d _mm_load_pd(double const *p)

Loads two DP FP values. The address p must be 16-byte aligned.

R0 R1
p[0] p[1]
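A minimal sketch of an aligned load, assuming a GCC- or Clang-style compiler targeting a processor with SSE2. The helper name `sum_pair_aligned` is hypothetical; the caller is responsible for passing a 16-byte-aligned address.

```c
#include <emmintrin.h>

/* Hypothetical helper: return p[0] + p[1].
   p must be 16-byte aligned, as _mm_load_pd requires. */
static double sum_pair_aligned(const double *p)
{
    __m128d v  = _mm_load_pd(p);          /* R0 = p[0], R1 = p[1] */
    __m128d hi = _mm_unpackhi_pd(v, v);   /* both elements = p[1] */
    /* Add the lower elements and extract the lower element as a double. */
    return _mm_cvtsd_f64(_mm_add_sd(v, hi));
}
```

One way to satisfy the alignment requirement is to declare the data with the C11 `_Alignas(16)` specifier, for example `_Alignas(16) double data[2] = {1.5, 2.5};`.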

 

__m128d _mm_load1_pd(double const *p)

Loads a single DP FP value, copying to both elements. The address p need not be 16-byte aligned.

R0 R1
*p *p
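A typical use of a broadcast load is applying one scalar to both elements of a vector. The sketch below assumes this use case; the helper name `scale_pair` is hypothetical.

```c
#include <emmintrin.h>

/* Hypothetical helper: multiply v[0] and v[1] in place by *scale. */
static void scale_pair(double *v, const double *scale)
{
    __m128d s = _mm_load1_pd(scale);     /* R0 = *scale, R1 = *scale */
    __m128d x = _mm_loadu_pd(v);         /* unaligned load of v[0], v[1] */
    _mm_storeu_pd(v, _mm_mul_pd(x, s));  /* write both products back */
}
```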

 

__m128d _mm_loadr_pd(double const *p)

Loads two DP FP values in reverse order. The address p must be 16-byte aligned.

R0 R1
p[1] p[0]
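The reversed element order can be observed by storing the result back to memory. In this sketch the helper name `reverse_pair` is hypothetical, and the source address is assumed to be 16-byte aligned by the caller.

```c
#include <emmintrin.h>

/* Hypothetical helper: copy src[0..1] to dst[0..1] in reverse order.
   src must be 16-byte aligned; dst may have any alignment. */
static void reverse_pair(const double *src, double *dst)
{
    __m128d r = _mm_loadr_pd(src);  /* R0 = src[1], R1 = src[0] */
    _mm_storeu_pd(dst, r);          /* dst[0] = src[1], dst[1] = src[0] */
}
```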

 

__m128d _mm_loadu_pd(double const *p)

Loads two DP FP values. The address p need not be 16-byte aligned.

R0 R1
p[0] p[1]
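An unaligned load is useful when the starting address is not known to be 16-byte aligned, such as a pointer into the middle of an array. The following sketch sums an array two elements at a time; the helper name `sum_array` is hypothetical.

```c
#include <emmintrin.h>
#include <stddef.h>

/* Hypothetical helper: sum n doubles starting at a (any alignment). */
static double sum_array(const double *a, size_t n)
{
    __m128d acc = _mm_setzero_pd();
    size_t i = 0;

    /* Accumulate two elements per iteration with an unaligned load. */
    for (; i + 2 <= n; i += 2)
        acc = _mm_add_pd(acc, _mm_loadu_pd(a + i));

    /* Combine the two partial sums held in the accumulator. */
    double lanes[2];
    _mm_storeu_pd(lanes, acc);
    double total = lanes[0] + lanes[1];

    /* Add the leftover element when n is odd. */
    if (i < n)
        total += a[i];
    return total;
}
```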

 

__m128d _mm_load_sd(double const *p)

Loads a DP FP value. The upper DP FP value is set to zero. The address p need not be 16-byte aligned.

R0 R1
*p 0.0
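The zeroing of the upper element can be checked by storing the result to a two-element array. This is a small sketch with a hypothetical helper name, `load_scalar`.

```c
#include <emmintrin.h>

/* Hypothetical helper: load *p into the lower element and write
   both elements of the result to out. */
static void load_scalar(const double *p, double out[2])
{
    __m128d v = _mm_load_sd(p);  /* R0 = *p, R1 = 0.0 */
    _mm_storeu_pd(out, v);       /* out[0] = *p, out[1] = 0.0 */
}
```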

 

__m128d _mm_loadh_pd(__m128d a, double const *p)

Loads a DP FP value as the upper DP FP value of the result. The lower DP FP value is passed through from a. The address p need not be 16-byte aligned.

R0 R1
a0 *p
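One common use, sketched here as an assumption about typical usage rather than something stated in this reference, is combining two doubles from non-adjacent addresses into one vector. The helper name `gather_pair` is hypothetical.

```c
#include <emmintrin.h>

/* Hypothetical helper: build a vector with lower = *lo and upper = *hi
   from two unrelated addresses. */
static __m128d gather_pair(const double *lo, const double *hi)
{
    __m128d v = _mm_load_sd(lo);  /* R0 = *lo, R1 = 0.0 */
    return _mm_loadh_pd(v, hi);   /* R0 = *lo (passed through), R1 = *hi */
}
```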

 

__m128d _mm_loadl_pd(__m128d a, double const *p)

Loads a DP FP value as the lower DP FP value of the result. The upper DP FP value is passed through from a. The address p need not be 16-byte aligned.

R0 R1
*p a1
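The pass-through behavior can be sketched by replacing only the lower element of an existing vector. The helper name `replace_lower` is hypothetical.

```c
#include <emmintrin.h>

/* Hypothetical helper: return a with its lower element replaced by *p. */
static __m128d replace_lower(__m128d a, const double *p)
{
    return _mm_loadl_pd(a, p);  /* R0 = *p, R1 = a1 (passed through) */
}
```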