Load Operations for Streaming SIMD Extensions

The prototypes for Streaming SIMD Extensions (SSE) intrinsics are in the xmmintrin.h header file.

Each intrinsic listed in the following table is described in detail after the table.

Each intrinsic places its result in a 128-bit register, shown for each intrinsic as R0-R3. R0, R1, R2, and R3 each represent one of the four 32-bit elements of the result register, with R0 the lowest.

Intrinsic Name   Operation                                            Corresponding SSE Instruction
_mm_loadh_pi     Load high                                            MOVHPS reg, mem
_mm_loadl_pi     Load low                                             MOVLPS reg, mem
_mm_load_ss      Load the low value and clear the three high values   MOVSS
_mm_load1_ps     Load one value into all four words                   MOVSS + Shuffling
_mm_load_ps      Load four values, address aligned                    MOVAPS
_mm_loadu_ps     Load four values, address unaligned                  MOVUPS
_mm_loadr_ps     Load four values in reverse                          MOVAPS + Shuffling

 

__m128 _mm_loadh_pi(__m128 a, __m64 const *p)

Sets the upper two single-precision floating-point (SP FP) values with 64 bits of data loaded from the address p; the lower two values are passed through from a.

R0 R1 R2 R3
a0 a1 *p0 *p1
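
The following is a minimal sketch of how _mm_loadh_pi might be used, assuming a GCC- or Clang-style xmmintrin.h. The helper name replace_high_pair and its array parameters are hypothetical and are not part of the intrinsics API; the cast from a float pointer to __m64 const * is a common idiom rather than something this reference prescribes.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Hypothetical helper: copy v into a register, overwrite its upper two
   elements with hi[0] and hi[1], and write R0..R3 to out. */
void replace_high_pair(const float v[4], const float hi[2], float out[4])
{
    assert(v && hi && out);
    __m128 a = _mm_loadu_ps(v);                     /* a0 a1 a2 a3 */
    __m128 r = _mm_loadh_pi(a, (const __m64 *)hi);  /* a0 a1 hi0 hi1 */
    _mm_storeu_ps(out, r);
}
```

With v = {1, 2, 3, 4} and hi = {9, 10}, out receives {1, 2, 9, 10}.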

 

__m128 _mm_loadl_pi(__m128 a, __m64 const *p)

Sets the lower two SP FP values with 64 bits of data loaded from the address p; the upper two values are passed through from a.

R0 R1 R2 R3
*p0 *p1 a2 a3
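
One possible use of _mm_loadl_pi, sketched here under the assumption of a GCC- or Clang-style xmmintrin.h, is to assemble a four-element register from two separate pairs of floats by combining it with _mm_loadh_pi. The helper name combine_pairs is hypothetical.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Hypothetical helper: build a register from two unrelated float pairs.
   lo fills R0 and R1, hi fills R2 and R3. */
void combine_pairs(const float lo[2], const float hi[2], float out[4])
{
    assert(lo && hi && out);
    __m128 r = _mm_setzero_ps();
    r = _mm_loadl_pi(r, (const __m64 *)lo);  /* lo0 lo1 0   0   */
    r = _mm_loadh_pi(r, (const __m64 *)hi);  /* lo0 lo1 hi0 hi1 */
    _mm_storeu_ps(out, r);
}
```

With lo = {1, 2} and hi = {3, 4}, out receives {1, 2, 3, 4}.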

 

__m128 _mm_load_ss(float * p)

Loads an SP FP value into the low word and clears the upper three words.

R0 R1 R2 R3
*p 0.0 0.0 0.0
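
As an illustration, the sketch below loads one float with _mm_load_ss and stores the whole register so that the cleared upper elements are visible. The helper name load_scalar is hypothetical, and a GCC- or Clang-style xmmintrin.h is assumed.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Hypothetical helper: load *p into R0, leave R1..R3 cleared, and write
   the full register to out for inspection. */
void load_scalar(const float *p, float out[4])
{
    assert(p && out);
    __m128 r = _mm_load_ss(p);  /* *p 0.0 0.0 0.0 */
    _mm_storeu_ps(out, r);
}
```

With *p equal to 5, out receives {5, 0, 0, 0}.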

 

__m128 _mm_load1_ps(float * p)

Loads a single SP FP value, copying it into all four words.

R0 R1 R2 R3
*p *p *p *p
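
A typical reason to broadcast one value is to apply the same scalar to every element of a vector. The sketch below does this with _mm_load1_ps and _mm_mul_ps; the helper name scale4 is hypothetical, and a GCC- or Clang-style xmmintrin.h is assumed.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Hypothetical helper: multiply four floats by one scalar factor. */
void scale4(const float in[4], const float *factor, float out[4])
{
    assert(in && factor && out);
    __m128 v = _mm_loadu_ps(in);       /* in0 in1 in2 in3 */
    __m128 f = _mm_load1_ps(factor);   /* f   f   f   f   */
    _mm_storeu_ps(out, _mm_mul_ps(v, f));
}
```

With in = {1, 2, 3, 4} and *factor equal to 2, out receives {2, 4, 6, 8}.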

 

__m128 _mm_load_ps(float * p)

Loads four SP FP values. The address must be 16-byte-aligned.

R0 R1 R2 R3
p[0] p[1] p[2] p[3]
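
The alignment requirement can be made explicit in calling code. The sketch below assumes a C11 compiler (for _Alignas in the caller) and a GCC- or Clang-style xmmintrin.h; the helper name sum4_aligned is hypothetical, and the assert is only a debug-time check added for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Hypothetical helper: add up four floats stored at a 16-byte-aligned
   address. */
float sum4_aligned(const float *p)
{
    /* _mm_load_ps requires a 16-byte-aligned address. */
    assert(((uintptr_t)p % 16) == 0);
    __m128 v = _mm_load_ps(p);
    float lanes[4];
    _mm_storeu_ps(lanes, v);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

A caller might declare its buffer as `_Alignas(16) float data[4] = {1, 2, 3, 4};` so that sum4_aligned(data) is valid and returns 10.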

 

__m128 _mm_loadu_ps(float * p)

Loads four SP FP values. The address need not be 16-byte-aligned.

R0 R1 R2 R3
p[0] p[1] p[2] p[3]
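
Unaligned loads are useful when the starting address is offset by a whole number of floats from an aligned base, as in a sliding window. The sketch below adds each element to its right-hand neighbour; the helper name pairwise_sum4 is hypothetical, and a GCC- or Clang-style xmmintrin.h is assumed.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Hypothetical helper: out[i] = a[i] + a[i + 1] for i = 0..3.
   a must hold at least five floats. */
void pairwise_sum4(const float a[5], float out[4])
{
    assert(a && out);
    __m128 left  = _mm_loadu_ps(a);      /* a0 a1 a2 a3 */
    __m128 right = _mm_loadu_ps(a + 1);  /* a1 a2 a3 a4, not 16-byte-aligned */
    _mm_storeu_ps(out, _mm_add_ps(left, right));
}
```

With a = {1, 2, 3, 4, 5}, out receives {3, 5, 7, 9}.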

 

__m128 _mm_loadr_ps(float * p)

Loads four SP FP values in reverse order. The address must be 16-byte-aligned.

R0 R1 R2 R3
p[3] p[2] p[1] p[0]
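
As a sketch of the reversed load, the helper below reverses four floats in one step. The name reverse4 is hypothetical; the code assumes a C11 compiler (for _Alignas in the caller) and a GCC- or Clang-style xmmintrin.h, and the assert is only a debug-time check added for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Hypothetical helper: write the four floats at src to dst in reverse
   order. src must be 16-byte-aligned; dst may have any alignment. */
void reverse4(const float *src, float *dst)
{
    assert(((uintptr_t)src % 16) == 0);
    __m128 r = _mm_loadr_ps(src);  /* src3 src2 src1 src0 */
    _mm_storeu_ps(dst, r);
}
```

With src declared as `_Alignas(16) float src[4] = {1, 2, 3, 4};`, dst receives {4, 3, 2, 1}.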