Vectorization Examples

This section contains simple examples of some common issues in vector programming.

Argument Aliasing: A Vector Copy

The loop in the following example, a vector copy operation, vectorizes only through multi-version code, because the compiler cannot prove at compile time that dest[i] and src[i] are distinct.

Example: Vectorizable Copy Despite Unproven Distinction

void vec_copy_multi_version(float *dest, float *src, int len)
{
  for (int i=0; i<len; i++)
    dest[i] = src[i];
}

The restrict keyword in the next example asserts that the pointers refer to distinct objects. The compiler can therefore vectorize the loop without generating multi-version code.

Example: Using restrict to Prove Vectorizable Distinction

void vec_copy(float *restrict dest, float *restrict src, int len)
{
  for (int i=0; i<len; i++)
    dest[i] = src[i];
}
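To make the term concrete, the multi-version code mentioned above can be pictured roughly as the hand-written sketch below: a run-time overlap test selects between a loop that is safe to vectorize and a conservative scalar loop. The helper ranges_disjoint and the function vec_copy_versioned_sketch are hypothetical names, and the test and code that the compiler actually emits may differ.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero if the two len-element ranges do not overlap.
   Comparing addresses as integers is an assumption that holds on the usual
   flat-memory targets; it is not guaranteed by the C standard. */
static int ranges_disjoint(const float *dest, const float *src, int len)
{
    uintptr_t d = (uintptr_t)dest;
    uintptr_t s = (uintptr_t)src;
    uintptr_t bytes = (uintptr_t)len * sizeof(float);
    return d + bytes <= s || s + bytes <= d;
}

void vec_copy_versioned_sketch(float *dest, float *src, int len)
{
    if (len <= 0)
        return;

    if (ranges_disjoint(dest, src, len)) {
        /* Version 1: no overlap, so the loop may be vectorized freely. */
        float *restrict d = dest;
        const float *restrict s = src;
        for (int i = 0; i < len; i++)
            d[i] = s[i];
    } else {
        /* Version 2: possible overlap, keep the original element order. */
        for (int i = 0; i < len; i++)
            dest[i] = src[i];
    }
}
```

Declaring the parameters with restrict, as in the vec_copy example, removes the need for the run-time test, but it also makes the caller responsible for never passing overlapping ranges.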

Data Alignment

A 16-byte (Linux* and Mac OS*) or 64-byte (Windows*) or greater data structure or array should be aligned so that the beginning of each structure or array element falls on a base address that is a multiple of 16 (Linux and Mac OS) or 32 (Windows).
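One way to check or request such alignment in C11 is sketched below. The helper is_aligned and the buffer aligned_buf are hypothetical names, _Alignas assumes a C11 compiler, and the 16-byte boundary is simply the value used in the text above.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero if p lies on a multiple of `boundary` bytes.
   `boundary` is assumed to be a power of two (16, 32, 64, ...). */
static int is_aligned(const void *p, uintptr_t boundary)
{
    return ((uintptr_t)p & (boundary - 1)) == 0;
}

/* C11: ask the compiler to place this array on a 16-byte boundary. */
static _Alignas(16) float aligned_buf[8];
```

Assuming 4-byte floats, is_aligned(&aligned_buf[4], 16) holds as well, while is_aligned(&aligned_buf[1], 16) does not.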

The figure (below) shows the effect of a data cache unit (DCU) split due to misaligned data. The code loads the misaligned data across a 16-byte boundary, which results in an additional memory access that causes a six- to twelve-cycle stall. You can avoid these stalls if you know that the data is aligned and you instruct the compiler to assume alignment.

Misaligned Data Crossing 16-Byte Boundary

For example, if you know that elements a[0] and b[0] are aligned on a 16-byte boundary, then the following loop can be vectorized with aligned memory accesses by asserting the alignment with #pragma vector aligned:

Example: Alignment of Pointers is Known

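void aligned(float *a, float *b, int len)
{
  /* Asserts that a and b are 16-byte aligned; the caller must guarantee it. */
  #pragma vector aligned
  for (int i=0; i<len; i++)
    a[i] = b[i];
}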

After vectorization, the loop is executed as shown in the figure below.

Vector and Scalar Clean-up Iterations

Both vector iterations, a[0:3] = b[0:3]; and a[4:7] = b[4:7];, can be implemented with aligned moves if both elements a[0] and b[0] (or, likewise, a[4] and b[4]) are 16-byte aligned.
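As a rough illustration of that execution order, the sketch below writes the split by hand, assuming a vector length of four floats (16 bytes). The function name copy_strip_mined is hypothetical, and each group of four assignments only stands in for one vector move; the code the vectorizer actually generates will differ.

```c
void copy_strip_mined(float *a, const float *b, int len)
{
    int i = 0;
    /* Largest multiple of 4 that does not exceed len (0 if len <= 0). */
    int vec_end = (len > 0) ? len - (len % 4) : 0;

    /* Vector iterations: each pass stands in for a[i:i+3] = b[i:i+3]. */
    for (; i < vec_end; i += 4) {
        a[i]     = b[i];
        a[i + 1] = b[i + 1];
        a[i + 2] = b[i + 2];
        a[i + 3] = b[i + 3];
    }

    /* Scalar clean-up iterations for the remaining len % 4 elements. */
    for (; i < len; i++)
        a[i] = b[i];
}
```

For len = 10, this performs two vector iterations (elements 0 to 7) followed by two scalar clean-up iterations (elements 8 and 9).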

Caution

If you use the vectorizer with incorrect alignment options, the compiler generates code with unexpected behavior. Specifically, using aligned moves on unaligned data results in an illegal instruction exception.

Data Alignment Examples

This example contains a loop that vectorizes, but only with unaligned memory instructions. The compiler can align the local arrays, but because lb is not known at compile time, the correct alignment cannot be determined.

Example: Loop Unaligned Due to Unknown Variable Value at Compile Time

void unaligned(int lb, float *a, float x, float *y, int len)
{
  for (int i=lb; i<len; i++)
    a[i] = a[i] * x + y[i];
}

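If you know that lb is a multiple of 4, then, because a float occupies 4 bytes, a[lb] and y[lb] fall on 16-byte boundaries whenever a and y themselves are 16-byte aligned. In that case you can align the loop with #pragma vector aligned, as shown in the example that follows: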

Example: Alignment Due to Assertion of Variable as Multiple of 4

#include <assert.h>

void assert_aligned(int lb, float *a, float x, float *y, int len)
{
  assert(lb%4 == 0);
  #pragma vector aligned
  for (int i=lb; i<len; i++)
    a[i] = a[i] * x + y[i];
}
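The assert above checks only lb, and it is compiled out when NDEBUG is defined. A more defensive variant, sketched below, tests the alignment at run time and falls back to the unhinted loop otherwise. The names is_16byte_aligned and checked_aligned are hypothetical, and guarding the pragma with the __INTEL_COMPILER and __INTEL_LLVM_COMPILER macros is an assumption made so that other compilers simply skip it.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero if p is a multiple of 16 bytes. */
static int is_16byte_aligned(const void *p)
{
    return ((uintptr_t)p % 16) == 0;
}

void checked_aligned(int lb, float *a, float x, float *y, int len)
{
    if (lb % 4 == 0 && is_16byte_aligned(a) && is_16byte_aligned(y)) {
        /* a[lb] and y[lb] are 16-byte aligned here (assuming 4-byte floats),
           so the alignment hint is safe on this path. */
#if defined(__INTEL_COMPILER) || defined(__INTEL_LLVM_COMPILER)
        #pragma vector aligned
#endif
        for (int i = lb; i < len; i++)
            a[i] = a[i] * x + y[i];
    } else {
        /* Alignment unknown: no hint, the compiler must use unaligned accesses. */
        for (int i = lb; i < len; i++)
            a[i] = a[i] * x + y[i];
    }
}
```

The run-time test costs a few instructions per call, which is usually negligible next to the loop itself, and it avoids the exception described in the caution above when a caller passes misaligned data.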