Targeting IA-32 Architecture Processors Manually

With manual processor dispatch, your code can detect at run time which IA-32 architecture processor it is running on by using the cpu_specific and cpu_dispatch keywords, so you can write code that executes only on a targeted processor. Manual processor dispatch does not recognize processors based on IA-64 architecture.

Use the __declspec(cpu_specific) and __declspec(cpu_dispatch) syntax in your code to create code specific to a targeted Intel® processor and allow the remaining application to execute correctly on other IA-32 architecture processors.

In general, these keywords modify a function declaration and take the following arguments:

cpu_specific(cpuid)
cpu_dispatch(cpuid-list)

The following table lists the values for cpuid:

Argument for cpuid         Processors
core_2_duo_ssse3           Intel® Core™2 Duo processors and Intel® Xeon® processors with Supplemental Streaming SIMD Extensions 3 (SSSE3)
pentium_4_sse3             Intel Pentium 4 processor with Streaming SIMD Extensions 3 (SSE3), Intel® Core™ Duo processors, Intel® Core™ Solo processors
pentium_4                  Intel Pentium 4 processors
pentium_m                  Intel Pentium M processors
pentium_iii_no_xmm_regs    Intel Pentium III (exclude xmm registers)
pentium_iii                Intel Pentium III processors
pentium_ii                 Intel Pentium II processors
pentium_pro                Intel Pentium Pro processors
pentium_mmx                Intel Pentium processors with MMX™ Technology
pentium                    Intel® Pentium® processors
generic                    x86 processors not provided by Intel Corporation

The following table lists the syntax for cpuid-list:

Syntax for cpuid-list
  cpuid
  cpuid-list, cpuid

That is, a cpuid-list is either a single cpuid or several cpuid values separated by commas.

The attributes are not case sensitive. The body of a function declared with __declspec(cpu_dispatch) must be empty, and is referred to as a stub (an empty-bodied function).

Manual processor dispatch can disable some types of inlining, almost always results in larger code and executable sizes, and can introduce additional performance overhead because of the additional function calls. Test your application on all of the targeted platforms before release. Before using manual dispatch, consider whether the benefits outweigh the additional effort and possible performance issues.
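To make the source of that overhead concrete, the following is a minimal conceptual sketch in portable C of what a run-time dispatcher might look like. It is not the code the compiler generates for cpu_dispatch. Every name in it (sum_fn, sum_baseline, sum_tuned, cpu_has_tuned_path, array_sum_dispatch) is hypothetical, and the processor check is a placeholder that always selects the baseline version.

```c
#include <assert.h>
#include <stddef.h>

/* Signature shared by every version of the routine (hypothetical). */
typedef void (*sum_fn)(int *result, const int *a, const int *b, size_t len);

/* Baseline version: plain scalar loop that runs on any processor. */
void sum_baseline(int *result, const int *a, const int *b, size_t len)
{
    for (size_t i = 0; i < len; i++)
        result[i] = a[i] + b[i];
}

/* Stand-in for a processor-specific version; here it is only unrolled
   by two so that the sketch stays portable. */
void sum_tuned(int *result, const int *a, const int *b, size_t len)
{
    size_t i = 0;
    for (; i + 1 < len; i += 2) {
        result[i] = a[i] + b[i];
        result[i + 1] = a[i + 1] + b[i + 1];
    }
    if (i < len)
        result[i] = a[i] + b[i];
}

/* Placeholder for the run-time processor check (assumption: a real
   dispatcher would query the processor here). Always returns 0. */
int cpu_has_tuned_path(void)
{
    return 0;
}

/* Picks a version once, then forwards every call through a function
   pointer. The indirect call is the extra cost, and it also prevents
   the selected version from being inlined into the caller. */
void array_sum_dispatch(int *result, const int *a, const int *b, size_t len)
{
    static sum_fn chosen = NULL;
    if (chosen == NULL)
        chosen = cpu_has_tuned_path() ? sum_tuned : sum_baseline;
    assert(chosen != NULL);
    chosen(result, a, b, len);
}
```

Callers use array_sum_dispatch exactly as they would an ordinary function. The pointer is resolved on the first call only, so the recurring cost is the indirect call and the lost inlining opportunity rather than repeated processor detection.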

Use the following guidelines to implement processor dispatch support:

The following example demonstrates using manual dispatch with both cpu_specific and cpu_dispatch.

Example

#include <stdio.h>
#include <mmintrin.h>

/* Pentium processor function does not use intrinsics
   to add two arrays. */
__declspec(cpu_specific(pentium))
void array_sum(unsigned short *result, unsigned short const *a,
               unsigned short const *b, size_t len)
{
  for (; len > 0; len--)
    *result++ = *a++ + *b++;
}

/* Implementation for a Pentium processor with MMX technology uses
   an MMX instruction intrinsic to add four 16-bit elements
   simultaneously. */
__declspec(cpu_specific(pentium_mmx))
void array_sum(unsigned short *result, unsigned short const *a,
               unsigned short const *b, size_t len)
{
  __m64 *mmx_result = (__m64 *)result;
  __m64 const *mmx_a = (__m64 const *)a;
  __m64 const *mmx_b = (__m64 const *)b;

  for (; len > 3; len -= 4)
    *mmx_result++ = _mm_add_pi16(*mmx_a++, *mmx_b++);

  /* Clear the MMX state so that later floating-point code is not
     affected. */
  _mm_empty();

  /* The following code, which takes care of excess elements, is not
     needed if the array sizes passed are known to be multiples of four. */
  result = (unsigned short *)mmx_result;
  a = (unsigned short const *)mmx_a;
  b = (unsigned short const *)mmx_b;

  for (; len > 0; len--)
    *result++ = *a++ + *b++;
}

/* The stub has the same name and signature as the cpu_specific
   versions it dispatches to. */
__declspec(cpu_dispatch(pentium, pentium_mmx))
void array_sum(unsigned short *result, unsigned short const *a,
               unsigned short const *b, size_t len)
{
  /* Empty function body informs the compiler to generate the
     CPU-dispatch function listed in the cpu_dispatch clause. */
}
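
Because the MMX version handles leftover elements in a separate loop, and because the guidance above recommends testing on every targeted platform, a small comparison harness can be useful. The following is a minimal sketch in portable C, assuming 16-bit elements. The names sum_fn, sum_reference, sum_under_test, and count_mismatches are hypothetical, and sum_under_test merely stands in for the dispatched routine so that the sketch compiles without processor-specific keywords.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LEN 16

/* Hypothetical signature for the routine being checked. */
typedef void (*sum_fn)(unsigned short *result, unsigned short const *a,
                       unsigned short const *b, size_t len);

/* Reference version: adds one element at a time, wrapping at 16 bits. */
void sum_reference(unsigned short *result, unsigned short const *a,
                   unsigned short const *b, size_t len)
{
    for (; len > 0; len--)
        *result++ = (unsigned short)(*a++ + *b++);
}

/* Stand-in for the dispatched routine (assumption): processes blocks
   of four elements, then the leftover elements. */
void sum_under_test(unsigned short *result, unsigned short const *a,
                    unsigned short const *b, size_t len)
{
    for (; len > 3; len -= 4) {
        for (int k = 0; k < 4; k++)
            result[k] = (unsigned short)(a[k] + b[k]);
        result += 4;
        a += 4;
        b += 4;
    }
    for (; len > 0; len--)
        *result++ = (unsigned short)(*a++ + *b++);
}

/* Runs the candidate for every length from 0 to MAX_LEN and returns
   how many lengths produced output that differs from the reference.
   Comparing the whole buffer also catches writes past len. */
int count_mismatches(sum_fn candidate)
{
    unsigned short a[MAX_LEN], b[MAX_LEN];
    unsigned short expect[MAX_LEN], got[MAX_LEN];
    int mismatches = 0;

    assert(candidate != NULL);

    /* Inputs chosen so that some sums wrap around 16 bits. */
    for (size_t i = 0; i < MAX_LEN; i++) {
        a[i] = (unsigned short)(i * 3 + 1);
        b[i] = (unsigned short)(65535u - i);
    }

    for (size_t len = 0; len <= MAX_LEN; len++) {
        memset(expect, 0, sizeof expect);
        memset(got, 0, sizeof got);
        sum_reference(expect, a, b, len);
        candidate(got, a, b, len);
        if (memcmp(expect, got, sizeof expect) != 0)
            mismatches++;
    }
    return mismatches;
}
```

On each targeted platform, passing the dispatched function to count_mismatches in place of sum_under_test checks lengths that are and are not multiples of four. A return value of zero means every length matched the reference.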