Targeting IA-32 Architecture Processors for Run-time Performance Checking

The -ax (Linux* and Mac OS*) or /Qax (Windows*) option instructs the compiler to determine if opportunities exist to generate separate versions of functions that take advantage of features in a specific Intel® processor. If the compiler finds such an opportunity, the compiler first checks whether generating a processor-specific version of a function is likely to result in a performance gain. If there is a likely performance gain, the compiler generates both a processor-specific version of a function and a generic version of the function. The generic version will run on any processor based on IA-32 architecture.

At run time, one of the generated versions executes depending on the Intel processor detected. Using this strategy, the program can benefit from performance gains on more advanced Intel processors if they are used, while still working properly on older processors based on IA-32 architecture.
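The selection step can be sketched in shell. This is a conceptual model under stated assumptions, not the dispatch code the compiler emits. pick_ax_path is a hypothetical function. It takes two arguments: the processor values a binary was built with (for example PT, as in -axPT), and a space-separated list of CPU feature flags in the format of the flags line of the Linux /proc/cpuinfo file. It prints which version would run. It assumes the value-to-instruction-set mapping from the table of processor values (T: SSSE3, P: SSE3, W, N, and B: SSE2, K: SSE) and checks only the highest instruction set for each value. It does not model identifying the Intel processor itself.

```shell
#!/bin/sh
# Conceptual sketch only: models how an -ax build chooses between its
# processor-specific versions and the generic version at run time.
# pick_ax_path is a hypothetical helper, not part of the Intel compiler.
#
# $1: processor values the binary was built with, e.g. "PT" for -axPT
# $2: CPU feature flags, space-separated, as on the "flags" line of /proc/cpuinfo
pick_ax_path() {
    ax_values=$1
    cpu_flags=" $2 "    # pad with spaces so each flag matches as a whole word

    # Try the most capable values first.
    for v in T P W N B K; do
        # Skip values that were not requested at compile time.
        case "$ax_values" in
            *"$v"*) ;;
            *) continue ;;
        esac

        # Assumed highest instruction set each value may use (from the -ax table).
        case "$v" in
            T) need=ssse3 ;;
            P) need=pni ;;        # Linux reports SSE3 as "pni"
            W|N|B) need=sse2 ;;
            K) need=sse ;;
        esac

        case "$cpu_flags" in
            *" $need "*) echo "$v"; return 0 ;;
        esac
    done

    # No processor-specific version applies: run the generic IA-32 code.
    echo generic
}
```

On a Linux system, pick_ax_path PT "$(grep -m1 '^flags' /proc/cpuinfo)" would print T, P, or generic for a binary built with -axPT. The function tries the values from most to least capable. This mirrors the goal described above: use the most advanced version the processor supports, and otherwise fall back to the generic code.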

The optimizations can include generating Supplemental Streaming SIMD Extensions 3 (SSSE3), Streaming SIMD Extensions 3 (SSE3), Streaming SIMD Extensions 2 (SSE2), or Streaming SIMD Extensions (SSE) instructions for supported processors; however, the instructions are executed on the processor only after run-time checking verifies the instruction sets are supported.

The size of the compiled binary increases because it contains both a processor-specific version of some of the code and a generic version of all code. Application performance is affected slightly due to the run-time checks needed to determine which code to use.

The available processor values are listed below. Each entry gives the Linux and Mac OS form, the Windows form, and a description.

-axT (Linux and Mac OS), /QaxT (Windows)

Can generate SSSE3, SSE3, SSE2, and SSE instructions for Intel processors and can generate optimized code for Intel® Xeon® processors based on the Intel® Core™ microarchitecture, Intel® Core™2 Duo processors, Intel® Core™2 Extreme processors, and other Intel processors based on the same architecture.

Mac OS: This is a supported value.

-axP (Linux and Mac OS), /QaxP (Windows)

Can generate SSE3, SSE2, and SSE instructions for Intel processors and can generate optimized code for Intel® Core™ Duo processors, Intel® Core™ Solo processors, Celeron® M processors, Celeron® D processors, Intel® Pentium® 4 processors with SSE3, and Intel® Xeon® processors not based on Intel® Core™ microarchitecture.

This value is also valid on processors based on Intel® 64 architecture.

Mac OS: This is a supported value.

-axW (Linux and Mac OS), /QaxW (Windows)

Can generate SSE2 and SSE instructions for Intel processors and can generate optimized code for Intel® Pentium® 4 processors and Intel® Xeon® processors with SSE2.

This is the default value for Intel® 64 architecture systems.

-axN (Linux and Mac OS), /QaxN (Windows)

Can generate SSE2 and SSE instructions for Intel processors and can generate optimized code for Intel® Pentium® 4 processors and Intel® Xeon® processors with SSE2.

-axB (Linux and Mac OS), /QaxB (Windows)

Can generate SSE2 and SSE instructions for Intel processors and can generate optimized code for Intel® Pentium® M processors.

This value is deprecated. For new builds, use N or W instead.

-axK (Linux and Mac OS), /QaxK (Windows)

Can generate SSE instructions for Intel processors and can generate optimized code for Intel® Pentium® III and Intel® Pentium® III Xeon® processors.

The N, B, P, and T processor values enable new optimizations in addition to Intel processor-specific optimizations. On Intel® 64 architecture, W, P, and T are the only valid processor values.
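The Intel® 64 validity rule can be expressed as a small pre-build check. In this sketch, ax_value_valid and ax_values_valid are hypothetical helpers. The architecture labels ia32 and intel64 are names chosen for illustration. The check covers only the rules stated in this section.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the Intel compiler): check whether
# -ax / /Qax processor values are listed as valid for an architecture.

# $1: "ia32" or "intel64" (labels chosen for this sketch)
# $2: a single processor value letter, e.g. T
ax_value_valid() {
    case "$1" in
        ia32)
            case "$2" in
                T|P|W|N|B|K) return 0 ;;
            esac ;;
        intel64)
            # On Intel 64 architecture, W, P, and T are the only valid values.
            case "$2" in
                W|P|T) return 0 ;;
            esac ;;
    esac
    return 1
}

# Check every letter of a combined value such as "PT" (as in -axPT).
ax_values_valid() {
    arch=$1
    values=$2
    [ -n "$values" ] || return 1
    while [ -n "$values" ]; do
        rest=${values#?}          # everything after the first character
        v=${values%"$rest"}       # the first character itself
        ax_value_valid "$arch" "$v" || return 1
        values=$rest
    done
    return 0
}
```

For example, ax_values_valid intel64 PT succeeds. ax_values_valid intel64 NT fails, because N is not valid on Intel® 64 architecture.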

If your application does not need to run on several types of processors based on IA-32 or Intel® 64 architecture, consider using the -x (Linux and Mac OS) or /Qx (Windows) option instead. Alternatively, combine that option with -ax (/Qax).

Combining the two options lets the compiler generate three things: code optimized for a specific processor, processor-dispatch code that selects that version only on the targeted processor, and a baseline version for other processors. When both options are specified, however, the -x (Linux and Mac OS) or /Qx (Windows) value takes precedence for the baseline code. Instead of running on any processor based on IA-32 architecture, that code runs only on processors compatible with the processor value given as the minimum.
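The effect of combining the two options can be modeled in the same spirit. This sketch rests on several assumptions:

- required_flag and pick_path_with_min are hypothetical helpers.
- The build is assumed to combine, for example, -xW with -axT.
- Feature flags use the Linux /proc/cpuinfo spelling (pni for SSE3).
- The output unsupported simply stands for a processor below the -x minimum, on which the baseline code is not meant to run.

```shell
#!/bin/sh
# Conceptual sketch only (hypothetical helpers): models a build that combines
# an -x minimum value with -ax values, e.g. -xW together with -axT.

# Assumed highest instruction set for each processor value,
# spelled as in the Linux /proc/cpuinfo flags line ("pni" = SSE3).
required_flag() {
    case "$1" in
        T) echo ssse3 ;;
        P) echo pni ;;
        W|N|B) echo sse2 ;;
        K) echo sse ;;
        *) return 1 ;;
    esac
}

# $1: -x minimum value, $2: -ax values, $3: CPU flags (space-separated)
pick_path_with_min() {
    min=$1
    ax_values=$2
    cpu_flags=" $3 "

    need=$(required_flag "$min") || { echo invalid; return 1; }
    case "$cpu_flags" in
        *" $need "*) ;;
        *) echo unsupported; return 0 ;;   # below the -x minimum processor
    esac

    # Prefer a processor-specific version, most capable first.
    for v in T P W N B K; do
        case "$ax_values" in
            *"$v"*) ;;
            *) continue ;;
        esac
        need=$(required_flag "$v")
        case "$cpu_flags" in
            *" $need "*) echo "$v"; return 0 ;;
        esac
    done

    # Otherwise run the baseline code, which targets the -x minimum.
    echo "baseline-$min"
}
```

Unlike a plain -ax build, the fallback here is not generic IA-32 code. The fallback is a baseline tied to the -x value, and processors below that value are rejected.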

The following compilation examples generate an executable with three versions of the code:

- an optimized version for Intel® Core™2 Duo processors (T);
- an optimized version for Intel® Core™ Duo processors (P);
- a generic version that runs on any processor based on IA-32 architecture.

Each optimized version is generated only where it is likely to improve performance.

Linux: ifort -axPT sample.f90

Windows: ifort /QaxPT sample.f90


Other Options for Generating Processor-Specific Optimized Applications

The -mtune (Linux and Mac OS) or /G{n} (Windows) option generates code that is backward compatible with Intel® processors in the same processor family. For example, code generated with -mtune=pentium4 (Linux and Mac OS) or /G7 (Windows) runs correctly on earlier processors based on IA-32 architecture. On those earlier processors, however, it might not run as fast as code compiled with -mtune=pentium (Linux and Mac OS) or /G5 (Windows).

The following options can optimize application performance for specific processors based on IA-32 or Intel® 64 architectures.

-mtune=pentium4 (Linux and Mac OS), /G7 (Windows): Default. Optimizes applications for Intel® Pentium® 4 processors, Intel® Core™ Duo processors, Intel® Core™ Solo processors, Intel® Xeon® processors, Intel® Pentium® M processors, and Intel® Pentium® 4 processors with Streaming SIMD Extensions 3 (SSE3) instruction support.

-mtune=pentiumpro (Linux and Mac OS), /G6 (Windows): Optimizes applications for Intel® Pentium® Pro, Pentium® II, and Pentium® III processors.

-mtune=pentium or -mtune=pentium-mmx (Linux and Mac OS), /G5 (Windows): Optimizes applications for Intel® Pentium® processors and Pentium® processors with MMX™ technology.

Note

Windows: For this release, the /G5, /G6, and /G7 options have been deprecated but not removed.
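The correspondence between the Windows and Linux forms can be captured in a small lookup when porting build scripts between platforms. gflag_to_mtune is a hypothetical helper. For /G5 it returns -mtune=pentium, although -mtune=pentium-mmx is also listed for that row.

```shell
#!/bin/sh
# Hypothetical helper: maps a Windows /G{n} option to the Linux and Mac OS
# -mtune option listed for the same processors in the table above.
gflag_to_mtune() {
    case "$1" in
        /G7) printf '%s\n' "-mtune=pentium4" ;;    # default tuning
        /G6) printf '%s\n' "-mtune=pentiumpro" ;;
        /G5) printf '%s\n' "-mtune=pentium" ;;     # -mtune=pentium-mmx is also listed
        *) return 1 ;;                             # no equivalent listed
    esac
}
```

A script could call gflag_to_mtune /G7 to obtain -mtune=pentium4. An unknown value returns a nonzero status rather than guessing a tuning target.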


Each of the following commands compiles the source program sample.f90 into a binary optimized for Pentium 4 and Intel® Xeon® processors, which is the default. The same binary also runs on Pentium, Pentium III, and more advanced processors. Both examples use the default options:

Linux and Mac OS: ifort -mtune=pentium4 sample.f90

Windows: ifort /G7 sample.f90