This section discusses the three major features of parallel programming supported by the Intel® compiler: OpenMP*, auto-parallelization, and auto-vectorization.
Each of these features contributes to application performance depending on the number of processors, the target architecture (IA-32 or IA-64 architecture), and the nature of the application. The features can also be combined to improve application performance.
Parallel programming can be explicit, that is, defined by the programmer using OpenMP directives, or implicit, that is, detected automatically by the compiler. Implicit parallelism covers auto-parallelization of outermost loops, auto-vectorization of innermost loops, or both.
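For illustration, the following sketch (a hypothetical loop, not taken from this manual) shows explicit parallelism requested with an OpenMP directive; written without the directive, the same loop is a candidate for implicit auto-parallelization or auto-vectorization.

```c
/* A minimal sketch (hypothetical function name): the OpenMP directive
   explicitly requests that the loop iterations be divided among threads.
   Without the directive, the compiler may still parallelize or vectorize
   the loop implicitly, because the iterations are independent. */
void scale(float *a, const float *b, float s, int n)
{
    int i;
    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        a[i] = s * b[i];
    }
}
```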
Parallelism defined with OpenMP and auto-parallelization directives is based on thread-level parallelism (TLP). Parallelism defined with auto-vectorization techniques is based on instruction-level parallelism (ILP).
The Intel® compiler supports OpenMP and auto-parallelization for IA-32, Intel® 64, and IA-64 architectures, on multiprocessor systems, dual-core processor systems, and systems with Hyper-Threading Technology (HT Technology) enabled.
To enhance auto-vectorization of their code, users can also add vectorizer directives to the program, as sketched below.
A closely related technique, software pipelining (SWP), is available on systems based on IA-64 architecture.
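As an example of such a vectorizer directive, the sketch below (a hypothetical loop) uses `#pragma ivdep`, which Intel compilers accept as an assertion that the loop carries none of the dependences the compiler would otherwise have to assume; the exact set of supported directives depends on the compiler version, so consult the compiler reference for details.

```c
/* A minimal sketch (hypothetical function name): the vectorizer directive
   asserts that there is no loop-carried dependence through a[], so the
   compiler is free to vectorize the innermost loop even though the value
   of k is unknown at compile time. */
void add_offset(float *a, const float *b, int k, int n)
{
    int i;
    #pragma ivdep
    for (i = 0; i < n; i++) {
        a[i] = a[i + k] + b[i];
    }
}
```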
The following table summarizes the different ways in which parallelism can be exploited with the Intel® Compiler.
| Parallelism | Description |
|---|---|
| Implicit (parallelism generated by the compiler and by user-supplied hints) | Auto-parallelization (Thread-Level Parallelism): supported on IA-32, Intel® 64, and IA-64 architectures. Auto-vectorization (Instruction-Level Parallelism): supported on IA-32 and Intel® 64 architectures; software pipelining provides a related capability on IA-64 architecture. |
| Explicit (parallelism programmed by the user) | OpenMP* (Thread-Level Parallelism): supported on IA-32, Intel® 64, and IA-64 architectures. |
For performance analysis of your parallel program, you can use the Intel® VTune™ Performance Analyzer, the Intel® Threading Tools, or both. These tools provide detailed information about which portions of the code require the largest amount of time to execute and where parallel performance problems are located.