This section discusses the three major features of parallel programming supported by the Intel® compiler: OpenMP*, auto-parallelization, and auto-vectorization.
Each of these features contributes to application performance depending on the number of processors, the target architecture (IA-32 or IA-64 architecture), and the nature of the application. The features can also be combined to improve application performance.
Parallel programming can be explicit, that is, defined by the programmer using OpenMP directives, or implicit, that is, detected automatically by the compiler. Implicit parallelism covers auto-parallelization of outermost loops, auto-vectorization of innermost loops, or both.
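For illustration, the following sketch (a hypothetical loop, not taken from this manual) shows explicit parallelism requested with an OpenMP directive; written without the directive, the same loop is a candidate for implicit auto-parallelization or auto-vectorization.

```c
/* A minimal sketch (hypothetical function name): the OpenMP directive
   explicitly requests that the loop iterations be divided among threads.
   Without the directive, the compiler may still parallelize or vectorize
   the loop implicitly, because the iterations are independent. */
void scale(float *a, const float *b, float s, int n)
{
    int i;
    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        a[i] = s * b[i];
    }
}
```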
Parallelism defined with OpenMP and auto-parallelization directives is based on thread-level parallelism (TLP). Parallelism defined with auto-vectorization techniques is based on instruction-level parallelism (ILP).
The Intel® compiler supports OpenMP and auto-parallelization for IA-32, Intel® 64, and IA-64 architectures, on multiprocessor systems, dual-core processor systems, and systems with Hyper-Threading Technology (HT Technology) enabled.
To enhance auto-vectorization of their code, users can also add vectorizer directives to the program, as sketched below.
A closely related technique, software pipelining (SWP), is available on systems based on IA-64 architecture.
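As an example of such a vectorizer directive, the sketch below (a hypothetical loop) uses `#pragma ivdep`, which Intel compilers accept as an assertion that the loop carries none of the dependences the compiler would otherwise have to assume; the exact set of supported directives depends on the compiler version, so consult the compiler reference for details.

```c
/* A minimal sketch (hypothetical function name): the vectorizer directive
   asserts that there is no loop-carried dependence through a[], so the
   compiler is free to vectorize the innermost loop even though the value
   of k is unknown at compile time. */
void add_offset(float *a, const float *b, int k, int n)
{
    int i;
    #pragma ivdep
    for (i = 0; i < n; i++) {
        a[i] = a[i + k] + b[i];
    }
}
```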
The following table summarizes the different ways in which parallelism can be exploited with the Intel® Compiler.
| Parallelism | Description |
|---|---|
| Implicit (parallelism generated by the compiler and by user-supplied hints) | Auto-parallelization (Thread-Level Parallelism): supported on IA-32, Intel® 64, and IA-64 architectures. Auto-vectorization (Instruction-Level Parallelism): supported on IA-32 and Intel® 64 architectures; software pipelining provides a related capability on IA-64 architecture. |
| Explicit (parallelism programmed by the user) | OpenMP* (Thread-Level Parallelism): supported on IA-32, Intel® 64, and IA-64 architectures. |
For performance analysis of your parallel program, you can use the Intel® VTune™ Performance Analyzer, the Intel® Threading Tools, or both. These tools provide detailed information about which portions of the code require the largest amount of time to execute and where parallel performance problems are located.