Denormalized floating-point values are those that are too small to be represented in the normal manner; that is, the mantissa cannot be left-justified. Denormal values require hardware or operating system intervention to handle the computation, so floating-point computations that produce denormal values can have an adverse impact on performance.
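To make the definition concrete, the following C sketch inspects the bit fields of single-precision values. It assumes float is the IEEE 754 binary32 format, as on IA-32 and Intel® 64. A normal value has a nonzero exponent field and an implicit leading 1 in its significand. A denormal value has an exponent field of 0 and no implicit leading 1, so its significand is not left-justified. The helper names exponent_bits, fraction_bits, and float_kind are hypothetical, not part of any library.

```c
#include <math.h>     /* fpclassify, FP_NORMAL, FP_SUBNORMAL, FP_ZERO */
#include <stdint.h>   /* uint32_t */
#include <string.h>   /* memcpy */

/* Biased 8-bit exponent field (bits 30..23) of an IEEE 754 binary32 value. */
unsigned exponent_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* reinterpret the bits without aliasing issues */
    return (unsigned)((u >> 23) & 0xFFu);
}

/* Stored 23-bit fraction field (bits 22..0). */
uint32_t fraction_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u & 0x7FFFFFu;
}

/* Classifies f with the standard C99 fpclassify macro. */
const char *float_kind(float f) {
    switch (fpclassify(f)) {
    case FP_NORMAL:    return "normal";
    case FP_SUBNORMAL: return "denormal";
    case FP_ZERO:      return "zero";
    default:           return "inf or NaN";
    }
}
```

For example, FLT_MIN (2^-126, the smallest normal float) has exponent field 1 and fraction 0, while FLT_MIN / 2 (2^-127) has exponent field 0 and fraction 0x400000: the leading significand bit is stored explicitly as 0.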
There are several ways to handle denormals to increase the performance of your application:
Scale the values into the normalized range
Use a higher precision data type with a larger dynamic range
Flush denormals to zero
To scale values into the normalized range, you can multiply them by a large scalar, do the remaining computations in the normal range, and then scale the results back down to the denormal range. Consider this method when the program design depends on the small denormal values.
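The following C sketch applies the scaling approach to a weighted sum of single-precision values that may be denormal. The scale factor 2^64, its reciprocal, and the function name weighted_sum_scaled are hypothetical choices for illustration. A power of two is used so that scaling up only changes the exponent and is exact; scaling the final result back down can round, as any denormal result can. Each input still costs one multiplication with a denormal operand, but the remaining arithmetic runs on normal values.

```c
/* Assumed scale factors: 2^64 and 2^-64, written as C99 hex float literals.
   Any float denormal (>= 2^-149) times 2^64 is >= 2^-85, well above
   FLT_MIN (2^-126), so every scaled value is normal. */
static const float SCALE_UP   = 0x1p64f;
static const float SCALE_DOWN = 0x1p-64f;

/* Hypothetical example: computes w[0]*x[0] + ... + w[n-1]*x[n-1] where the
   x[i] may be denormal. Each x[i] is scaled up once, the multiply-adds run
   on normal values, and the final result is scaled back down. */
float weighted_sum_scaled(const float *w, const float *x, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) {
        float xs = x[i] * SCALE_UP;   /* move into the normal range */
        acc += w[i] * xs;             /* remaining work on normal values */
    }
    return acc * SCALE_DOWN;          /* back to the original (denormal) range */
}
```

With w = {2.0f, 3.0f} and x = {1e-40f, 2e-40f} (both denormal), every step happens to be exact, so the result equals 2.0f * x[0] + 3.0f * x[1] computed directly; in general the two paths need not be bit-identical.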
Consider using a higher precision data type with a larger dynamic range, for example, converting variables declared as float to double. Be aware that this change can slow your program down: storage requirements increase, which increases the time spent loading and storing data from memory, and the potential throughput of SSE operations can decrease, because a 128-bit SSE register holds four float values but only two double values.
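The following C sketch illustrates why a wider type helps, assuming IEEE 754 float and double as on IA-32 and Intel® 64. The product of two values near 1e-20 falls below FLT_MIN (about 1.18e-38) and is denormal as a float, but it is far above DBL_MIN (about 2.23e-308) and stays normal as a double. The function names are hypothetical.

```c
#include <math.h>    /* fpclassify, FP_SUBNORMAL */
#include <stdio.h>   /* printf */

/* The same expression in single and double precision. */
float  product_f(float a, float b)   { return a * b; }
double product_d(double a, double b) { return a * b; }

/* Reports whether v * v is denormal when computed in each precision. */
void compare_precisions(double v) {
    float  pf = product_f((float)v, (float)v);
    double pd = product_d(v, v);
    printf("float:  %g (%s)\n", pf,
           fpclassify(pf) == FP_SUBNORMAL ? "denormal" : "not denormal");
    printf("double: %g (%s)\n", pd,
           fpclassify(pd) == FP_SUBNORMAL ? "denormal" : "not denormal");
}
```

Calling compare_precisions(1e-20) should report the float product as denormal and the double product as not denormal. The trade-off described above still applies: each double occupies 8 bytes instead of 4.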
If you change the declaration of a variable, you might also need to change the library functions you call with that variable; for example, cos() instead of cosf(). Another strategy that might improve performance is to increase the precision of intermediate values using the -fp-model [double|extended] option; however, if you increase precision to address slow performance caused by denormal numbers, you must verify that the change actually improves performance.
Finally, in many cases denormal numbers can be treated safely as zero without adverse effects on program results. Depending on the target architecture, use flush-to-zero (FTZ) options.
The IA-32 and Intel® 64 architectures take advantage of the FTZ and DAZ (denormals-are-zero) capabilities of Streaming SIMD Extensions (SSE), Streaming SIMD Extensions 2 (SSE2), Streaming SIMD Extensions 3 (SSE3), and Supplemental Streaming SIMD Extensions 3 (SSSE3) instructions.
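To illustrate the two modes (this is not the library routine the Intel compiler calls), the following C sketch sets the FTZ and DAZ bits of the MXCSR register directly with the _MM_SET_FLUSH_ZERO_MODE and _MM_SET_DENORMALS_ZERO_MODE macros from xmmintrin.h and pmmintrin.h. It assumes an x86 target whose float arithmetic uses SSE, the default on Intel® 64; the settings apply to the calling thread and have no effect on x87 code. The helper names are hypothetical.

```c
#include <xmmintrin.h>  /* _MM_SET_FLUSH_ZERO_MODE (FTZ bit of MXCSR) */
#include <pmmintrin.h>  /* _MM_SET_DENORMALS_ZERO_MODE (DAZ bit of MXCSR) */

/* FTZ: denormal results of SSE operations are replaced by zero.
   DAZ: denormal inputs to SSE operations are treated as zero. */
void enable_ftz_daz(void) {
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
}

/* Restores IEEE handling of denormal inputs and results. */
void disable_ftz_daz(void) {
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_OFF);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_OFF);
}

/* Multiplies through volatile operands so the compiler cannot fold the
   product at compile time; the multiplication runs on SSE at run time. */
float runtime_mul(float a, float b) {
    volatile float va = a, vb = b;
    return va * vb;
}
```

With both modes off, runtime_mul(1e-20f, 1e-20f) returns a nonzero denormal. After enable_ftz_daz(), the same call returns 0 (FTZ), and a denormal input such as 1e-40f is treated as 0 (DAZ).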
By default, the compiler for the IA-32 architecture generates code that will run on machines that do not support SSE instructions. The compiler implements floating-point calculations using the x87 floating-point unit, which does not benefit from the FTZ and DAZ settings. You can use the -x (Linux* and Mac OS*) or /Qx (Windows*) option to enable the compiler to implement floating-point calculations using the SSE and SSE2 instructions. The compiler for the Intel® 64 architecture generates SSE2 instructions by default.
The FTZ and DAZ modes are enabled by default when you compile the source file containing main() using the Intel Compiler. The compiler generates a call to a library routine that performs a run-time processor check and sets the FTZ and DAZ modes if the processor running the program supports them.
Enable the FTZ mode by using the -ftz (Linux and Mac OS) or /Qftz (Windows) option on the source file containing main(). The -O3 (Linux and Mac OS) or /O3 (Windows) option automatically enables -ftz or /Qftz.
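To confirm that FTZ and DAZ are actually in effect at run time, for example after compiling the file containing main() with -ftz, a program can read the MXCSR bits back. This C sketch assumes an x86 target with SSE and uses the _MM_GET_FLUSH_ZERO_MODE and _MM_GET_DENORMALS_ZERO_MODE macros; the function names are hypothetical.

```c
#include <stdio.h>       /* printf */
#include <xmmintrin.h>   /* _MM_GET_FLUSH_ZERO_MODE */
#include <pmmintrin.h>   /* _MM_GET_DENORMALS_ZERO_MODE */

/* Nonzero if the calling thread currently flushes denormal results to zero. */
int ftz_enabled(void) {
    return _MM_GET_FLUSH_ZERO_MODE() == _MM_FLUSH_ZERO_ON;
}

/* Nonzero if the calling thread currently treats denormal inputs as zero. */
int daz_enabled(void) {
    return _MM_GET_DENORMALS_ZERO_MODE() == _MM_DENORMALS_ZERO_ON;
}

/* Prints the current modes, e.g. as the first statement in main(). */
void report_denormal_modes(void) {
    printf("FTZ: %s, DAZ: %s\n",
           ftz_enabled() ? "on" : "off",
           daz_enabled() ? "on" : "off");
}
```

If the modes are not reported as on where you expect them, check which options were used to compile the file containing main().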
After using flush-to-zero, ensure that your program still gives correct results when treating denormalized values as zero.
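One way to carry out that check is to run the same computation twice, once with IEEE denormal handling and once with FTZ and DAZ enabled, and compare the results. The C sketch below does this for a single function, assuming an x86 target with SSE arithmetic. ftz_relative_error, geometric_sum, and rescale_roundtrip are hypothetical stand-ins for real code, and the tolerance you accept is application-specific.

```c
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr, _MM_SET_FLUSH_ZERO_MODE */
#include <pmmintrin.h>  /* _MM_SET_DENORMALS_ZERO_MODE */

static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Runs fn(arg) with FTZ/DAZ off, then on, restores the caller's MXCSR,
   and returns the relative difference between the two results. */
double ftz_relative_error(float (*fn)(float), float arg) {
    unsigned int saved = _mm_getcsr();

    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_OFF);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_OFF);
    volatile float ref = fn(arg);      /* IEEE denormal handling */

    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
    volatile float flushed = fn(arg);  /* denormals treated as zero */

    _mm_setcsr(saved);                 /* restore the original modes */
    double diff = abs_d((double)ref - (double)flushed);
    double mag = abs_d((double)ref);
    return mag > 0.0 ? diff / mag : diff;
}

/* Hypothetical workload: x + x/2 + x/4 + ...; the late terms become denormal
   but are far too small to change the float sum, so FTZ is harmless here. */
float geometric_sum(float x) {
    volatile float term = x;   /* volatile keeps the arithmetic at run time */
    float sum = 0.0f;
    for (int i = 0; i < 200; ++i) {
        sum += term;
        term = term * 0.5f;
    }
    return sum;
}

/* Hypothetical workload: scales x down and back up. For small x the
   intermediate is denormal, so FTZ turns it into 0 and the result is lost. */
float rescale_roundtrip(float x) {
    volatile float v = x;
    volatile float t = v * 1e-20f;
    return t * 1e20f;
}
```

Here geometric_sum(1.0f) gives the same result in both modes, while rescale_roundtrip(1e-20f) returns 0 under FTZ, a relative difference of 1: the kind of case this check is meant to catch.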