Cubic decomposition MPI FFTW 3d-fft Fortran wrapper
This is a Fortran wrapper that can be used to perform 3D FFTs with the MPI version of FFTW (2.1.5) on a cubically decomposed data set.
The native data layout in the MPI version of FFTW is decomposed along only one dimension (slab decomposition), so it is not well suited to programs whose data is decomposed along all three dimensions. This example code shows how cubically decomposed data can be redistributed into slabs, which can then be transformed by the FFTW library.
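The redistribution described above can be illustrated with a small index-arithmetic sketch. It is written in Python purely for illustration and is not the wrapper's Fortran code. It assumes that slabs are cut along z, that there are as many slab processes as cube processes, and that cube ranks are ordered with x varying fastest; the names cube_coords and redistribution_plan are hypothetical. None of these choices is taken from the wrapper itself.

```python
def cube_coords(rank, i):
    """Map a cube rank to (cx, cy, cz) on an i x i x i process grid.

    Assumed ordering: x varies fastest, z slowest.
    """
    return rank % i, (rank // i) % i, rank // (i * i)


def redistribution_plan(n, i):
    """List, for every cube rank, which z-planes go to which slab rank.

    n is the edge length of the n^3 array, i the number of processes
    per dimension. Returns {cube_rank: [(slab_rank, z_start, z_end), ...]}
    with z_end exclusive.
    """
    nslabs = i ** 3                  # assumption: one slab per cube process
    if n % nslabs != 0:
        raise ValueError("this sketch needs n divisible by the number of slabs")
    block = n // i                   # edge length of each cubic block
    planes = n // nslabs             # z-planes held by each slab
    plan = {}
    for rank in range(nslabs):
        _, _, cz = cube_coords(rank, i)
        z, z_end = cz * block, (cz + 1) * block
        messages = []
        while z < z_end:
            owner = z // planes      # slab rank that owns plane z
            z_next = min(z_end, (owner + 1) * planes)
            messages.append((owner, z, z_next))
            z = z_next
        plan[rank] = messages
    return plan
```

For n = 16 and i = 2, cube rank 0 holds planes 0 to 7 and would send two planes each to slab ranks 0 to 3. Each slab in turn receives a piece from the i^2 cube ranks that share its z range. The reverse (slab to cube) step follows the same table with sender and receiver swapped.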
Older version that supports OpenMP and separate processes to calculate the FFT:
In addition, one may wish to use OpenMP within the main code. Because the MPI FFTW routines are not threaded, the transforms would then be performed using only one process per node.
This wrapper extends the functionality of the FFTW transform by including intermediary communication routines that convert between cubic and slab data decomposition layouts, following the data layout conventions specified in the Compaq Extended Math Library (CXML). It also executes separate FFTW processes within the LAM multicomputer that communicate with the main program, allowing for a full processor load during transforms.
The wrapper is currently written in Fortran.
The wrapper currently supports only single precision; however, it can easily be extended to support double precision as well.
Due to memory management issues with the Intel compilers, this wrapper uses statically allocated memory whose size must be specified at compile time. This should be extended to use dynamic memory allocation to allow for transforms of different sizes.
MPI communicators are specified within the wrapper and may require modifications to the source code, because all the processes (both the FFTW processes and the original code) are part of mpi_comm_world. A communicator for the original program is provided in the wrapper, so one should only have to replace mpi_comm_world in the original source with mpi_comm_cube to utilise the wrapper.
The number of nodes used in the main program must be a perfect cube, i.e. i^3 nodes where i is a positive integer, and the array must itself be cubic, i.e. n^3 elements.
Currently the wrapper allows only one size of array to be transformed. It should be extended to support multiple array sizes (definitely easier to implement with dynamic memory allocation). It is also suggested that the size of the array be evenly divisible both by the total number of nodes and by the number of nodes in each dimension of the cubic decomposition.
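The constraints above can be checked before a run with a few lines of arithmetic. The following Python sketch is only an illustration and is not part of the wrapper. It reads "size of the array" as the edge length n, which is an assumption, and the names integer_cube_root and check_layout are hypothetical. The cubic node count is treated as a hard requirement and the two divisibility conditions as suggestions, as in the text.

```python
def integer_cube_root(p):
    """Return i such that i**3 == p, or None if p is not a perfect cube."""
    i = 0
    while i ** 3 < p:
        i += 1
    return i if i ** 3 == p else None


def check_layout(nodes, n):
    """Check a node count and array edge length n against the constraints.

    Raises ValueError if nodes is not a perfect cube (required).
    Returns (i, warnings), where i is the number of nodes per dimension
    and warnings lists the suggested conditions that are not met.
    """
    if nodes < 1 or n < 1:
        raise ValueError("nodes and n must be positive")
    i = integer_cube_root(nodes)
    if i is None:
        raise ValueError("number of nodes must be i^3 for some integer i")
    warnings = []
    if n % i != 0:
        warnings.append("n is not divisible by the nodes per dimension")
    if n % nodes != 0:
        warnings.append("n is not divisible by the total number of nodes")
    return i, warnings
```

For example, 8 nodes with n = 16 satisfy every condition, while 27 nodes with n = 18 give a valid cubic decomposition but do not split evenly across all 27 nodes.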
Email me if you would like a sample data-set to test with.
Please see the README file included in the archive.