PMFAST : Distributed / Shared Memory 2-Level Particle Mesh N-Body Implementation
Copyright (C) 2004 Hugh Merz - merz@cita.utoronto.ca

This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; either version 2 of the License, or (at your option) any later
version.

This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA

------------------------------------------------------------------------------

Feb. 2nd 2005 (revised for fftw-only version of code)

PMFAST : a quick overview.

If you would like details on how the code was designed, you may be interested
in reading the paper available on the PMFAST webpage.

PMFAST is a parallel 2-level grid implementation of the particle mesh
algorithm. The total size of the grid is determined from the following
formula:

   LPS = ( LF - 2 * LFB ) * NN

where LPS is the 'length of physical space', or the box size in fine mesh
cells, LF is the length of each fine grid section, LFB is the length of the
fine grid buffer (for the included kernels LFB=24), and NN is the number of
MPI nodes you are planning on using.

In this version of the code (which uses the fftw-3.0.1 serial FFT library
instead of the IPP library) the restriction on LF has been relaxed to allow a
greater number of possible geometries. Currently the allowable values of LF
are restricted by the decomposition of the coarse mesh, LC, which is defined
by:

   LC = LPS / 4

First, LPS must be evenly divisible by the grid ratio, currently fixed at 4.
In addition, the coarse mesh must be evenly divisible by the total number of
processors used:

   mod( LC , NT * NN ) = 0

where NT is the number of threads (processors) per node. This means that LF
must satisfy:

   mod( { [ LF - 2 * LFB ] * NN } / 4 , NT * NN ) = 0

We currently perform simulations using 1 particle for every 8 fine mesh
cells, so the total number of particles used in the simulation, NP, is given
by:

   NP = ( LPS / 2 ) ** 3

We appreciate feedback, bug reports and suggestions for the code. Please send
any comments you may have to the maintainer's email address located on the
PMFAST website. We will also do our best to help people set up and execute
the code on their computing platform.

---------------
* Requirements:

   F90 compiler
   FFTW 2.1.5
   optimized serial FFT routine (FFTW 3.0.1)

Notes on requirements:

Our current production version compiles with the Intel Fortran Compiler,
which is currently available from intel.com free of charge (non-commercial
version). If one would like to employ the shared-memory parallelization of
the code, the compiler must support OpenMP.

FFTW 2.1.5 is required since there is currently no MPI support in version 3
of the library. It should be installed to support single precision and to
include MPI transforms, but without threading. For more information or to
download: http://www.fftw.org

PMFAST uses single precision for all of the major data structures, so all
libraries or external code should be compiled to support this.
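The mesh geometry constraints given in the overview above can be verified
before editing any header files. The stand-alone program below is only a
sketch (it is not part of the PMFAST distribution); it evaluates the formulas
from the overview for a set of illustrative parameter values and stops if a
divisibility condition fails.

   program check_geometry
     implicit none
     ! illustrative values -- substitute your own LF, NN and NT
     integer, parameter :: LF  = 128   ! length of each fine grid section
     integer, parameter :: LFB = 24    ! fine grid buffer (included kernels)
     integer, parameter :: NN  = 2     ! number of MPI nodes
     integer, parameter :: NT  = 2     ! threads (processors) per node
     integer    :: LPS, LC
     integer(8) :: NP

     LPS = ( LF - 2 * LFB ) * NN       ! box size in fine mesh cells
     if ( mod( LPS, 4 ) /= 0 ) stop 'LPS must be divisible by the grid ratio (4)'
     LC  = LPS / 4                     ! coarse mesh size
     if ( mod( LC, NT * NN ) /= 0 ) stop 'LC must be divisible by NT * NN'
     NP  = int( LPS / 2, 8 ) ** 3      ! 1 particle per 8 fine mesh cells

     write(*,*) 'LPS =', LPS, '  LC =', LC, '  NP =', NP
   end program check_geometry

For the LF=128, NN=2 example used later in this file this prints LPS=160,
LC=40 and NP=512000.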
The production version of the code (also available on the website) currently
uses a serial FFT routine from the Intel IPP library. Since this library is
not freely available and restricts the size of the fine mesh to a power of 2,
we have released this version of the code, which uses the serial FFT from
FFTW 3.0.1. We have found it to run at comparable speeds (less than a factor
of 2 slower, even for non-power-of-2 mesh sizes), which should make the code
much more accessible and portable. Please see http://www.fftw.org to obtain
this library. It should be installed to support single precision and without
threading support (as it is called by multiple threads simultaneously within
PMFAST).

---------------
* Instructions:

1) Download and untar (tar -xvzf) the code and initial conditions.

2) Edit parameters in the header files:

      iopar.fh    [input/output file paths]
      cosmopar.fh [cosmological parameters]
      simpar.fh   [simulation parameters]

   The parameters are described within the files.

   ** Optional program execution modes:

   * Pairwise Force Testing:

   If one would like to execute pairwise force testing, the following needs
   to be set:

      PAIRPATH in iopar.fh (output location for pair data)
      pairtest = .true. in simpar.fh
      NP = 2 in simpar.fh

   In this mode, 2 particles are placed on the grid and their positions and
   velocities are written to disk following the fine mesh velocity update
   (fine_pair.dat) and the coarse mesh velocity update (total_pair.dat).
   Parameters (a,G,dt) are scaled to 1. Each line of the above files
   contains:

      format='12f20.10', x1,y1,z1, vx1,vy1,vz1, x2,y2,z2, vx2,vy2,vz2

   One may also wish to edit the set_pair routine in pairs.f90 to modify the
   scheme used to set the pairs on the grid. A simple analysis program that
   reads in the above pair data files and calculates the fractional error in
   the forces is located at utils/pair_check/pair_check.f90
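   For reference, the pair data files can also be parsed directly. The
   program below is only a minimal sketch (it is not pair_check.f90 and does
   not compute force errors); it reads each record in the 12f20.10 layout
   described above and prints the pair separation and the magnitude of the
   relative velocity. The file name is illustrative.

      program read_pairs
        implicit none
        character(*), parameter :: fn = 'fine_pair.dat'  ! or total_pair.dat
        real(4) :: p(12)      ! x1,y1,z1, vx1,vy1,vz1, x2,y2,z2, vx2,vy2,vz2
        real(4) :: dr(3), dv(3)
        integer :: ios

        open(10, file=fn, status='old')
        do
           read(10, '(12f20.10)', iostat=ios) p
           if (ios /= 0) exit
           dr = p(7:9)   - p(1:3)     ! pair separation vector
           dv = p(10:12) - p(4:6)     ! relative velocity of the pair
           write(*,*) 'r =', sqrt(sum(dr*dr)), '  |dv| =', sqrt(sum(dv*dv))
        end do
        close(10)
      end program read_pairs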
   * Generation of Density Projections:

   input/projections can be edited to include a list of redshifts at which
   density projections are generated. For each redshift indicated, three
   projection files will be created on each node, corresponding to the
   projection of the overdensity to the midplane of each slab in each
   orthogonal dimension. These must be recombined following program
   execution. The projections are written to the location indicated in
   PROJPATH. The amount of disk space required for each output is:

      [ (4*LC)^2 + 2*(4*LC)*(4*LC/NN) ] * 4 bytes

   utils/combine_projections/combine_proj.f90 is a simple program that reads
   in the constituent projections (after placing the files within the same
   directory) and generates full box length projections.
   utils/combine_projections/topgm.f90 is a file converter that will convert
   the binary full box length projections to portable greymap (.pgm) files
   for viewing in a graphics program. If one leaves input/projections empty,
   no projections will be generated.

   * Restarting from a checkpoint:

   In order to restart from a checkpoint, one needs to set INIT_VAL=9 in
   simpar.fh, as well as selecting the redshift of the checkpoint that one
   would like to restart from (z_restart). All necessary information to
   restart the simulation is contained in OUT1/###.#params.dat, where ###.#
   is set from (z_restart), as well as the particle list checkpoint files.
   These are stored at either OUT1/xvp#.dat or OUT2/xvp#.dat, whichever is
   indicated through the params file. The # in the checkpoint files
   corresponds to the MPI rank of the node.

3) Edit input/checkpoints

   This file lists the redshifts at which particle list checkpoints are
   written to disk. In order for the simulation to run correctly, at least
   one redshift should exist in this file at which the simulation is to
   stop. The program will complete upon writing the final checkpoint or
   attaining the maximum number of timesteps, whichever comes first. Make
   sure there is sufficient disk space for the checkpoint files at the paths
   indicated in iopar.fh

4) Distribute initial conditions

   Each node's initial conditions should be placed in the directory INICOND
   and should correspond with the following format:

      filename = 'xvp#.init'  [# = rank of MPI process for node]
      format   = 'binary'
      contains:
         integer(4) nploc
         real(4)    xvp(1:6,nploc)

   If one is using the initial conditions obtained from the website (or from
   any other serial generator, for that matter) then they must be decomposed
   into the above format; a simple code to do so can be found in
   utils/decompose_ic/decompose.f90. Note that although particle positions
   in the x and y dimensions span the entire global fine mesh (0:LPS], the z
   dimension (along which the simulation is decomposed) spans (0:LPS/NN] for
   each node. As such, the z position of particles needs to be offset to
   node-relative coordinates if one is decomposing a cubical distribution.
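   For illustration, the following is a minimal sketch in the spirit of
   utils/decompose_ic/decompose.f90 (it is not that program): it reads a
   serial xvp.init, bins the particles into NN slabs along z, shifts z to
   node-relative coordinates, and writes one xvp#.init per node. It assumes
   the serial file uses the same single-array binary layout described above;
   the nc and nn values are illustrative, and form='binary' follows the
   Intel compiler convention (other compilers typically use an unformatted
   access='stream' open instead).

      program decompose_sketch
        implicit none
        integer, parameter :: nc = 160, nn = 2            ! nc = LPS, nn = NN
        real(4), parameter :: slab = real(nc) / real(nn)  ! z-extent per node
        integer(4) :: np, nploc
        integer :: rank, i
        real(4), allocatable :: xv(:,:)
        character(len=16) :: fn

        open(10, file='xvp.init', form='binary', status='old')
        read(10) np
        allocate(xv(6, np))
        read(10) xv(1:6, 1:np)
        close(10)

        do rank = 0, nn - 1
           write(fn, '(a,i1,a)') 'xvp', rank, '.init'
           open(11, file=fn, form='binary', status='replace')
           nploc = count( xv(3,:) > rank*slab .and. xv(3,:) <= (rank+1)*slab )
           write(11) nploc
           do i = 1, np
              if ( xv(3,i) > rank*slab .and. xv(3,i) <= (rank+1)*slab ) then
                 ! shift z to node-relative coordinates, spanning (0:LPS/NN]
                 write(11) xv(1:2,i), xv(3,i) - rank*slab, xv(4:6,i)
              end if
           end do
           close(11)
        end do
      end program decompose_sketch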
5) Edit the Makefile and compile pmfast

   Change the path to the fftw library, the fortran compiler, and the
   related flags. Make sure that the use of the -openmp flag corresponds to
   the proper syntax for the compiler that you are using. Depending on how
   your compiler handles fortran modules, modifications to the file
   dependencies for qsorti.f90 may be required (it works with Intel v8).
   Compile by running 'make'.

6) Compile the filefftw program.

   This is a background program that performs MPI FFTs using a different
   number of processors than are defined in the main pmfast program. It is
   located in the filefftw directory and can be compiled using the
   buildffftw.csh script after modifying the MakeIA32 / MakeIA64 make files
   to point to the proper f90 compiler, MPI implementation and fftw library.
   Also be sure to change the fpath variable in filefftw.f90 to point at the
   same directory as FFTWSWAP in iopar.fh. The buildffftw.csh script accepts
   3 arguments: the size of each fine grid section (LF), the number of nodes
   (NN), and the number of cpus / node (NCPUPN).

7) Start MPI with the total number of nodes and cpus that you would like to
   use. This should be NCPUPN * NN processes in total.

8) Start the filefftw program using all of the processes (NCPUPN * NN). Make
   sure that the filefftw processes are numbered contiguously and not
   striped across the nodes (this is the default behaviour in LAM - i.e. the
   first number_of_cpus/node processes all exist on node 1, etc).

9) Start the pmfast program, using 1 process per node (NN total). In LAM
   this can be achieved by using 'mpirun n0-7 -np 8 pmfast' with NN=8, for
   example.

10) Output is placed in the directories indicated in iopar.fh. Should one
    desire to restart the program, this can be done by editing simpar.fh to
    select the redshift at which the program should be restarted (given that
    a checkpoint was performed at that redshift).

---------
* Output:

Output files include particle checkpoints, density projections and a timestep
record. A new addition is the mass power-spectrum on the coarse mesh, which
is enabled in simpar.fh. All of these can be found in the directories
specified in iopar.fh.

Particle initial conditions are read in from binary data files, which contain
the number of particles (4 byte integer) followed by a sequential list of the
particle positions and velocities (x,y,z,vx,vy,vz). Units are in fine grid
cells. In fortran 90:

   read (10) num_particles, xv(1:6,1:num_particles)

Particle checkpoints are saved in the same fashion, as well as a parameter
file labelled by redshift that includes the parameters required to restart
the run. If rotate_cp=.true. in simpar.fh, particle checkpoints will
alternate between two locations and are overwritten every second checkpoint.
Make sure there is enough disk space for all of your desired checkpoints if
you set this flag to .false.

A small program that reads in a checkpoint file and writes a thin slab of
particle positions is located at utils/slice_proj/slice_proj.f90, along with
a supermongo macro to read in and plot the slice. Density projections are
explained in the optional program execution modes section above.

------------------------------
* Example of compiling and executing the code.

In this example we will simply put all of the files in one place on disk and
use 2 nodes, with a fine grid section of 128 cells and 2 cpus / node.

1) download the initial conditions: (128 - 48) * 2 = 160, i.e. a 160^3 total
   mesh size

2) download the pmfast tar file

3) unpack the pmfast tar file: tar -xvzf pmfast.tar.gz

4) edit iopar.fh: set all file paths to point to your desired location

5) edit cosmopar.fh: set the cosmological parameters to correspond with
   those used to generate the initial conditions.

6) edit simpar.fh and set:

      LF = 128
      NN = 2
      NCPUPN = 2
      NT = 2
      MAX_PARTICLES = 512000 / 2 * ( 1 + 2 * 24 / 128 ) * ( 1 + 50 / 100 )
                    = 528000  (expecting a particle imbalance of up to 50%)
      MAX_BUFFER = 528000 * 24 / 128 * 6 = 594000
      MAX_TAG = 52800
      MAX_NTS = 3000
      TS_RATIO_MAX = 3  (we will calculate a maximum of 3 fine timesteps per
                         sweep)
      DT_SWEEP_SCALE = 1.0
      INIT_VAL = 8

   All of the other parameters we will leave as is.

7) edit input/checkpoints and input/projections; enter the redshifts at
   which you would like checkpointing and density projections to be
   performed.

8) decompose the initial conditions: edit utils/decompose_ic/decompose.f90
   and set:

      nc = 160
      nn = 2

   compile: f90 utils/decompose_ic/decompose.f90 -o decompose.x

   Run decompose.x in the same directory as xvp.init; it should produce
   xvp0.init and xvp1.init. Place these files in the location specified in
   iopar.fh, with xvp0.init on the first node (rank 0) and xvp1.init on the
   second node (rank 1).

9) edit the Makefile. Edit library paths, compilers and compiler flags to
   suit your particular installation environment. Make by executing 'make'.

10) compile filefftw:

      cd filefftw
      buildffftw.csh 128 2 2

    You may need to edit the makefile (either MakeIA32 or MakeIA64) in the
    same fashion as the pmfast Makefile. Make sure you run this script in
    the filefftw directory to avoid clobbering the pmfast Makefile.

11) start MPI. With LAM, one can run 'lamboot lamnodes', where lamnodes is a
    list of the nodes and how many cpus we are using, e.g.:

      host1 cpu=2
      host2 cpu=2

12) now run the filefftw program using the above two nodes:

      mpirun -v -np 4 filefftw/filefftw

13) and launch pmfast with one process per node:

      mpirun -v n0-1 -np 2 pmfast

14) output should appear in the directories specified in iopar.fh.

15) to generate a "thin-slice" of the particle distribution, edit and
    compile utils/slice_proj/slice_proj.f90. This will create a formatted
    output file containing all of the particle positions within the slice
    boundaries. This file can be plotted using the included supermongo
    macro, /utils/slice_proj/slice.sm, or with another plotting utility.
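    The sketch below is a stand-alone illustration of this kind of
    thin-slice extraction (it is not slice_proj.f90): it reads one node's
    checkpoint in the binary layout described in the Output section and
    writes the positions of particles whose z coordinate falls inside an
    illustrative slab. The file names and slab bounds are placeholders, and
    form='binary' again follows the Intel convention.

      program slice_sketch
        implicit none
        character(*), parameter :: cp_file = 'xvp0.dat'  ! checkpoint, rank 0
        real(4), parameter :: zmin = 0.0, zmax = 4.0     ! slab, fine grid cells
        integer(4) :: nploc
        real(4), allocatable :: xv(:,:)
        integer :: i

        open(10, file=cp_file, form='binary', status='old')
        read(10) nploc
        allocate(xv(6, nploc))
        read(10) xv(1:6, 1:nploc)
        close(10)

        open(11, file='slice.dat', status='replace')     ! formatted output
        do i = 1, nploc
           if ( xv(3,i) > zmin .and. xv(3,i) <= zmax ) &
              write(11, '(3f12.4)') xv(1:3,i)
        end do
        close(11)
      end program slice_sketch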
16) to create viewable portable greymap files of the density projections,
    one can compile and run utils/combine_proj/combine_proj.f90 to create
    full box projections, followed by /utils/combine_proj/topgm.f90 to
    convert the projections into the .pgm format. combine_proj requires MPI
    to be linked in, and will have to be edited such that:

      LPS = 160
      NN = 2

    You will want to run the executable from the location of the
    checkpoints:

      mpirun -v n0-1 -np 2 combine_proj.x

    Following this you can edit topgm.f90 to select which checkpoint you
    want to convert, compile it, and view the .pgm in your graphics viewer
    of choice.
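    For reference, a portable greymap in the plain (P2) format is just a
    short ASCII header followed by whitespace-separated grey levels. The
    program below is only a sketch of the kind of conversion topgm.f90
    performs; it assumes (hypothetically) that a combined projection file
    holds a single LPS x LPS array of real(4) overdensities, and the input
    and output file names are placeholders.

      program topgm_sketch
        implicit none
        integer, parameter :: LPS = 160    ! full box size in fine mesh cells
        real(4) :: proj(LPS, LPS)
        real(4) :: pmin, pmax
        integer :: i, j

        ! assumed layout: one LPS x LPS real(4) array per combined projection
        open(10, file='proj_xy.dat', form='binary', status='old')
        read(10) proj
        close(10)

        ! scale linearly to 0-255 grey levels and write a plain (P2) .pgm
        pmin = minval(proj)
        pmax = maxval(proj)
        if (pmax <= pmin) pmax = pmin + 1.0   ! guard against a flat field

        open(11, file='proj_xy.pgm', status='replace')
        write(11,'(a)') 'P2'
        write(11,'(2i6)') LPS, LPS
        write(11,'(i4)') 255
        do j = 1, LPS
           write(11,'(16i4)') ( nint( 255.0*(proj(i,j)-pmin)/(pmax-pmin) ), &
                                i = 1, LPS )
        end do
        close(11)
      end program topgm_sketch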