Several functionalities have been added in Madagascar for parallel computing on clusters with distributed memory. The SConstruct files have to be run with pscons instead of scons. The command pscons is a wrapper for the use of SCons with the option -j. The environment variables $RSF_THREADS and $RSF_CLUSTER respectively provide to pscons the number of threads and the address list of the nodes you want to use for your computation.
Options in the SConstruct file
1 - Computing on the local node only by using the option local=1
By default, with pscons, SCons wants to run all the commands of the SConstruct file in parallel. The option local=1 forces SCons to compute locally. It can be very useful in order to prevent serial parts of your python script to be run inefficiently in parallel.
Flow('spike',None,'spike n1=100 n2=300 n3=1000',local=1)
2 - Computing on the nodes of the cluster specified by the environment variable $RSF_CLUSTER
Flow('radon','spike','radon adj=y p0=-4 np=200 dp=0.04',split=[3,1000],reduce='cat')
The option split instructs Flow to split the input file along the third axis of length 1000. If you have several source files and want to split only some of them, say the first and the third one, the option to use will be split=[3,1000,[0,2]].
If we choose $RSF_THREADS=26, we obtain, as an itermediate result in the local directory, the files spike__0.rsf, spike__1.rsf, ..., spike__25.rsf, which are sent and distributed for computation on the different nodes specified by $RSF_CLUSTER. After the parallel computation on the nodes, the resulting files radon__0.rsf, radon__1.rsf, ..., radon__25.rsf, are recombined together to create the output radon.rsf. The parameter reduce selects the type of recombination. Two typical options are reduce='cat' or reduce='add'.
3 - Computing in parallel without using any option
This choice is appropriate when you write a python loop in your program and want it to be run in parallel. This is a way, as well, to speed up sequential parts of your program. However, the user should make judicious decisions as it can have the opposite effect. Indeed, in a serial part of the program, the second command has to wait for the first to finish the run on a different node and to communicate it.
Flow('spike',None,'spike n1=100 n2=300 n3=1000') Flow('radon','spike','radon adj=y p0=-4 np=200 dp=0.04')
Setting the environment variables
In our example, we used 26 threads and send them on 4 nodes, using respectively 6 CPUs on the first node, 4 CPUs on the second, and 8 CPUs on each of the last two nodes.
export RSF_THREADS=26 export RSF_CLUSTER='18.104.22.168 6 22.214.171.124 4 126.96.36.199 8 188.8.131.52 8'
One important setting is to properly manage the temporary files location specified by $TMPDATAPATH and the data storage location specified by $DATAPATH . The temporary files used during the computation have to be stored locally on each node to avoid too much communication between the hard disks and the nodes. The paths will depend on your cluster and you can set them in your .bashrc file, for example:
export DATAPATH=/disk1/data/myname/ export TMPDATAPATH=/tmp/
You are now very close to run Madagascar in parallel ! Once your SConstruct file is ready and your environment variables are set, you can use the following suggested procedure. It has been tested and is currently used on a linux cluster.
- Make sure the disk located at $DATAPATH is mounted on the different nodes.
- Test if there is enough space available on the different nodes of the cluster at the location specified by $TMPDATAPATH. This directory may be filled up, if some jobs have been interrupted. Clean this up if necessary.
- Look at what is going on on your cluster with sftop.
- Everything looks good ? Then go and run pscons instead of scons.
If you need to kill your processes on the cluster, the command sfkill can do it remotely on all the nodes for a specific job command. If you kill your jobs, check it did not filled up the $TMPDATAPATH with temporary files before you run pscons again. One nice feature of running SCons on clusters is fault tolerance (see Sergey's posting, Monday, December 24, 2007).