SCons


SCons (from Software Construction) is a superior alternative to the classic make utility.

SCons is implemented as a Python script, and its "configuration files" (SConstruct files) are also Python scripts. Madagascar uses SCons to compile software, to manage data processing flows, and to assemble reproducible documents.

Useful SCons options

  • scons -h (help) displays a help message.
  • scons -Q (quiet) suppresses progress messages.
  • scons -n (no exec) prints the commands required for building the specified target (or the default targets if no target is specified) without actually executing them. It can be used to generate a shell script from an SConstruct script, as follows:
scons -nQ [target] > script.sh

Compilation

SCons was designed primarily for compiling software code. An SConstruct file for compilation may look like

env = Environment()
env.Append(CPPFLAGS=['-Wall','-g'])
env.Program('hello',['hello.c', 'main.c'])

and produce something like

bash$ scons -Q
gcc -o hello.o -c -Wall -g hello.c
gcc -o main.o -c -Wall -g main.c
gcc -o hello hello.o main.o

to compile the hello program from the source files hello.c and main.c.

Madagascar uses SCons to compile its programs from the source. The more frequent usage, however, comes from adopting SCons to manage data processing flows.

Data processing flows with rsf.proj

The rsf.proj module provides SCons rules for Madagascar data processing workflows. An example SConstruct file, which can be found in bei/sg/denmark, is shown below.

from rsf.proj import *
 
Fetch('wz.35.H','wz')
 
Flow('wind','wz.35.H','dd form=native | window n1=400 j1=2 | smooth rect1=3')
Plot('wind','pow pow1=2 | grey')
 
Flow('mute','wind','mutter v0=0.31 half=n')
Plot('mute','pow pow1=2 | grey')
 
Result('denmark','wind mute','SideBySideAniso')
 
End()

Note that the SConstruct file by itself only sets the rules for building different targets. The targets get built when one executes scons on the command line. Running scons produces

bash$ scons
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
retrieve(["wz.35.H"], [])
< wz.35.H /RSF/bin/sfdd form=native | /RSF/bin/sfwindow n1=400 j1=2 | /RSF/bin/sfsmooth rect1=3 > wind.rsf
< wind.rsf /RSF/bin/sfpow pow1=2 | /RSF/bin/sfgrey > wind.vpl
< wind.rsf /RSF/bin/sfmutter v0=0.31 half=n > mute.rsf
< mute.rsf /RSF/bin/sfpow pow1=2 | /RSF/bin/sfgrey > mute.vpl
/RSF/bin/vppen yscale=2 vpstyle=n gridnum=2,1 wind.vpl mute.vpl > Fig/denmark.vpl
scons: done building targets.

Obviously, one could also run similar commands with a shell script. What makes SCons convenient is the way it behaves when we make changes in the input files or in the script. Let us change, for example, the mute velocity parameter in the second Flow command. You can do that with an editor or on the command line as

sed -i s/v0=0.31/v0=0.32/ SConstruct

Now let us run scons again

bash$ scons -Q
< wind.rsf /RSF/bin/sfmutter v0=0.32 half=n > mute.rsf
< mute.rsf /RSF/bin/sfpow pow1=2 | /RSF/bin/sfgrey > mute.vpl
/RSF/bin/vppen yscale=2 vpstyle=n gridnum=2,1 wind.vpl mute.vpl > Fig/denmark.vpl

We can see that scons executes only the parts of the data processing flow that were affected by the change. By keeping track of dependencies, SCons makes it easier to modify existing workflows without the need to rerun everything after each change.
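The rebuild decision described above can be illustrated with a small stand-alone Python sketch. It is not how SCons is implemented: the functions signature and plan are hypothetical, and each target's signature is derived here from the command text and the signatures of its sources, whereas SCons records signatures of the actual file contents in its database.

```python
import hashlib

def signature(command, source_sigs):
    """Combine the command text with the signatures of all sources."""
    h = hashlib.md5()
    h.update(command.encode())
    for sig in source_sigs:
        h.update(sig.encode())
    return h.hexdigest()

def plan(rules, stored, raw_sigs):
    """Return (targets to rebuild, new signature table).

    rules    -- list of (target, sources, command) in dependency order
    stored   -- dict target -> signature recorded by the previous run
    raw_sigs -- dict file -> signature for files that no rule produces
    """
    current = dict(raw_sigs)
    rebuild = []
    for target, sources, command in rules:
        current[target] = signature(command, [current[s] for s in sources])
        if stored.get(target) != current[target]:
            rebuild.append(target)
    return rebuild, current

def rules(v0):
    """The rules of the example above, parameterized by the mute velocity."""
    return [
        ('wind.rsf', ['wz.35.H'],
         'dd form=native | window n1=400 j1=2 | smooth rect1=3'),
        ('wind.vpl', ['wind.rsf'], 'pow pow1=2 | grey'),
        ('mute.rsf', ['wind.rsf'], 'mutter v0=%s half=n' % v0),
        ('mute.vpl', ['mute.rsf'], 'pow pow1=2 | grey'),
        ('Fig/denmark.vpl', ['wind.vpl', 'mute.vpl'], 'SideBySideAniso'),
    ]

raw = {'wz.35.H': 'raw-data'}                    # placeholder signature
first, db = plan(rules('0.31'), {}, raw)         # first run: everything
second, db = plan(rules('0.32'), db, raw)        # after editing v0
print(second)  # ['mute.rsf', 'mute.vpl', 'Fig/denmark.vpl']
```

Only the three targets downstream of the edited command are scheduled again, which mirrors the second scons run shown above.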

SConstruct commands

Fetch(<file[s]>,<directory>,[options])

defines a rule for downloading data files from the specified directory on an external data server (by default) or from another directory on disk. The optional parameters that control its behavior are summarized below.

Fetch options
Name | Default | Meaning
private | None | if the data file is private
server | $RSF_DATASERVER or http://www.reproducibility.org | remote data server (or local for local files)
top | data | name of the top data directory on the data server
dir | None | name of directory after top
usedatapath | 1 | usedatapath=1: download to $DATAPATH with symbolic link; usedatapath=0: download to pwd

In the example above, Fetch specifies the rule for getting the file wz.35.H: connect to the default data server and download the file from the data/wz directory.

An example of Fetch with more parameters:

Fetch('KAHU-3D-PR3177-FM.3D.Final_Migration.sgy',
      dir='newzealand/Taranaiki_Basin/KAHU-3D',
      server='http://s3.amazonaws.com',
      top='open.source.geoscience/open_data',
      usedatapath=1)
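As a rough illustration of how the server, top, and dir parameters combine into a download location, here is a minimal Python sketch. The helper fetch_url is hypothetical and not part of rsf.proj. The joining rule is inferred from the data/wz example above, so the address printed for the second example is an assumption rather than a verified URL. The private and usedatapath options are ignored.

```python
import os

def fetch_url(filename, directory=None, server=None, top='data'):
    """Hypothetical helper: compose server/top/dir/file into one address."""
    if server is None:
        # Default taken from the Fetch options table above
        server = os.environ.get('RSF_DATASERVER',
                                'http://www.reproducibility.org')
    parts = [server.rstrip('/')]
    if top:
        parts.append(top.strip('/'))
    if directory:
        parts.append(directory.strip('/'))
    parts.append(filename)
    return '/'.join(parts)

# First example: file wz.35.H from the data/wz directory
print(fetch_url('wz.35.H', 'wz', server='http://www.reproducibility.org'))

# Second example: explicit server and top directory
print(fetch_url('KAHU-3D-PR3177-FM.3D.Final_Migration.sgy',
                directory='newzealand/Taranaiki_Basin/KAHU-3D',
                server='http://s3.amazonaws.com',
                top='open.source.geoscience/open_data'))
```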


Flow(<target[s]>,<source[s]>,<command>,[options])

defines a rule for creating targets from sources by running the specified command through the Unix shell. The optional parameters that control its behavior are summarized below.

Flow options
Name | Default | Meaning
stdout | 1 | if output to standard out (0 for output to /dev/null, -1 for no output)
stdin | 1 | if take input from standard in (0 for no input)
rsfflow | 1 | if using Madagascar commands
suffix | '.rsf' | default suffix for output files
prefix | 'sf' | default prefix for programs
src_suffix | '.rsf' | default suffix for input files
split | [] | split the flow for data parallel processing
reduce | 'cat' | how to reduce the output from data parallel processing
local | 0 | if execute on the local node when using data parallel processing on a cluster

In the example above, there are two Flow commands. The first one involves a Unix pipe in the command definition.

On the use of parallel computing options, see Parallel Computing.
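To make the role of the prefix, suffix, and src_suffix options more concrete, the following Python sketch reproduces the shell commands printed by scons for the two Flow rules in the example. The function expand_flow is hypothetical and handles only one source, one target, and default stdin and stdout; the real rsf.proj does considerably more. The /RSF/bin location is copied from the output above, and the rule that a suffix is added only to names without a dot is an assumption inferred from that output.

```python
def expand_flow(target, source, command, bindir='/RSF/bin',
                prefix='sf', suffix='.rsf', src_suffix='.rsf'):
    """Hypothetical sketch: turn a Flow rule into a shell command line."""

    def with_suffix(name, sfx):
        # Assumption: names that already contain a dot are left unchanged
        return name if '.' in name else name + sfx

    stages = []
    for stage in command.split('|'):
        words = stage.split()
        # Prepend the program prefix and directory to each pipe stage
        words[0] = '%s/%s%s' % (bindir, prefix, words[0])
        stages.append(' '.join(words))

    return '< %s %s > %s' % (with_suffix(source, src_suffix),
                             ' | '.join(stages),
                             with_suffix(target, suffix))

print(expand_flow('wind', 'wz.35.H',
                  'dd form=native | window n1=400 j1=2 | smooth rect1=3'))
print(expand_flow('mute', 'wind', 'mutter v0=0.31 half=n'))
```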

Plot(<target>,[<source[s]>],<command>,[options])

is similar to Flow but generates a graphics file (Vplot file) instead of an RSF file. If the source file is not specified, it is assumed that the name of the output file (without the .vpl suffix) is the same as the name of the input file (without the .rsf suffix).

Plot options
Name | Default | Meaning
suffix | '.vpl' | default suffix for the output file
vppen | None | additional options to pass to vppen
view | None | if set, show the output on the screen instead of saving it in a file

In the example above, there are two Plot commands.

Result(<target>,[<source[s]>],<command>,[options])

is similar to Plot, except that the output graphics file is placed in a separate directory (./Fig by default) rather than in the current directory. The output is intended for inclusion in papers and reports.

Result options
Name | Default | Meaning
suffix | '.vpl' | default suffix for the output file
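The file naming conventions of Plot and Result, as described above, can be summarized in a short Python sketch. The function figure_files is hypothetical, covers only the case of a single RSF source with a plotting command, and does not handle rules that combine several plots.

```python
def figure_files(target, source=None, result=False,
                 suffix='.vpl', resdir='Fig'):
    """Hypothetical sketch: return (input file, output file) for a figure."""
    if source is None:
        # No source given: input name is the same as the output name
        source = target
    infile = source if '.' in source else source + '.rsf'
    outfile = target + suffix
    if result:
        # Result figures go into a separate directory
        outfile = '/'.join([resdir, outfile])
    return infile, outfile

print(figure_files('wind'))                 # ('wind.rsf', 'wind.vpl')
print(figure_files('mute', result=True))    # ('mute.rsf', 'Fig/mute.vpl')
```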

In the example above, Result defines a rule that combines the results of two Plot rules into one plot by arranging them side by side. The rules for combining different figures together (which apply to both Plot and Result commands) include:

  • SideBySideAniso
  • OverUnderAniso
  • SideBySideIso
  • OverUnderIso
  • TwoRows
  • TwoColumns
  • Overlay
  • Movie

End()

takes no arguments and signals the end of data processing rules. It provides the following targets, which operate on all previously specified Result figures:

  • scons view displays the results on the screen.
  • scons print sends the results to the printer (specified with the PSPRINTER environment variable).
  • scons lock copies the results to a location inside the DATAPATH tree.
  • scons test compares the previously "locked" results with the current results and aborts with an error in case of mismatch.

The default target is set to be the collection of all Result figures.

Command-line options

Command-line options
Name | Meaning
TIMER | Whether to time execution
CHECKPAR | Whether to check parameters
ENVIRON | Additional environment settings
CLUSTER | Nodes available on a cluster
MPIRUN | mpirun command

Running the example above with TIMER=y produces

bash$ scons -Q TIMER=y
/usr/bin/time < wind.rsf /RSF/bin/sfmutter v0=0.32 half=n > mute.rsf
0.09user 0.03system 0:00.13elapsed 94%CPU (0avgtext+0avgdata 383744maxresident)k
0inputs+0outputs (1513major+0minor)pagefaults 0swaps
/usr/bin/time < mute.rsf /RSF/bin/sfpow pow1=2 | /RSF/bin/sfgrey > mute.vpl
0.10user 0.00system 0:00.18elapsed 59%CPU (0avgtext+0avgdata 384256maxresident)k
0inputs+0outputs (1515major+0minor)pagefaults 0swaps
/usr/bin/time /RSF/bin/vppen yscale=2 vpstyle=n gridnum=2,1 wind.vpl mute.vpl > Fig/denmark.vpl
0.06user 0.03system 0:00.06elapsed 135%CPU (0avgtext+0avgdata 444416maxresident)k
0inputs+0outputs (1739major+0minor)pagefaults 0swaps

In other words, every shell command is preceded by the Unix time utility to measure the CPU time of the process.

Running the example above with CHECKPAR=y, we will not see any difference. Suppose, however, that we made a typo in specifying one of the parameters, for example, by using v1= instead of v0= in the arguments to sfmutter.

bash$ sed -i s/v0=0.31/v1=0.31/ SConstruct
bash$ scons -Q CHECKPAR=y
No parameter "v1" in sfmutter
Failed on "mutter v1=0.31 half=n"

The parameter error gets detected by scons before anything is executed.
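The idea of checking parameters before execution can be sketched in plain Python as follows. This is not the CHECKPAR implementation: the function check_command and the KNOWN_PARS table are hypothetical, and the table lists only the two sfmutter parameters that appear in this document, not the program's full parameter set.

```python
# Hypothetical and deliberately incomplete table of accepted parameters
KNOWN_PARS = {'sfmutter': {'v0', 'half'}}

def check_command(command, known=KNOWN_PARS, prefix='sf'):
    """Return a list of messages for parameters not found in the table."""
    errors = []
    for stage in command.split('|'):
        words = stage.split()
        if not words:
            continue
        program = prefix + words[0]
        if program not in known:
            continue  # nothing is known about this program, skip it
        for word in words[1:]:
            if '=' in word:
                name = word.split('=', 1)[0]
                if name not in known[program]:
                    errors.append('No parameter "%s" in %s' % (name, program))
    return errors

print(check_command('mutter v1=0.31 half=n'))
# ['No parameter "v1" in sfmutter']
print(check_command('mutter v0=0.31 half=n'))
# []
```

Because the check works on the command text alone, it can run before any data is read or written.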

Seismic Unix data processing flows with rsf.suproj

If you process data with Seismic Unix instead of Madagascar, you can still take advantage of SCons-based processing flows by using the rsf.suproj module. See book/rsf/su for examples.

Note that, with rsf.suproj, the scons command generates hard copies (eps files) of Result figures, while the scons view command displays figures on the screen using the corresponding X-window plotting commands.

Document creation with rsf.tex

SConstruct commands

  • Paper
  • End([options]) signals the end of paper processing rules. It provides the following targets:
    • scons pdf (equivalent to scons paper.pdf)
    • scons wiki (equivalent to scons paper.wiki)
    • scons read (equivalent to scons paper.read)
    • scons print (equivalent to scons paper.print)
    • scons html (equivalent to scons paper.html)
    • scons install (equivalent to scons paper.install)
    • scons figs (equivalent to scons paper.figs)

The default target is set to pdf.

Book and report creation with rsf.book

SConstruct commands

  • Book
  • Papers
  • End([options]) signals the end of book processing rules. It provides the following targets:
    • scons pdf
    • scons read
    • scons print
    • scons html
    • scons www

The default target is set to be scons pdf.

Errors and Debugging

DBPageNotFoundError

The scons database contains the information that scons needs in order to execute only the parts of the data processing flow that were affected by changes in data, programs, and flows. Sometimes the database becomes corrupted. For example, when I ran out of disk space, scons stopped, leaving a corrupted database. After I deleted some files to free enough disk space, running scons quickly failed with the message:

scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
scons: *** [31_81_IM.JPG] DBPageNotFoundError : (-30986, 'DB_PAGE_NOTFOUND: Requested page not found')
scons: building terminated because of errors.

I was working in the directory $RSFSRC/book/data/alaska/line31-81. SCons keeps the database in the file $DATAPATH/data/alaska/line31-81/.sconsign.dbhash. This file is a little tricky to find because it is a hidden file in a directory below $DATAPATH. Removing $DATAPATH/data/alaska/line31-81/.sconsign.dbhash fixed the problem.
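For convenience, the location of the database file can be computed with a few lines of Python. The helper sconsign_path is hypothetical, and the mapping from the working directory to the database location is inferred only from the single example above (a directory under $RSFSRC/book mapped to the same relative path under $DATAPATH). It may not hold for other setups, so the result is worth checking before deleting anything.

```python
import os
import posixpath

def sconsign_path(workdir, rsfsrc, datapath, dbname='.sconsign.dbhash'):
    """Hypothetical helper: guess where the database for workdir is kept."""
    book = posixpath.join(rsfsrc, 'book')
    relative = posixpath.relpath(workdir, book)   # e.g. data/alaska/line31-81
    return posixpath.join(datapath, relative, dbname)

def remove_database(path):
    """Delete the database file if it exists; report whether it was removed."""
    if os.path.isfile(path):
        os.remove(path)
        return True
    return False

# Placeholder directories standing in for $RSFSRC and $DATAPATH
path = sconsign_path('/src/book/data/alaska/line31-81', '/src', '/scratch')
print(path)
```

Calling remove_database(path) afterwards deletes the file, after which scons rebuilds its database on the next run.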