Manual
Revision as of 18:19, 10 August 2011
As the number of pages on the Wiki grows, the navbar is becoming insufficient for organizing all the documentation properly. A top-down view of all materials about Madagascar is also useful for determining whether gaps in coverage exist. This page will stay in the Sandbox for a long while, until all gaps have been filled.
Ideally, the manual will consist only of links to wiki pages or of its own content. "Forking", i.e. creating a modified copy of a page specifically for the manual, invariably ends with one version getting out of sync.
- 1 About Madagascar
- 2 Downloading and installing Madagascar
- 3 Using Madagascar
- 3.1 The lightning-quick tour
- 3.2 The Madagascar file formats
- 3.2.1 The Regularly Sampled Format (RSF)
- 3.2.2 Handling irregularly sampled data
- 3.2.3 Importing and exporting data from and to SEG-Y and SU
- 3.2.4 Importing and exporting data from and to raster images
- 3.2.5 Visualizing data and exporting figures with vplot
- 3.2.6 Visualizing data and exporting figures with GLE
- 3.2.7 Visualizing data and exporting figures with gnuplot
- 3.2.8 Visualizing data and exporting figures with PLplot
- 3.3 Calling existing Madagascar programs
- 3.4 What is reproducibility
- 3.5 Exploring existing reproducible papers
- 3.6 Writing a LaTeX paper in the Madagascar framework
- 3.7 Creating a reproducible paper
- 3.8 Data-conditional reproducibility
- 3.9 Creating a reproducible book
- 4 Developing in Madagascar
- 4.1 Writing your own programs
- 4.2 Adding programs to the central repository
- 4.3 Framework development and maintenance
- 4.4 Graphics development with vplot
- 4.5 Packaging madagascar
- 5 Datasets distributed with Madagascar
- 6 Other open-source data analysis packages
For people who do not read manuals
Why use Madagascar?
An articulate description of the reasons, drawing on the Why Madagascar page. Include some spectacular pictures obtained with algorithms that are not present in other packages, and describe algorithms/tools unavailable in other open-source geophysical data analysis packages.
A description of the current Madagascar community: the map of downloads, an estimate of the number of installs, the biggest users, outstanding research results obtained with Madagascar, etc. Links to the blog, the user mailing list, and the developer mailing list. Mention the Google Groups mirrors for rsf-user and rsf-devel. Also mention the bug tracker and the feature request tracker, encouraging the community to use them more. Mention forums as an alternative for those who want to ask questions or hold discussions without subscribing to a mailing list.
A history of Madagascar, with the SEPlib/SU part of the "Alternatives" section of the Introduction, and mentions of landmark events (short descriptions where necessary):
- 2003: Work started by Sergey Fomel
- 2004-08 (?): made available to selected alpha users
- 2005-02-16: RSF Blog started
- 2006-03-17: Registered on Sourceforge
- 2006-04-19: Name change from RSF to Madagascar
- 2006-06-11: Public launch at the Open Source E&P Software EAGE Workshop (Vienna)
- 2006-06-18: First stable version (0.9.1). Mailing lists created
- 2006-07-23: Madagascar logo created by Scott Rodgers (BEG)
- 2007-04-27: Release 0.9.4
- 2007-11-10: Release 0.9.5
- 2008-06-14: Release 0.9.6
- 2009-01-03: Release 0.9.7
- 2009-08-01: Release 0.9.8
- 2010-05-02: Release 0.9.9
- 2010-07-23: Release 1.0
- For details on releases after 1.0, see Release Notes
Downloading and installing Madagascar
Licensing and export regulations
- Explanation and text of the GPL 2+
- US export control
The lightning-quick tour
The Madagascar file formats
The Regularly Sampled Format (RSF)
The current Guide to RSF file format
Handling irregularly sampled data
Explain the principle of the current method (sfheadermath/sfheaderwindow applied to the trace header block output by sfsuread/sfsegyread)
Importing and exporting data from and to SEG-Y and SU
sfsegyread and sfsuread, with examples
Importing and exporting data from and to raster images
Visualizing data and exporting figures with vplot
- Explanation of vplot format
- Preempting display aliasing in raster plots with sfprep4plot
- How to create vplot images with sfgraph, sfgrey, sfdots, etc. Common pen parameters
- How to display vplot images with sfpen, and how it defaults to oglpen on systems that support OpenGL, and to xtpen on systems that lack OpenGL but have X Windows.
- How to convert vplot images to other formats with vpconvert. This tool can work on a single plot, e.g.:
<bash>
vpconvert file.vpl file.jpg
</bash>
or on an entire collection of files:
<bash>
vpconvert format=tiff Fig/*.vpl
</bash>
The vpconvert program can export to and from a multitude of file formats: avi, eps, gif, jpg, mpeg, pdf, png, ppm, ps, svg, and tif, and is the recommended vpl import and export tool. Older single-purpose utilities (vplot2gif and vplot2avi) are still available.
Visualizing data and exporting figures with GLE
Visualizing data and exporting figures with gnuplot
Visualizing data and exporting figures with PLplot
PLplot is a device-independent vector-plotting library. Its concept is very similar to that of vplot, but instead of separate device-dependent pens (like xtpen or pspen) it uses loadable "drivers" (built as shared objects and connected to plotting programs in a plugin fashion). It has an extensive high-level interface for different types of plots.
A sample Madagascar program which utilizes PLplot's surface rendering capabilities is sfplsurf.
Calling existing Madagascar programs
Finding out what program you need
- sfdoc -k
- Task-centric program list and all its subordinate nodes
- Collection of 2-3 page reproducible papers -- "How to do raytracing in Madagascar"; "How to do modeling in Madagascar"; etc
- SU to m8r dictionary
- SEPlib to m8r dictionary
- Other such dictionaries, for free or proprietary seismic processing packages. Such dictionaries are also useful because they will highlight algorithms/utilities present in such packages but missing from m8r.
This chapter is currently just a sketch and should grow quite big. Users approach tools in a task-centric fashion, i.e. Q1: "How do I do X with Madagascar?", A1: "With feature Y"; Q2: "How do I use feature Y to this end?" M8r is very good at answering Q2, but people ask Q1 first. Many of the reproducible papers included so far contain cutting-edge research. Users learning how to use Madagascar need to start with something much simpler, where they do not have to focus on understanding the research on top of understanding the software.
Learning how to use a given program
- Command-line self-doc
- Local html self-doc ($RSFROOT/doc/index.html). Contains all programs installed on the user's machine and only those programs.
- Online self-doc
- The wiki Guide to Programs.
- Series of dedicated reproducible papers that present the theory behind specific geophysical programs and demonstrate it with various types of inputs and combination of parameters, like this paper does for SEPlib's AMO program.
- Combining together multiple programs -- the reproducible papers; pointer to relevant section of the manual ("Exploring reproducible papers")
What is reproducibility
The whole Reproducibility page, combined with Section 1 from Reproducible computational experiments using SCons
Exploring existing reproducible papers
Papers and books included in the Madagascar distribution
Reproducible Documents and more.
How to reproduce specific figures in existing papers
A frequently encountered case is when a researcher wants to reproduce only one or several figures from a paper, but not the entire paper. This can happen because Madagascar's LaTeX dependencies are missing or not working properly on that system, or simply because the researcher is interested only in that result.
- Finding the paper directory: If the article of interest was found by browsing or following a hyperlink to Reproducible Documents, then the reproducibility package corresponding to http://www.reproducibility.org/RSF/book/<bookname>/<papername>/paper_html/ can be found in
- Finding result names: Use the html version of the paper, or grep all .tex files in the directory for a text string that occurs in the figure legend. Multiple-panel figures may have individual names for each panel. [Note: In pdf versions obtained with scons pdf in the paper directory, neither the book name, nor the paper directory name, nor the figure names are given. LaTeX options to show figure names, as well as a Geophysics-style header/footer with more details on the first page, may be in order]
- Finding where to launch the re-build: In some cases, rules for creating a result are specified in SConstruct files in subdirectories of the main paper directory. If the re-build step fails in the main paper directory, then you will have to find where the figure is built. Because result names may be generated automatically, a simple grep may not be enough, and you may need to read the SConstruct and the Python modules it imports to figure out whether the result is generated there.
- Re-build and display the figure by typing scons resultname.view in the appropriate directory.
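The two search steps above can be scripted. The following is a minimal sketch, assuming GNU or BSD grep (for `-r` and `--include`) and assuming that results are declared in SConstruct files with a literal call such as `Result('name', ...)`. The helper names `find_tex_with_caption` and `find_result_sconstruct` are hypothetical, not Madagascar programs, and automatically generated result names will not be found this way.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Madagascar) for locating a figure in a
# paper directory. Run them from the top of the paper directory.

# List the .tex files containing a fixed text fragment from the figure legend.
find_tex_with_caption() {
    grep -rlF --include='*.tex' -- "$1" .
}

# List the SConstruct files containing a literal Result('NAME' or
# Result("NAME" call. Result names built from variables are not matched.
find_result_sconstruct() {
    grep -rlE --include='SConstruct' -- "Result\\(['\"]$1['\"]" .
}
```

For example, `find_result_sconstruct zomig` (a made-up result name) prints the path of each SConstruct that declares that result; change to that file's directory and run `scons zomig.view` there.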
SConstructs containing a Fetch instruction will attempt to download public-domain input data from a communal server when the "scons" command is run. A fast internet connection is necessary in this case.
How to reproduce entire papers using stored figures
- See the previous section for how to find the paper directory
- Pointer to how to download stored figures (Download#Reproducible figures)
- scons pdf
- scons read
- If the build fails with this kind of message (details here), you are missing TeX system dependencies. Install a TeX system; TeX Live should have it all. Note: it is a 1 GB download, too large for many individual users to bother with and for most company IT departments to review for security. We should implement individual dependency checking, as we do in the installation.
- If the build fails with LaTeX Error: File `geophysics.cls' not found, you have LaTeX, but you are missing SEGTeX.
- If scons pdf in the paper directory requires pdf figures already in place in order to work, run sftour scons lock (?).
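The individual dependency checking suggested above could start from something like the following minimal sketch. It assumes a TeX Live-style installation in which `kpsewhich` reports the location of class files. The function names are hypothetical, and the items to check (for example `pdflatex` and SEGTeX's `geophysics.cls`, taken from the error messages above) are an assumption rather than the complete list of Madagascar's LaTeX dependencies.

```shell
#!/bin/sh
# Hypothetical pre-flight checks before running "scons pdf" in a paper
# directory. Each function prints one line per missing item and returns
# non-zero if anything is missing.

# Check that each named program is on the PATH.
check_programs() {
    rc=0
    for prog in "$@"; do
        if ! command -v "$prog" >/dev/null 2>&1; then
            echo "missing program: $prog"
            rc=1
        fi
    done
    return "$rc"
}

# Check that TeX can find each named file (e.g. a document class).
# kpsewhich prints the file's path if found and nothing otherwise.
check_tex_files() {
    if ! command -v kpsewhich >/dev/null 2>&1; then
        echo "missing program: kpsewhich"
        return 1
    fi
    rc=0
    for file in "$@"; do
        if [ -z "$(kpsewhich "$file")" ]; then
            echo "missing TeX file: $file"
            rc=1
        fi
    done
    return "$rc"
}
```

A typical call would be `check_programs pdflatex && check_tex_files geophysics.cls`, which stops at the first group that reports something missing.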
How to reproduce entire papers and all their figures
- See an earlier section for how to find the paper directory
- The relevant SCons command (scons lock, or sftour scons lock, as in Download#Reproducible figures) to force reproducing the figures in the paper even when the reference figures repository is present
Tell the user to expect conditional reproducibility: If Matlab is not present, rsftex will not try to build the figures but will use the stored PDF files (same goes for Mathematica, xfig, etc.)
How to reproduce whole books
From Jim's 2009-03-07 rsf-devel message: Here is a way to generate all the targets in the subdirectories of $RSFSRC/book/geostats/spatial_stats:
<bash>
cd $RSFSRC/book/geostats/spatial_stats
sftour scons
</bash>
That works pretty nicely. It generates all the targets in each of the 4 subdirectories of book/geostats/spatial_stats.
Now suppose I want to capture the output and errors in a log file (tcsh):
<bash>
sftour scons >& scons.log
</bash>
That's nice, except all the output goes into one log file in book/geostats/spatial_stats. Suppose I want 4 separate log files, one in each of the subdirectories. This will do the trick:
<bash>
sftour 'scons >& scons.log'
</bash>
The quotes make the entire string go to sftour as the command to run in each directory, instead of just 'scons'.
So far so good. Now suppose I want to go up one directory level and do the process recursively:
<bash>
cd $RSFSRC/book/geostats
sftour sftour 'scons >& scons.log'
</bash>
Well, that runs scons in each of the book/geostats/*/* directories, but it only makes 3 log files in the 3 subdirectories of book/geostats, not 14 log files in the 14 book/geostats/*/* directories. I can get 14 separate log files like this:
<bash>
sftour "sftour 'scons >& scons.log'"
</bash>
That does what I want. Now suppose I want to go up one more level to $RSFSRC/book and run the process recursively three levels deep:
<bash>
sftour sftour "sftour 'scons >& scons.log'"
</bash>
This works, but it doesn't put the log files in the bottom level; it puts them one level up. I can't fix it the same way I did before because I've run out of quotes :-) Here is one way to do it (tcsh):
foreach i (*)
  if ( -d $i ) then
    echo ++++++ $i
    cd $i
    sftour "sftour '/usr/bin/time -p scons >& scons.log'"
    cd ..
  endif
end
From Sergey's 2009-03-10 rsf-devel message: A more elegant solution is
<bash>
sftour sftour scons >& ../../%/%/scons.log
</bash>
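For shells other than tcsh, or when nesting quotes becomes unwieldy, the same effect (one log file in each directory a fixed number of levels down) can be obtained with a plain loop that does not rely on sftour at all. This is a minimal POSIX sh sketch: the function name `run_with_logs` is hypothetical, and it simply globs for subdirectories rather than reproducing whatever selection logic sftour itself may apply.

```shell
#!/bin/sh
# Hypothetical helper (not part of Madagascar): run a command in every
# directory exactly DEPTH levels below the current one, sending its
# stdout and stderr to scons.log inside that directory.
# Usage: run_with_logs DEPTH command [args...]
run_with_logs() {
    depth=$1
    shift
    # Build a glob such as */*/*/ for DEPTH=3; the trailing slash makes
    # the glob match directories only.
    pattern=
    i=0
    while [ "$i" -lt "$depth" ]; do
        pattern="$pattern*/"
        i=$((i + 1))
    done
    for dir in $pattern; do
        # If nothing matched, the glob is left unexpanded; skip it.
        [ -d "$dir" ] || continue
        echo "++++++ $dir"
        # Run in a subshell so the caller's working directory is unchanged.
        ( cd "$dir" && "$@" > scons.log 2>&1 )
    done
}
```

Run from $RSFSRC/book, `run_with_logs 3 scons` would leave a scons.log in every book/*/*/* directory, and `run_with_logs 3 /usr/bin/time -p scons` would also record the timing in each log, as the tcsh loop above does.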
Writing a LaTeX paper in the Madagascar framework
Follows the natural learning progression of somebody who may not even know LaTeX, let alone SCons.
- A paper with no figures.
- A paper with NR-only figures
Creating a reproducible paper
Sections 2 and 3 from Reproducible computational experiments using SCons. Also, mention the "SCons macros" in book/Recipes.
Publications included in Madagascar's book directories are tested periodically. Those publications that fail the tests and are not easily repaired are moved to book/Grave, with a note on how the tests failed and why the problems could not be fixed.
Due to seismic data licensing terms, an author may be able to make public everything that is needed to make a publication reproducible, except for the data. In such conditions, the paper is still acceptable for inclusion in the Madagascar collection. To indicate that certain datasets are private, the relevant SConstruct files should use Fetch(...,local=1) or Fetch(...,server=private.server), where private.server is a password-protected private server. Affected vplot figures should be uploaded to the figure repository. Reproducibility testing will be skipped for the affected figures, but the html and pdf versions of the publications will still be created.
The usage of public seismic datasets, such as those at http://software.seg.org/, is strongly encouraged.
Creating a reproducible book
Developing in Madagascar
Writing your own programs
Introduction to the Madagascar API
- The existing data clipping API demo
- A more complex finite differences API demo – add Python, F77 and Matlab APIs to it
How to add your program
How to document your program
How to test your program
How to parallelize your programs
The Parallel Computing page.
Tips and tricks
Madagascar library reference
Adding programs to the central repository
Framework development and maintenance
Description of m8r's inner workings for those who want to help improve and maintain Madagascar. Maintenance guide and perhaps other material.
Graphics development with vplot
Datasets distributed with Madagascar
- Description of datasets – pictures of the velocity model, of sample gathers, zero-offset sections, migrated image.
- Comment on the main problems they illustrate (internal multiples? overturning waves? etc.), the algorithm used for generating them, and references to published literature describing the datasets
- Command line options for correctly reading them from the storage format (SEG-Y, most probably) into RSF
- In general, expand the datasets section of Reproducible Documents page to include other datasets
Other open-source data analysis packages
Other open-source geophysical packages. Briefly discuss each of them. Mention "dictionaries" from them to m8r where available (should attempt to have dictionaries for all of them)
There are many other useful open-source or public-domain software tools in applied mathematics that can complement Madagascar. A few common examples, in alphabetical order, are:
- ALGLIB: a highly portable numerical analysis and data processing library;
- ARPACK ("The ARnoldi PACKage"): a library written in FORTRAN 77 for solving large-scale eigenvalue problems;
- ATLAS ("Automatically Tuned Linear Algebra Software"): a linear algebra library, implementing the BLAS APIs for C and Fortran77;
- Blitz++: a C++ class library for scientific computing which provides performance on par with Fortran 77/90;
- DUNE ("Distributed and Unified Numerics Environment"): a modular toolbox for solving partial differential equations with grid-based methods. It supports the easy implementation of methods like Finite Elements, Finite Volumes, and also Finite Differences;
- FFTW ("The Fastest Fourier Transform in the West"): Hardware-adaptive FFT libraries;
- GNU Scientific Library: a C library for numerical calculations in many branches of applied mathematics and science;
- GNU Triangulated Surface Library: a library providing a set of useful functions to deal with 3D surfaces meshed with interconnected triangles;
- LAPACK ("The Linear Algebra PACKage"): a collection of routines for solving systems of simultaneous linear equations, least-squares solutions of linear systems of equations, eigenvalue problems, and singular value problems;
- MINPACK: a library of FORTRAN subroutines for solving systems of nonlinear equations, or for the least-squares minimization of the residual of a set of linear or nonlinear equations;
- PETSc ("the Portable, Extensible Toolkit for Scientific computation"): a set of serial and parallel, linear and nonlinear solvers for large-scale, sparse linear and nonlinear systems of equations;
- SciPy ("Scientific Python"): a toolbox for scientific computing in Python;
- uBLAS: a C++ template class library that provides BLAS level 1, 2, 3 functionality for dense, packed and sparse matrices.
The Wikipedia pages for most of the above libraries are valuable resources, as is Wikipedia's list of numerical analysis software. Other libraries and standalone programs are available through the Netlib repository and indexed by the Guide to Available Mathematical Software. The U.S. DOE's ACTS Collection is another valuable repository.