Revision as of 09:11, 13 March 2009 by Mt (Talk | contribs) (Data Visualization (Mentor: Vladimir Bashkardin))


Google Summer of Code is a program that offers student developers stipends to write code for various open source projects. Google will be working with several open source, free software, and technology-related groups to identify and fund projects over a three-month period.

Welcome to Madagascar's Google Summer of Code Page

Madagascar, an open source project, is a leading participant in the Open Research movement. As described on Wikipedia, the central theme of open research is to make clear accounts of the methodology, along with data and results extracted therefrom, freely available via the internet. This permits a massively distributed collaboration.

Its design is based on a few simple and powerful principles.

From the coder's point of view, Madagascar is written in C and Python. The C library is a very loosely coupled set of Unix-style filters that transform stdin to stdout. The Python code is mostly an implementation of a custom build system on top of the rule-based build system SCons.
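
The filter idea can be illustrated with a minimal sketch. It is written in Python for brevity and is not Madagascar code: the functions `scale`, `clip`, and `run_pipeline` are hypothetical names, and the data format (one number per line) is an assumption made only for this example.

```python
import io


def scale(src, dst, factor):
    """Filter: multiply every number (one per line) by factor."""
    for line in src:
        if line.strip():
            dst.write(f"{float(line) * factor}\n")


def clip(src, dst, limit):
    """Filter: clamp every number to the range [-limit, limit]."""
    for line in src:
        if line.strip():
            value = max(-limit, min(limit, float(line)))
            dst.write(f"{value}\n")


def run_pipeline(text):
    """Chain the two filters, like `scale | clip` in a shell."""
    stage1 = io.StringIO()
    scale(io.StringIO(text), stage1, factor=2.0)
    stage1.seek(0)  # rewind so the next filter reads from the start
    stage2 = io.StringIO()
    clip(stage1, stage2, limit=5.0)
    return stage2.getvalue()
```

Each filter only knows about an input stream and an output stream, so passing `sys.stdin` and `sys.stdout` instead of the in-memory buffers would turn either function into a command-line filter.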

Seismic data processing consists of a sequence of steps. Madagascar's filter-based design allows such sequences to be easily composed and abstracted. A key advantage of the Madagascar system is that the computational pipeline is also construed as a build system. Modifications to intermediate steps automatically reinvoke only necessary computations and skip over up-to-date ones, just as a more conventional build system would recompile modules whose code had been touched while reusing modules which are newer than their source. Madagascar extends this model all the way from raw data to publication.
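
The "skip up-to-date steps" behaviour described above can be sketched with a timestamp comparison. This is an illustration under stated assumptions, not Madagascar's or SCons's implementation: `is_up_to_date`, `build`, and `concatenate` are hypothetical helpers, and real build systems such as SCons may compare content signatures rather than modification times.

```python
import os
import tempfile


def is_up_to_date(target, sources):
    """True if target exists and is at least as new as every source."""
    if not os.path.exists(target):
        return False
    target_time = os.path.getmtime(target)
    return all(os.path.getmtime(s) <= target_time for s in sources)


def build(target, sources, action):
    """Run action only when target is stale; return True if it ran."""
    if is_up_to_date(target, sources):
        return False
    action(target, sources)
    return True


def concatenate(target, sources):
    """Hypothetical processing step: join the sources into the target."""
    with open(target, "w") as out:
        for name in sources:
            with open(name) as src:
                out.write(src.read())
```

Calling `build` twice in a row runs the action only the first time; touching a source file makes the next call run it again. Chaining several such rules, where one step's target is the next step's source, gives the pipeline-as-build-system behaviour.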

This strategy is a key to reproducibility. By maintaining scripts that contain all transformations from raw data to the final publication-quality document, Madagascar supports repeatability and testing of scientific computations, thus advancing the collaborative nature of science in the same way that open source advances the collaborative nature of computing.

Directions in which Madagascar is expanding include visualization, parallelization, and user interfaces.

Project Ideas

See also the feature request tracker.

Graphical User Interface (Mentor: Sergey Fomel)

Data Visualization (Mentor: Vladimir Bashkardin)

  • Migrate 2D rendering OpenGL-based code from GSEGYView to Madagascar and create an interactive viewer with zooming/panning features.
  • Migrate 3D rendering GLSL-based code from GSEGYView to Madagascar and create a viewer with the support of pluggable shader programs.
  • Finish the 3D rays viewer.
  • Create a set of alternatives to the sfgraph, sfgrey, and sfcontour programs that use the PLplot library instead of VPlot. Also create "pens" that can read output from those programs and generate PS, PDF, and PNG output. Analyze the flexibility of PLplot and the possibility of fully mimicking VPlot's output (including animation).

Binary Packages (Mentor: Nick Vlad)

  • Generate binary packages to simplify installation on multiple platforms.
  • Given Madagascar's dependencies and a standardized way of finding other packages' dependencies, devise a method or apply a tool to determine the minimum number of packages that make up a self-contained Linux distribution that runs Madagascar. Build such a distribution starting from an existing well-supported distribution. Build a virtual appliance from that distribution.

Geophysics / Numerical Analysis (Mentor: Paul Sava)

  • Implement an optimal algorithm for parallel transposes of arrays with 4 or 5 dimensions, up to a few tens of terabytes in volume, on a multi-node Linux cluster.
  • As a bonus, apply an FFT along one of the transposed dimensions.
  • Implement a hardware-adaptive transpose algorithm for a single-node SMP machine with 8 processors or more. Investigate transfer speeds, cache sizes, memory arrangement, etc., and make the algorithm adapt to them. Bonus for out-of-core capabilities.
  • Implement 3-D seismic data header storage using the fastest open-source database, then compare header I/O times with the classic approach of having a simple table. Which is the fastest way of implementing a large database knowing that the values it will hold are all bools, ints and floats?
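
As a starting point for the header-storage comparison in the last item, the two approaches can be sketched side by side. Everything here is an assumption made for illustration: the three-field header layout, the column names, the helper functions, and the use of SQLite (chosen only because it ships with Python, not because it is known to be the fastest option).

```python
import sqlite3
import struct

# Hypothetical header layout: trace number (int), offset (float), live flag (bool).
RECORD = struct.Struct("<if?")


def table_write(headers):
    """Pack headers into one contiguous bytes object (the 'simple table')."""
    return b"".join(RECORD.pack(*h) for h in headers)


def table_read(blob):
    """Unpack every fixed-width record from the flat table."""
    return [RECORD.unpack_from(blob, i) for i in range(0, len(blob), RECORD.size)]


def db_write(headers):
    """Store the same headers in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE headers (trace INTEGER, offset_m REAL, live INTEGER)")
    conn.executemany("INSERT INTO headers VALUES (?, ?, ?)", headers)
    conn.commit()
    return conn


def db_read(conn):
    """Read all headers back in insertion order."""
    rows = conn.execute("SELECT trace, offset_m, live FROM headers ORDER BY rowid")
    return [(trace, offset_m, bool(live)) for trace, offset_m, live in rows]
```

Both paths round-trip the same list of tuples, so wrapping each write/read pair in `timeit` on a realistic number of traces would give a first, rough comparison of header I/O times.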