Difference between revisions of "RSF Comprehensive Description"

Revision as of 09:10, 22 December 2009

Introduction

The Regularly Sampled Format (RSF) is a specific arrangement of information defined by the behavior of the implementation of the C API of Madagascar on a reference machine architecture with a reference version of a Linux distribution and associated dependencies. RSF is the format in which most Madagascar programs expect their input and in which they structure their output. Because the code is portable, an exact reference setup has not been specified, and Madagascar programs are expected to read and write in the same way on any machine on which the package was compiled successfully. While an intuitive introduction to RSF already exists, this document attempts to describe RSF exhaustively, for programmers who want to interface Madagascar with other packages.

RSF originated as a representation of a discrete set of values of a single-valued function defined on an n-dimensional space. A real-world example would be the values of pressure in a space-time volume through which an acoustic wavefield propagates. Each dimension of the space is either discrete and regular, or continuous but sampled discretely and regularly. In this context, a set of reals or integers is regular if its members are consecutive integer multiples of a finite quantity of the same kind. RSF datasets can be visualized as "matrices with physical dimensions".

The physicality of dimensions, while useful for jargon, for explanations and for the definition of parameters below, is not compulsory. Ultimately RSF is just a sane way of storing n-d arrays on disk. Just as programming languages use intrinsic methods and user-defined procedures to work with arrays held in memory, Madagascar uses both programs in its main distribution and user-written programs to work with data stored in RSF. RSF datasets are just out-of-core arrays.

Encoded information

An RSF dataset consists of the sequence of numerical values in the array and of information about this "out-of-core array" (metainformation -- data about data). The distinct pieces of metainformation are assigned names in italics, to help define the format later.

To the extent possible, each program that acted on the array will record:

  • Name of program (prog)
  • Directory in which the program was run (dir)
  • User that ran the program (user)
  • Short hostname of machine on which the program was run (host). For example, this would be machine instead of machine.university.edu
  • Date and time (up to seconds) at which the program was started (datetime)
  • A pointer to the binary data (pointer)
  • Data type, i.e. integer, real or complex (type)
  • Data encoding, i.e. name of protocol for representing the data (format)
  • Size of each data sample in bytes (esize)
  • For each dimension # in the dataset (1 <= # <= 9), specify:
    • Number of elements in that dimension: n#, i.e.: n1, n2, n3... (nelem_axis#)
    • Origin on that axis: o#, i.e.: o1, o2, o3... (orig_axis#)
    • Sampling interval on that axis: d#, i.e.: d1, d2, d3... (sampl_axis#)
    • Label for that axis: label#, i.e.: label1, label2, label3... (label_axis#)
    • Physical unit for that axis: unit#, i.e.: unit1, unit2, unit3... (unit_axis#)

Encoding protocol

RSF "files" actually consist of a header file and a data file.

This file attempts to document the actual interface implemented in file.c; it is not a formal specification, and in the event of a disagreement the code should be taken as the official reference.

Header Files

Each data file has an associated header file. The header file is 7-bit ASCII.

Lines with no "=" are considered comments and are ignored. Lines with more than one "=" are illegal.

Lines with a single "=", with no adjacent spaces, assign a value to a variable with an alphanumeric name.

Text strings must be delimited by pairs of quotes. Numerical values are subject to C's parsing rules.
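The line rules above can be sketched as a small parser. This is a minimal illustration that follows only the rules stated in this section, not the behavior of file.c. The function names, the sample header and the path /tmp/data.rsf@ are hypothetical, rejecting spaces next to "=" is an assumption (the text does not say how such lines are treated), and Python's int() and float() only approximate C's parsing rules.

```python
def convert_value(value):
    """Turn the text after "=" into a string or a number."""
    # Text strings are delimited by a pair of quotes.
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        return value[1:-1]
    # Assumption: int() and float() stand in for C's numeric parsing.
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_header(text):
    """Parse header text using only the rules stated in this section."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        count = line.count("=")
        if count == 0:
            continue  # no "=": comment line, ignored
        if count > 1:
            raise ValueError(f"more than one '=' in line: {line!r}")
        key, value = line.split("=")
        # Assumption: spaces adjacent to "=" are treated as an error.
        if key != key.rstrip() or value != value.lstrip():
            raise ValueError(f"spaces adjacent to '=' in line: {line!r}")
        if not key.isalnum():
            raise ValueError(f"variable name is not alphanumeric: {key!r}")
        params[key] = convert_value(value)
    return params


# Hypothetical header, written by hand for this example.
example = 'a comment line\nn1=100\nd1=0.004\nin="/tmp/data.rsf@"\n'
params = parse_header(example)
```

Here params maps n1 to the integer 100, d1 to the float 0.004 and in to the unquoted path string.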

The "in=" parameter contains the fully qualified path to the corresponding data file and is required.

"n#" (n1, n2, n3, etc.) is the number of points in dimension #. n1 is the fastest direction (maps directly onto memory).

n1 must be specified. The size of the array in bytes is the product of all n# values and the size of the fundamental type.
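The size rule and the "n1 is fastest" ordering can be checked with a short calculation. The sketch below assumes zero-based sample indices and uses hypothetical helper names; it is an illustration of the arithmetic, not code taken from Madagascar.

```python
from math import prod


def total_bytes(n, esize):
    """Size of the data in bytes: product of all n# values times esize."""
    return prod(n) * esize


def flat_index(index, n):
    """Position of sample (i1, i2, ...) in the data, with n1 varying fastest.

    Both arguments are ordered by axis number: index = (i1, i2, ...),
    n = (n1, n2, ...). Indices are zero-based (an assumption of this sketch).
    """
    position = 0
    stride = 1
    for i, size in zip(index, n):
        if not 0 <= i < size:
            raise IndexError(f"index {i} out of range for axis of length {size}")
        position += i * stride
        stride *= size  # the next axis advances one full block of this one
    return position


n = (4, 3, 2)   # n1=4, n2=3, n3=2
esize = 4       # bytes per sample, e.g. a 4-byte float

size_in_bytes = total_bytes(n, esize)              # 4 * 3 * 2 * 4 = 96
sample = flat_index((1, 2, 1), n)                  # 1 + 2*4 + 1*12 = 21
byte_offset = sample * esize                       # 84
```

Incrementing i1 moves one sample forward in the file, while incrementing i2 moves n1 samples forward, which is what "fastest direction" means here.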

Optional elements

In addition to the above, many filters enforce the following conventions:

"d#" is the physical spacing (sampling interval) in the respective dimension.

"o#" is the physical origin of the respective dimension in some absolute coordinate system.
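Taken together, n#, o# and d# describe a regularly sampled axis. The sketch below assumes the conventional reading that sample i (counted from zero) sits at o + i*d; the text above does not state this formula explicitly, and the function name is hypothetical.

```python
def axis_coordinates(n, o, d):
    """Physical coordinates of the n samples on one axis.

    Assumption: sample i, counted from zero, is located at o + i * d.
    """
    return [o + i * d for i in range(n)]


# Hypothetical axis: n1=4, o1=0.0, d1=0.5
coordinates = axis_coordinates(4, 0.0, 0.5)   # [0.0, 0.5, 1.0, 1.5]
```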

Data Files

Data files are rectangular arrays of data. The following data encodings are supported:

  • ASCII (C-compatible input)
  • XDR (device independent binary standard)
  • native binary (default)

ASCII format is most useful for debugging, while XDR format is useful for portability across machines.

Performance is optimized for native format.

The following data types are supported:

  • unsigned byte
  • byte
  • int (native int)
  • short (2 bytes)
  • float (native float)
  • complex (real, imaginary float pairs)

RSF files in streams

Output

When a Madagascar program writes an RSF file to disk (e.g. sfprog >file.rsf), it creates a header file and a binary file as described above.

If the output goes to a stream, or if the parameter --out=stdout is passed to the program, the program writes to stdout the ASCII header, followed by the sequence of three special characters EOL EOL EOT (\014\014\004 in octal, i.e. two form-feed characters and one end-of-transmission character), followed by the binary data.

Input

When a Madagascar program reads from stdin, it expects either an end-of-file (EOF) condition marking the end of the ASCII header (after which it switches to reading from the binary cube), or an EOL EOL EOT sequence indicating that the data follows immediately on stdin.
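The stream layout described in this section can be illustrated with a small round trip in memory. This is a sketch under stated assumptions, not the logic of file.c: the function names are hypothetical, the header is abbreviated and hand-written, and the samples are packed as native-byte-order 4-byte floats with Python's struct module.

```python
import struct

# EOL EOL EOT, i.e. \014\014\004 in octal.
SEPARATOR = b"\x0c\x0c\x04"


def pack_stream(header_text, samples):
    """Build a stream: ASCII header, then the separator, then binary floats."""
    header = header_text.encode("ascii")
    data = struct.pack(f"={len(samples)}f", *samples)
    return header + SEPARATOR + data


def split_stream(stream):
    """Split a stream into (header text, binary data).

    If the separator is absent, the whole stream is the header and the
    binary data must be read from elsewhere, so None is returned for it.
    """
    head, found, data = stream.partition(SEPARATOR)
    if not found:
        return stream.decode("ascii"), None
    return head.decode("ascii"), data


# Hypothetical, abbreviated header and three samples.
stream = pack_stream("n1=3\n", [1.0, 2.0, 3.0])
header_text, data = split_stream(stream)
values = struct.unpack("=3f", data)
```

Searching for the first occurrence of the separator is sufficient here because the header is plain ASCII text and is not expected to contain these control characters.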