
Next: The preconditioned solver Up: PRECONDITIONING THE REGULARIZATION Previous: Importance of scaling

You better make your residuals IID!

In the statistical literature there is a concept that arises repeatedly: the idea that some statistical variables are IID, namely Independent and Identically Distributed. In practice we'll see many random-looking variables, some much closer than others to IID. Theoretically, the ID part of IID means the random variables come from identical probability functions. In practice, the ID part mostly means the variables have the same variance. The ``I'' before the ID means that the variables are statistically independent of one another. In the subject area of this book (signals, images, and earth volumes), the ``I'' before the ID means that our residual spaces are white: they have all frequencies present in roughly equal amounts. In other words, the ``I'' means the statistical variables have no significant correlation in time or space. IID random variables have fairly uniform variance in both physical space and Fourier space.

IID random variables have uniform variance in both physical space and Fourier space.
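The statement above suggests a quick diagnostic. The sketch below is a hypothetical helper, `iid_diagnostics`, written in Python with NumPy (an assumption; the text prescribes no code). It compares the mean-square amplitude across windows of a 1-D residual and the average power across frequency bands. Ratios near 1 are consistent with uniform variance in both domains; the window and band counts are arbitrary choices.

```python
import numpy as np

def iid_diagnostics(r, n_windows=8, n_bands=8):
    """Rough IID check for a 1-D residual r (hypothetical helper).

    Returns (spread in physical space, spread in Fourier space), each the
    ratio of the largest to the smallest window or band average.
    Values near 1 suggest uniform variance; large values suggest r is not IID.
    """
    r = np.asarray(r, dtype=float)
    # Physical space: mean-square amplitude in consecutive windows.
    windows = np.array_split(r, n_windows)
    ms = np.array([np.mean(w ** 2) for w in windows])
    # Fourier space: average power in consecutive frequency bands (DC excluded).
    power = np.abs(np.fft.rfft(r))[1:] ** 2
    bands = np.array_split(power, n_bands)
    band_power = np.array([np.mean(b) for b in bands])
    return ms.max() / ms.min(), band_power.max() / band_power.min()

# Usage: white noise versus its running sum (a random walk).
rng = np.random.default_rng(0)
white = rng.standard_normal(4096)
print(iid_diagnostics(white))             # both ratios close to 1
print(iid_diagnostics(np.cumsum(white)))  # very large Fourier-space ratio
```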

In a geophysical project it is important that the residual between observed data and theoretical data is not far from IID. We fit the data by minimizing the sum of the squared residuals, so if some collection of residuals is small, their squares are very small, and those regression equations are effectively ignored. We would hardly ever want that. Consider reflection seismograms. They get weak at late times, so even with a bad fit, the difference between real and theoretical seismograms is necessarily weak at late times. We don't want the data at late times to be ignored, so we boost up the residual there. We choose $ \bold W$ to be a diagonal matrix that boosts late times in the regression $ \bold 0 \approx \bold r = \bold W(\bold F\bold m-\bold d)$ .
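A minimal sketch of the late-time weighting, assuming the diagonal of $ \bold W$ grows like $ t^2$ (a common seismic gain, but the exponent here is our assumption, not a prescription from the text). Because $ \bold W$ is diagonal, applying it is just an elementwise product. The helper names `time_gain` and `weighted_residual` are hypothetical.

```python
import numpy as np

def time_gain(nt, dt, power=2.0):
    """Diagonal of W: weights proportional to t**power, normalized to a maximum of 1.

    Hypothetical choice: t starts at dt rather than 0 so no sample gets zero weight.
    """
    t = (np.arange(nt) + 1) * dt
    w = t ** power
    return w / w.max()

def weighted_residual(F, m, d, w):
    """r = W (F m - d) with W = diag(w), applied as an elementwise product."""
    return w * (F @ m - d)

# Usage: a residual decaying like 1/t**2 becomes uniform after the gain.
nt, dt = 4, 1.0
t = (np.arange(nt) + 1) * dt
r = weighted_residual(np.eye(nt), np.zeros(nt), 1.0 / t ** 2, time_gain(nt, dt))
print(r)  # all entries -1/16 (up to rounding)
```

Real seismograms will not decay as tidily as this toy residual, so in practice the exponent would be tuned until the weighted residual looks roughly uniform in time.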

An example with too much low (spatial) frequency in a residual might arise in a topographic study. It is not unusual for the topographic wavelength to exceed the survey size. Here we should choose $ \bold W$ to be a filter to boost up the higher frequencies. Perhaps $ \bold W$ should contain a derivative or a Laplacian. If you set up and solve a data modeling problem and then find $ \bold r$ is not IID, you should consider changing your $ \bold W$ .
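For a residual dominated by long wavelengths, the sketch below takes $ \bold W$ to be a first difference, the simplest derivative filter (a Laplacian would be the 2-D analogue). It uses a random walk as a stand-in for a residual whose wavelength exceeds the survey, and measures whiteness with a lag-one autocorrelation. Both helpers, `first_difference` and `lag1_correlation`, are hypothetical illustrations, not routines from the book.

```python
import numpy as np

def first_difference(r):
    """Apply W = first derivative to a residual, boosting high frequencies.

    The output is one sample shorter; the end transient is simply dropped.
    """
    return np.diff(np.asarray(r, dtype=float))

def lag1_correlation(r):
    """Normalized lag-one autocorrelation: near 0 for white, near 1 for smooth."""
    r = np.asarray(r, dtype=float) - np.mean(r)
    return float(np.dot(r[:-1], r[1:]) / np.dot(r, r))

# Usage: a random walk is strongly correlated; its first difference is white.
rng = np.random.default_rng(1)
white = rng.standard_normal(2000)
smooth = np.cumsum(white)
print(lag1_correlation(smooth))                    # close to 1
print(lag1_correlation(first_difference(smooth)))  # close to 0
```

Differencing a random walk returns exactly the white sequence that generated it, so this is the best case; a real topographic residual would only be whitened approximately.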

Now let us include regularization $ \bold 0 \approx \bold A \bold m$ and a preconditioning variable $ \bold p$ . We have our data fitting goal and our model styling goal, the first with a residual $ \bold r_d$ in data space, the second with a residual $ \bold r_m$ in model space. We have had to choose a regularization operator $ \bold A$ and a scaling factor $ \epsilon$ .

$$ \bold 0 \ \approx\ \bold r_d \ = \ \bold F \bold A^{-1}\bold p -\bold d \qquad (10) $$
$$ \bold 0 \ \approx\ \bold r_m \ = \ \epsilon \bold p \qquad (11) $$

This system of two regressions could be packed into one: the two residual vectors stacked on top of each other, and likewise the operators $ \bold F\bold A^{-1}$ and $ \epsilon \bold I$ . The IID notion seems to apply to this unified system, and that gives us a clue about how we should have chosen the regularization operator $ \bold A$ . Not only should $ \bold r_d$ be IID, but so should $ \bold r_m$ . Within a scale factor $ \epsilon$ , however, $ \bold r_m=\bold p$ . Thus the preconditioning variable is not simply something to speed computational convergence; it is a variable that should be IID. If it does not come out that way, we should consider changing $ \bold A$ . Chapter [*] addresses the task of choosing an $ \bold A$ so that $ \bold r_m$ comes out IID.
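The stacked system can be written down directly for small dense matrices. The sketch below, assuming NumPy and an invertible square $ \bold A$ , forms the operator $ \bold F\bold A^{-1}$ on top of $ \epsilon\bold I$ , solves the combined regression with `numpy.linalg.lstsq`, and recovers $ \bold m = \bold A^{-1}\bold p$ . The function name `solve_preconditioned` is hypothetical; a practical solver would apply these operators iteratively rather than forming matrices.

```python
import numpy as np

def solve_preconditioned(F, A, d, eps):
    """Solve 0 ~ F A^{-1} p - d and 0 ~ eps p as one stacked least-squares problem.

    F: (nd, nm) modeling matrix; A: (nm, nm) invertible regularization matrix.
    Returns (m, p, r_d, r_m).
    """
    n = A.shape[0]
    FAinv = np.linalg.solve(A.T, F.T).T          # F A^{-1}, without forming A^{-1}
    G = np.vstack([FAinv, eps * np.eye(n)])      # stacked operator
    rhs = np.concatenate([d, np.zeros(n)])       # stacked right-hand side
    p, *_ = np.linalg.lstsq(G, rhs, rcond=None)
    m = np.linalg.solve(A, p)                    # m = A^{-1} p
    r_d = FAinv @ p - d                          # data residual
    r_m = eps * p                                # model residual
    return m, p, r_d, r_m

# Usage: A is a causal first difference, which is invertible.
rng = np.random.default_rng(2)
F = rng.standard_normal((30, 10))
A = np.eye(10) - np.eye(10, k=-1)
d = rng.standard_normal(30)
m, p, r_d, r_m = solve_preconditioned(F, A, d, eps=0.5)
```

Because $ \bold p = \bold A\bold m$ , the result should match the regularized normal equations $ (\bold F\T\bold F + \epsilon^2\bold A\T\bold A)\,\bold m = \bold F\T\bold d$ , which gives a simple correctness check. With $ \bold r_m$ in hand, one can then examine whether $ \bold p$ looks IID, as the text recommends.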

We should choose a weighting function (and/or operator) $ \bold W$ so data residuals are IID. We should also choose our regularization operator $ \bold A$ so the preconditioning variable $ \bold p$ comes out IID.

Finally, the $ \epsilon$ . How should we choose this number? Read $ \mathcal{E}(\vert r_d\vert)$ as the expected average value of $ \vert r_d\vert$ . The requirement that each component of the vector $ \bold r = (\bold r_d,\bold r_m)\T$ have the same expected absolute value leads to the notion that epsilon should be $ \epsilon=\mathcal{E}(\vert r_d\vert)/\mathcal{E}(\vert p\vert)$ . (I vaguely remember trying this once and discovering that epsilon could not be bootstrapped. It either diverged to infinity or converged to zero depending on its starting value. Perhaps the epsilon we should use is the starting value poised between the two divergences! I don't trust my memory for an important issue like this. Somebody else should try this again.)
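The proposed rule, with expectations replaced by sample means, is a one-liner. The text also invites someone to retry the bootstrap, so the sketch includes a loop that alternately solves the stacked regression and re-estimates $ \epsilon$ , stopping early if $ \epsilon$ leaves a fixed range (the bounds and iteration count are arbitrary assumptions). The names `estimate_epsilon` and `bootstrap_epsilon` are hypothetical, and no claim is made here about whether the iteration converges.

```python
import numpy as np

def estimate_epsilon(r_d, p):
    """eps = E(|r_d|) / E(|p|), with expectations replaced by sample means."""
    return float(np.mean(np.abs(r_d)) / np.mean(np.abs(p)))

def bootstrap_epsilon(F, A, d, eps0, n_iter=10, lo=1e-8, hi=1e8):
    """Alternately solve for p and re-estimate eps; return the history of eps.

    Stops early if eps leaves [lo, hi], since it may run off to zero or infinity.
    """
    n = A.shape[0]
    FAinv = np.linalg.solve(A.T, F.T).T          # F A^{-1}
    rhs = np.concatenate([d, np.zeros(n)])
    history = [eps0]
    eps = eps0
    for _ in range(n_iter):
        G = np.vstack([FAinv, eps * np.eye(n)])
        p, *_ = np.linalg.lstsq(G, rhs, rcond=None)
        eps = estimate_epsilon(FAinv @ p - d, p)
        history.append(eps)
        if not (lo <= eps <= hi):
            break
    return history
```

Running `bootstrap_epsilon` from several starting values and comparing the histories is one way to revisit the remembered experiment.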

There is another strange idea here, which is a consequence of the notion that the elements of $ \bold r = (\bold r_d,\bold r_m)\T$ should be IID: elements of $ \bold r_d$ and $ \bold r_m$ should not be correlated with each other. ``But wait,'' you say, ``it makes no sense to correlate spaces of different dimension.'' That is where the formal statistical notion of an ``ensemble'' arises. If there are many worlds, and if we may speak of an average over worlds, then we can have an array of averages, with as many rows as $ \bold r_d$ has components and as many columns as $ \bold r_m$ has, each element an average over the many ``worlds.'' How can a practitioner absorb this notion? Perhaps in some cases the model and data spaces have a natural alignment such that a product of the two spaces can be locally averaged. Post-stack migration suggests an example: a hyperbola in data space has its top at the corresponding point in model space. This suggests we use a $ 2\times 2$ weight matrix

$$ \left[ \begin{array}{cc} W_{dd} & W_{dm} \\ W_{dm}\T & W_{mm} \end{array} \right] \qquad (12) $$

The closest I have come to seeing something like this in practice is in Chapter [*], where two parts of model space, the water bottom and the water top, came out correlated. They should not have been, because their fluctuations had different mechanisms.
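To make the block weighting concrete, the sketch below applies a $ 2\times 2$ block matrix to the stacked residual. With $ \bold r_d$ of length $ n_d$ and $ \bold r_m$ of length $ n_m$ , the off-diagonal block $ W_{dm}$ is $ n_d\times n_m$ , so the lower-left block must be its transpose for the dimensions to match. How to estimate $ W_{dm}$ from an ensemble is the open question in the text; the hypothetical function `apply_block_weight` only shows where such a block would act.

```python
import numpy as np

def apply_block_weight(W_dd, W_dm, W_mm, r_d, r_m):
    """Apply [[W_dd, W_dm], [W_dm^T, W_mm]] to the stacked residual (r_d, r_m).

    Shapes: W_dd (nd, nd), W_dm (nd, nm), W_mm (nm, nm).
    Returns the weighted data part and the weighted model part separately.
    """
    W = np.block([[W_dd, W_dm], [W_dm.T, W_mm]])
    r = np.concatenate([r_d, r_m])
    wr = W @ r
    nd = len(r_d)
    return wr[:nd], wr[nd:]

# Usage: with W_dm = 0 the data and model residuals are weighted independently.
W_dd = np.diag([1.0, 2.0, 3.0])
W_mm = np.diag([4.0, 5.0])
wd, wm = apply_block_weight(W_dd, np.zeros((3, 2)), W_mm,
                            np.ones(3), np.array([2.0, -1.0]))
```

Setting $ W_{dm}=0$ recovers separate weighting of data and model residuals; a nonzero cross block mixes the two, which is exactly where an ensemble estimate of their correlation would enter.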



2011-08-20