I’ve been thinking that we could do continuous adaptation of a diagonal mass matrix for not much more effort than we put in now (one divide per parameter per iteration). We can use a decaying version of Welford’s algorithm to keep a running (co)variance estimate. We could even set up the decay function to be a smooth version of what we have now, putting most of the weight on the last half of the transitions. I’m pretty sure we could set it up so that we get a proper regularization term, or the regularization could come in just at initialization in the form of data, such as 20 observations of diag(1).
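
Here is a minimal sketch of the decaying Welford update, assuming exponential decay (every past weight is multiplied by a constant `decay` each iteration) and assuming the regularization enters only at initialization as `prior_weight` pseudo-observations with variance `prior_var`. The class and parameter names are hypothetical, not anything in Stan, and exponential decay is just one possible decay function, not a reproduction of the current adaptation schedule.

```python
class DecayingDiagWelford:
    """Running per-parameter variance with exponentially decaying weights.

    Hypothetical sketch: the class name and the `decay`, `prior_weight`,
    and `prior_var` parameters are illustrative, not part of Stan.
    """

    def __init__(self, init_mean, decay=0.99, prior_weight=20.0, prior_var=1.0):
        if not 0.0 < decay <= 1.0:
            raise ValueError("decay must be in (0, 1]")
        self.decay = decay
        self.mean = [float(v) for v in init_mean]
        # Regularization as pseudo-data: `prior_weight` observations centered
        # at `init_mean` with variance `prior_var` (e.g. 20 draws of diag(1)).
        self.weight = float(prior_weight)
        self.m2 = [prior_weight * prior_var for _ in self.mean]

    def update(self, draw):
        """Fold one draw into the estimate: O(N) work."""
        # Shrink all past weights by `decay`, then add the new draw with weight 1.
        # Rescaling every weight leaves the weighted mean unchanged, so only the
        # total weight and the sums of squared deviations need decaying.
        self.weight = self.decay * self.weight + 1.0
        for i, x in enumerate(draw):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.weight
            self.m2[i] = self.decay * self.m2[i] + delta * (x - self.mean[i])

    def variance(self):
        """Weighted variance estimate, i.e. the diagonal of the inverse metric."""
        return [s / self.weight for s in self.m2]

    def mass_diag(self):
        """Diagonal of the mass matrix: one divide per parameter."""
        return [self.weight / s for s in self.m2]


# Usage: start at the initial point and fold in draws as they arrive.
est = DecayingDiagWelford(init_mean=[0.0, 0.0])
for draw in [[0.3, -1.2], [0.1, 0.4], [-0.5, 2.0]]:
    est.update(draw)
print(est.variance(), est.mass_diag())
```

With decay = 1 and prior_weight = 0 this reduces to ordinary Welford. With decay < 1 the total weight converges to 1 / (1 - decay), e.g. 100 for decay = 0.99, so the pseudo-data and old draws are steadily forgotten.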

The problem with a dense mass matrix is inverting the estimated covariance matrix, which is O(N^3), where N is the number of parameters. For a diagonal matrix it’s O(N), with a single divide-and-assign operation per parameter.
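
To make the cost comparison concrete, here is the arithmetic, assuming the diagonal estimate holds per-parameter variances and assuming the dense estimate is handled with a Cholesky factorization (one common choice, not something specified above):

```latex
\text{diagonal:}\quad M = \operatorname{diag}\!\left(\hat\sigma_1^{-2}, \ldots, \hat\sigma_N^{-2}\right)
  \;\Rightarrow\; N \text{ divides},\ O(N)

\text{dense:}\quad \hat\Sigma = L L^{\top}
  \;\Rightarrow\; \approx \tfrac{1}{3} N^{3} \text{ flops},\ O(N^{3})
```

For N = 1000 that is 1,000 divides versus roughly 3.3 × 10^8 floating-point operations for each update.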

If stepsize adaptation is relatively fast, we could then do it more often.

I’d think this would be more stable than doing it by blocks.

- Bob