Good link to suggest for non-centering parametrization?

Kind of related to the topic, what do you link people when you see they have a model parameter that could use a non-centering?

I link them: Diagnosing Biased Inference with Divergences

But really I want to just show them the difference in:

parameters {
  real x;
  real mu;
  real<lower = 0.0> sigma;
}
model {
  ...
  x ~ normal(mu, sigma);
}

And:

parameters {
  real x_z;
  real mu;
  real<lower = 0.0> sigma;
}
transformed parameters {
  real x = mu + sigma * x_z;
}
model {
  ...
  x_z ~ normal(0.0, 1.0);
}

And then I kinda feel like I should also explain the offset-multiplier:

parameters {
  real mu;
  real<lower = 0.0> sigma;
  real<offset = mu, multiplier = sigma> x;
}
model {
  ...
  x ~ normal(mu, sigma);
}

That third model seems like a bridge too far for one post, but this is the page I know that explains offset-multiplier well: Updated non-centered parametrization with offset and multiplier. It doesn’t seem great to link, since it’s sort of a bug report.

It would be nice to say something like:

“Look into giving parameter X a non-centered parameterization and see if that helps. What non-centered means is this (link to non-centering explanations). Why I’m saying non-centered is your model looks like the 8-schools thing (link to divergences case study). The easiest way to try this out is the offset-multiplier (link to tagged bit of non-centering page)”

Maybe these links are available these days and I’m just outdated, so I am curious how you all answer these questions. Hopefully this isn’t too off topic :D

I think this is a good question, but I took the liberty of moving it into a new topic, as it was IMHO a bit off-topic in the original :-D

This looks like quite a good link to me, but I agree there might be a better one. If anyone would volunteer to write a quick #howto post on the topic that we could link to from anywhere, it would be for the best :-).

Cool beans. I am also curious if I am just out of date. I feel like there might be some simpler thing to link to for offset-multiplier that I am just not aware of, for instance.

And note this trick to easily toggle between the two parameterizations:
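The linked example is not reproduced in this thread. As a rough sketch of how such a toggle could look, assuming a hypothetical data flag named `centered` (not from the original post), the offset and multiplier can be switched with the conditional operator:

```stan
data {
  // Hypothetical flag: 1 = centered, 0 = non-centered.
  int<lower = 0, upper = 1> centered;
}
parameters {
  real mu;
  real<lower = 0> sigma;
  // centered = 1: offset 0 and multiplier 1, i.e. no rescaling.
  // centered = 0: x is sampled as (x - mu) / sigma under the hood.
  real<offset = (centered ? 0 : mu),
       multiplier = (centered ? 1 : sigma)> x;
}
model {
  // Priors on mu and sigma and the rest of the model go here.
  x ~ normal(mu, sigma);
}
```

The sampling statement stays the same in both cases; only the scale on which the sampler sees `x` changes.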

What do offset / multiplier even do? Make it so that (0,1) in the “unconstrained” variables maps to (offset, offset+multiplier), that’s all?

@Funko_Unko yep!

FWIW, I think it would be really useful to put together some kind of simple treatment of hierarchical funnels and non-centering, but crucially it would be great if this treatment could give an intuitive explanation for when and why centering can be preferable (imo it’s intrinsically subtler to grasp the funnel-shaped likelihood than the funnel-shaped prior).

Also, maybe I’m the only one who’s ever run into the sticky boundary issue in an applied setting, but I’d be reluctant to recommend offset/multiplier to people who are first encountering non-centering until it’s solved.

No need to do <lower=0.0>. Can just do <lower=0>, right? Or am I missing something?

Yeah it’s a change of variables in the same sense as here with f^{-1}(y) = \text{offset} + \text{multiplier} \cdot y.
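To make the change of variables concrete, here is a short worked version for the third model in the original post, where offset is mu and multiplier is sigma. The symbol y for the unconstrained value follows the formula above; writing the density out this way is an illustration, not a description of Stan's internals.

```latex
\begin{align*}
x &= f^{-1}(y) = \mu + \sigma\, y,
  \qquad \left|\frac{\partial x}{\partial y}\right| = \sigma, \\
p(y \mid \mu, \sigma)
  &= \mathrm{Normal}(\mu + \sigma y \mid \mu, \sigma)\,\sigma
   = \frac{1}{\sqrt{2\pi}\,\sigma}
     \exp\!\left(-\tfrac{1}{2} y^{2}\right)\sigma
   = \mathrm{Normal}(y \mid 0, 1).
\end{align*}
```

The Jacobian term sigma depends on a parameter, so it cannot be dropped as a constant. It cancels the 1/sigma in the normal density, which is why the sampler effectively sees a standard normal, just as in the explicit x_z version.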

Ooof, good point

Yup, just habitually nervous about integers vs. doubles.

And on a different but related note, is there anything else magical about the unit scale, except that default metric, timestep and initial points are such that they make sense on that scale?

I think that’s it. It’s computationally motivated and I don’t know of an advantage to another one.

Okay, and finally:

The three above ways to write the same model should perform identically, if initialized with points sampled from the prior and a metric computed from the prior? (and a timestep appropriate for the prior)

@Funko_Unko The three code snippets in @bbbales2’s OP all encode the same model, but performance is often wildly different between the first model and the second two. The first model is the centered parameterization; the second two are different implementations of the non-centered parameterization (that are not strictly equivalent under the hood). In many cases the centered parameterization leads to a funnel degeneracy in the posterior and fails to return reliable samples (this happens when sigma is weakly informed and at least some values of x are weakly informed). In other cases, the non-centered parameterization leads to a funnel degeneracy and fails to return reliable samples (this happens when sigma is weakly informed and at least some of the values of x are strongly informed). As an aside, the “centering” isn’t the important thing here; what’s important is the “scaling” by sigma. Thus, the terms “centered” and “non-centered” don’t actually capture the important difference between the parameterizations.

The second two models are both the non-centered parameterization, but the computations under the hood are not identical, and so they do not perform strictly identically. They can show slightly different runtimes (which can potentially become noticeable if x is a very long vector and the rest of the model is super-cheap) and can potentially encounter different numerical issues. They’ll also result in saved objects of different sizes (one saves both x and x_z) which can be important if x is really long (but this issue can be eliminated if desired by moving the computation of x to the model block). I’m unsure whether (absent weird numerical differences) we would expect the same stochastic exploration under the same fixed seed for the second and third.
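As a minimal sketch of the last point, here is the second model from the original post with x computed as a local variable in the model block, so that only x_z, mu, and sigma are saved. The comments marking where the rest of the model goes are placeholders, not part of the original.

```stan
parameters {
  real x_z;
  real mu;
  real<lower = 0> sigma;
}
model {
  // Local variable: available to the rest of the model block,
  // but not written to the output.
  real x = mu + sigma * x_z;

  // Priors on mu and sigma and any likelihood terms using x go here.
  x_z ~ normal(0, 1);
}
```

If x is needed in the output after all, it can be recomputed in generated quantities, at the cost of saving it again.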

Thank you!

I’m actually just stupid, I thought “Huh, an affine transformation, this cannot affect anything”, but really the Jacobian of the transformation is not constant, as the coefficients of the transformation are parameters themselves…

That was silly of me, thanks for the patience!

Ha, I can’t believe I hadn’t picked up on that misnomer; I wonder if we can get the community to transition to something like scaled/non(un?)-scaled. Though I almost want the version with multiplier=sigma to be the one called “scaled”.
