But really, I just want to show them the difference between:

```stan
parameters {
  real x;
  real mu;
  real<lower = 0.0> sigma;
}
model {
  // ... priors on mu and sigma, and the likelihood
  x ~ normal(mu, sigma);  // centered: x is sampled directly
}
```

And:

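```stan
parameters {
  real x_z;
  real mu;
  real<lower = 0.0> sigma;
}
transformed parameters {
  real x = mu + sigma * x_z;  // non-centered: x is built from a standard normal
}
model {
  // ... priors on mu and sigma, and the likelihood
  x_z ~ normal(0.0, 1.0);  // implies x ~ normal(mu, sigma)
}
```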
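A quick Monte Carlo sanity check (plain Python, not Stan; mu = 1 and sigma = 2 are arbitrary, and the helper names are hypothetical) that the non-centered construction x = mu + sigma * x_z with x_z ~ normal(0, 1) gives x the same normal(mu, sigma) distribution as the first model. It only checks the distribution of x, not how the sampler behaves, which is the part that actually differs.

```python
import random
import statistics


def noncentered_draws(mu: float, sigma: float, n: int, seed: int = 1) -> list[float]:
    """Draw x via x = mu + sigma * x_z with x_z ~ normal(0, 1)."""
    rng = random.Random(seed)
    return [mu + sigma * rng.gauss(0.0, 1.0) for _ in range(n)]


def centered_draws(mu: float, sigma: float, n: int, seed: int = 2) -> list[float]:
    """Draw x directly from normal(mu, sigma)."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]


mu, sigma, n = 1.0, 2.0, 100_000  # arbitrary illustration values
nc = noncentered_draws(mu, sigma, n)
c = centered_draws(mu, sigma, n)
print(f"non-centered: mean={statistics.fmean(nc):.3f} sd={statistics.stdev(nc):.3f}")
print(f"centered:     mean={statistics.fmean(c):.3f} sd={statistics.stdev(c):.3f}")
```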

And then I kinda feel like I should also explain the offset-multiplier:

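```stan
parameters {
  real mu;
  real<lower = 0.0> sigma;
  // mu and sigma are declared before x because x's declaration uses them
  real<offset = mu, multiplier = sigma> x;
}
model {
  // ... priors on mu and sigma, and the likelihood
  x ~ normal(mu, sigma);  // reads like the centered model, sampled on the non-centered scale
}
```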
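One way to see what the offset/multiplier declaration does, as I understand the Stan reference manual's description of the affine transform: Stan samples an unconstrained variable (called x_raw here, a hypothetical name), sets x = mu + sigma * x_raw, and adds the log-Jacobian log(sigma) to the target. The plain-Python sketch below keeps all normalizing constants so the numbers match exactly, and checks that with this Jacobian term `x ~ normal(mu, sigma)` contributes exactly the standard normal log density of x_raw, i.e. the same target as `x_z ~ normal(0.0, 1.0)` in the second model.

```python
import math


def normal_lpdf(y: float, mu: float, sigma: float) -> float:
    """Log density of normal(mu, sigma) at y, including all constants."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def offset_multiplier_target(x_raw: float, mu: float, sigma: float) -> float:
    """Target contribution of the third model, on the unconstrained scale.

    x_raw is a hypothetical name for the unconstrained variable; the declared
    x is recovered as x = mu + sigma * x_raw, whose log-Jacobian is log(sigma).
    """
    x = mu + sigma * x_raw
    return normal_lpdf(x, mu, sigma) + math.log(sigma)


def noncentered_target(x_z: float) -> float:
    """Target contribution of `x_z ~ normal(0.0, 1.0)` in the second model."""
    return normal_lpdf(x_z, 0.0, 1.0)


# (x_raw, mu, sigma) triples spanning tiny to large scales.
CASES = [(0.3, 1.0, 0.01), (-2.0, -5.0, 1.0), (1.7, 0.0, 100.0)]

for x_raw, mu, sigma in CASES:
    with_jac = offset_multiplier_target(x_raw, mu, sigma)
    without_jac = normal_lpdf(mu + sigma * x_raw, mu, sigma)
    print(f"sigma={sigma:>6}: with Jacobian {with_jac:.6f}, "
          f"non-centered {noncentered_target(x_raw):.6f}, "
          f"without Jacobian {without_jac:.6f}")
```

Without the log(sigma) term the two targets differ by a sigma-dependent amount, which is exactly what would change the geometry.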

That third model seems like a bridge too far for one post, but the one page I know of that explains offset-multiplier well is: Updated non-centered parametrization with offset and multiplier. It doesn't seem great to link to, though, since it's sort of a bug report.

It would be nice to say something like:

“Look into giving parameter X a non-centered parameterization and see if that helps. What non-centered means is this (link to non-centering explanations). I'm suggesting non-centering because your model looks like the 8-schools thing (link to divergences case study). The easiest way to try this out is the offset-multiplier syntax (link to tagged bit of non-centering page).”

Maybe these links are available these days and I'm just outdated, so I am curious how you all answer these questions. Hopefully this isn't too off-topic :D

I think this is a good question, but I took the liberty of moving it into a new topic, as it was IMHO a bit off-topic in the original thread :-D

This looks like quite a good link to me, but I agree there might be better ones. If anyone would volunteer to write a quick #howto post on the topic that we could link to from anywhere, that would be best :-).

Cool beans. I am also curious if I am just out of date. I feel like there might be some simpler thing to link to for offset-multiplier that I am just not aware of, for instance.

FWIW, I think it would be really useful to put together some kind of simple treatment of hierarchical funnels and non-centering, but crucially it would be great if this treatment could give an intuitive explanation for when and why centering can be preferable (imo it's intrinsically subtler to grasp the funnel-shaped likelihood than the funnel-shaped prior).

Also, maybe I'm the only one who's ever run into the sticky boundary issue in an applied setting, but I'd be reluctant to recommend offset/multiplier to people who are first encountering non-centering until it's solved.

And on a different but related note, is there anything else magical about the unit scale, other than that the default metric, step size, and initial points are chosen to make sense on that scale?

Shouldn't the three ways above of writing the same model perform identically if each is initialized with points sampled from the prior and given a metric and step size computed from the prior?

@Funko_Unko The three code snippets in @bbbales2's OP all encode the same model, but performance is often wildly different between the first model and the other two. The first model is the centered parameterization; the other two are different implementations of the non-centered parameterization (which are not strictly equivalent under the hood). In many cases the centered parameterization leads to a funnel degeneracy in the posterior and fails to return reliable samples (this happens when sigma is weakly informed and at least some values of x are weakly informed). In other cases, the non-centered parameterization leads to a funnel degeneracy and fails to return reliable samples (this happens when sigma is weakly informed and at least some of the values of x are strongly informed).

As an aside, the “centering” isn't the important thing here; what's important is the “scaling” by sigma. Thus, the terms “centered” and “non-centered” don't actually capture the important difference between the parameterizations.
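To make the weakly vs. strongly informed distinction concrete, here is a toy calculation under an assumption not in the thread: each x gets a single noisy observation y ~ normal(x, tau) with known tau. Holding mu and sigma fixed, the conditional posterior of x (centered) or x_z (non-centered) is Gaussian, so its sd has a closed form. Roughly speaking, the sampler has to use one step size across all values of sigma, so the parameterization whose conditional scale swings by orders of magnitude as sigma moves is the one that shows the funnel. The helper name is hypothetical.

```python
import math


def conditional_sds(sigma: float, tau: float) -> tuple[float, float]:
    """Conditional posterior sds in a toy model (an assumption, not from the thread):

        x ~ normal(mu, sigma)   prior
        y ~ normal(x, tau)      one observation of x, tau known

    With mu and sigma held fixed, both conditionals are Gaussian, so precisions add.
    Returns (sd of x in the centered model, sd of x_z in the non-centered model).
    """
    # Centered: prior precision 1/sigma^2 plus likelihood precision 1/tau^2.
    sd_x = 1.0 / math.sqrt(1.0 / sigma**2 + 1.0 / tau**2)
    # Non-centered (x = mu + sigma * x_z): prior precision 1 plus
    # likelihood precision sigma^2 / tau^2 in x_z units.
    sd_xz = 1.0 / math.sqrt(1.0 + sigma**2 / tau**2)
    return sd_x, sd_xz


for label, tau in [("weakly informed x, tau = 100", 100.0),
                   ("strongly informed x, tau = 0.01", 0.01)]:
    print(label)
    for sigma in (0.01, 1.0, 100.0):
        sd_x, sd_xz = conditional_sds(sigma, tau)
        print(f"  sigma = {sigma:>6}: sd(x) = {sd_x:.3g}, sd(x_z) = {sd_xz:.3g}")
```

With weak data (tau = 100), sd(x) tracks sigma, from about 0.01 up to about 70, while sd(x_z) stays between about 0.7 and 1. With strong data (tau = 0.01), sd(x) stays between about 0.007 and 0.01 while sd(x_z) shrinks from about 0.7 to about 1e-4, which is the funnel-shaped likelihood case.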

The second two models are both the non-centered parameterization, but the computations under the hood are not identical, so they do not perform strictly identically. They can show slightly different runtimes (which can become noticeable if x is a very long vector and the rest of the model is super cheap) and can encounter different numerical issues. They'll also produce saved output of different sizes (the second model saves both x and x_z), which can matter if x is really long; if desired, this can be avoided by moving the computation of x into the model block. I'm unsure whether (absent weird numerical differences) we would expect the same stochastic exploration under the same fixed seed for the second and third.

I'm actually just stupid. I thought “Huh, an affine transformation, this cannot affect anything”, but the Jacobian of the transformation is not constant, since the coefficients of the transformation are themselves parameters…
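Spelled out as a change of variables in the thread's notation (a standard calculation; the likelihood term hidden behind `...` is left out, since it is simply evaluated at x = mu + sigma * x_z), the map (x_z, mu, sigma) to (x, mu, sigma) has

$$
\frac{\partial (x, \mu, \sigma)}{\partial (x_z, \mu, \sigma)}
= \begin{pmatrix} \sigma & 1 & x_z \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},
\qquad
\left|\det \frac{\partial (x, \mu, \sigma)}{\partial (x_z, \mu, \sigma)}\right| = \sigma,
$$

so

$$
p(x_z, \mu, \sigma)
= \mathrm{normal}(\mu + \sigma x_z \mid \mu, \sigma)\,\sigma\,p(\mu, \sigma)
= \mathrm{normal}(x_z \mid 0, 1)\,p(\mu, \sigma).
$$

The factor sigma depends on a parameter, so the two coordinate systems give differently shaped densities, not just rescaled copies of each other. If sigma were a fixed constant, the Jacobian would be constant and the map would be a plain affine change of coordinates, which is roughly the situation the "should perform identically" intuition has in mind.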

Ha, I can't believe I hadn't picked up on that misnomer; I wonder if we can get the community to transition to something like scaled/non(un?)-scaled. Though I almost want the version with multiplier=sigma to be the one called “scaled”.