Initialization using prior

Hi,

Would it not be useful to have an option to initialize parameters by drawing from the prior? According to this answer, the reason for initializing uniformly on [-2,2] instead is to avoid numerical errors if a user specifies a flat prior. But sometimes I would like to specify a tight prior around, say, theta0, in which case the drawn initial value can have a really small prior probability and extra work is needed to get close to theta0. Of course I could draw or set the initial values myself and then pass them to Stan as input, but having an option to do it in Stan would make things easier.
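For concreteness, here is roughly what I mean by the manual workaround, sketched with CmdStanPy (the model file, parameter name, and prior scale are just placeholders; other interfaces accept per-chain inits in the same spirit):

import numpy as np
from cmdstanpy import CmdStanModel

# Suppose the model has a tight prior, e.g. theta ~ normal(theta0, 0.1).
theta0, prior_sd = 5.0, 0.1
n_chains = 4
rng = np.random.default_rng(1)

# Draw one initial value per chain from the prior instead of relying on
# the default uniform(-2, 2) draws on the unconstrained scale.
inits = [{"theta": float(rng.normal(theta0, prior_sd))} for _ in range(n_chains)]

model = CmdStanModel(stan_file="model.stan")  # placeholder file name
fit = model.sample(chains=n_chains, inits=inits)

This works, but it has to be redone whenever the priors change, which is why a built-in option would be convenient.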


@jonah didn’t you have something to say on this at some point? @andrewgelman has asked for it before, I think.

The default initialization can also be a cause of numerical problems, for example with ODE models where a solution explodes to infinity or NaN for the initial parameter combination. Sometimes these values could be ruled out by using a sensible prior, but since the prior does not affect the initialization, the starting values can land in the numerically unstable region, which can lead to a really slow warmup.

But I am not sure whether a Stan program knows, or makes any distinction about, which target += ... or ~ statements define a prior, so I don't know what implementing prior-based initialization would require.

That’s the reason. @jonah did some experiments for rstanarm, where rstanarm knows the priors, but in a specific example that didn’t help much, and the default priors are often too wide. If you are asking this for the lgpr package, then you could initialize based on the known priors.

This usually happens quite fast (although we have at least one counterexample). If you have an example where a lot of extra work is required, send it to @bbbales2, who is working on improved adaptation.

I’ve seen several examples where [-2,2] is too wide and causes numerical errors; sometimes the retries Stan performs automatically work, and sometimes the initialization or the parameterization needs to be changed.
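A related low-effort mitigation, when changing the model isn’t an option, is to shrink the init range itself; the interfaces expose CmdStan’s init radius. A sketch with CmdStanPy (the model file name is a placeholder):

from cmdstanpy import CmdStanModel

model = CmdStanModel(stan_file="model.stan")  # placeholder file name

# A positive float r narrows the default init range: unconstrained
# parameters are initialized uniform(-r, r) instead of uniform(-2, 2).
fit = model.sample(chains=4, inits=0.5)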


It’s just there to provide a sound way of giving diffuse initializations to parameters that are roughly unit scale when unconstrained.

A better strategy computationally, if not aesthetically, is to rescale the parameter so it’s unit scale. That is,

parameters {
  real a_std;
}
transformed parameters {
  real a = 1e-20 * a_std;
}
model {
  a_std ~ normal(0, 1);
}

will work better in Stan than

parameters {
  real a;
}
model {
  a ~ normal(0, 1e-20);
}

Though with a recent enough version of Stan, you can do this to have the best of both worlds:

parameters {
  real<multiplier = 1e-20> a;
}
model {
  a ~ normal(0, 1e-20);
}

and now the uniform(-2, 2) init works, as does adaptation.

Not being unit scale isn’t only a problem for inits; it’s also a problem for adaptation, which initializes with a unit variance assumption.
