Is there a way to try random init values repeatedly?


#1

I am trying to initialize a model. I tried to create a simple example below that illustrates the issue. Of course the example is ridiculous and the problem would be solved, if I just specified the proper range for the parameter. My real issue really arises from underflow when I try initial values, for whcih the log-posterior is is such that I end up taking the log of a posterior that is very, very close to 0.

In the example below, the third example for initialization tries a reasonable guess near a good guess and perturbs it. In particular, I am not 100% sure that the guess would lead to a valid log-likelihood, but I would guess that I perturb in that area, I should hopefully (eventually) be fine. But I may get it rejected, but if I tried often enough it should work. Is there a way to try random inits repeatedly?

Another really weird thing is that for my real model (way too complicated to show), I actually get “Initialization between (-2, 2) failed after 100 attempts.”, even though I provided a function for random initialization. I did not manage to reproduce this in the simple example below and am totally puzzled why rstan tries the initialization betwen -2 and 2. I did provide a function that gave an initial value for all parameters defined in the parameters block, should it then not do that, at all??

example_code <- “
functions {
real shiftedbeta_lpdf(real y, real alpha, real beta){
return( beta_lpdf(y-3.0 | alpha, beta) );
}
}
data {
int y;
int n;
}
parameters {
real piplusthree;
}
model {
piplusthree ~ shiftedbeta(0.5, 0.5);
y ~ binomial(n, piplusthree-3);
}

# Try 1: This does not work
stanfit4 <- stan(model_code=example_code, data=list(y=5, n=10))

# Try 2: This of course works fine
stanfit5 <- stan(model_code=example_code, data=list(y=5, n=10), init=list(list(piplusthree=3.5),list(piplusthree=3.5),list(piplusthree=3.5),list(piplusthree=3.5)))

# Try 3: As far as I can tell this just generates a single random proposal that gets rejected
initf1 <- function(){
return( list(piplusthree=rnorm(n=1, mean=3.5,sd=1)) )
}
stanfit6 <- stan(model_code=example_code, data=list(y=5, n=10), init=initf1, seed=123)


#2

The problem is your model. Stan requires every value of the parameters that satisfies the declared constraints to have a finite density. Clearly your shifted beta has a range of only (3, 4), so you want to do this:

real<lower=3, upper=4> piplusthree;

But I have no idea why you’re defining a shifted variable, a shifted distribution, then shifting back in the likelihood.


#3

I realize that in the example I have deliberately provoked this problem. The reason is that I wanted to illustrate the behavior, when an initialization in [-2,2] is not possible (in more realistic cases, because the likelihood is really close to zero).

I was talking about the scenario, when I have some guess, say, 4.1 in this example for initializing the parameter, but perhaps as in the example the exact value of 4.1 won’t work either. In such a scenario, I would want to repeatedly try an initialization with some random perturbation such as sampling from a U(4.1-1, 4.1+1) or N(4.1, 1). And of course I may face that in a multidimensional setting, with each parameter requiring a different initial guess.


#4

We haven’t found likelihood underflow to be an issue—we work on the log scale. The main place we’ve seen it be a problem is when we use a non-stiff ODE solver with a density that is stiff in some of the tail regions, but not stiff in the typical set. In that case, there’s a huge win to be had from initializing near an answer.

It’s not just that you deliberately provoked the problem, it’s that you have an ill-formed Stan model that doesn’t have support over the entire unconstrained parameter space. Stan’s just not designed to work there. It’s not just inits, but it can cause HMC to devolve to random walks even if you start in a region that has support. If your density is badly enough behaved and high enough dimensioned, you’re never going to be able to randomly init in the right places. Just imagine you have this

parameters {
  vector[100] sigma;
...
model {
  for (n in 1:100)
    y[n] ~ normal(0, sigma);

The chance of initializing in a good region is 2^-100, so you won’t have much luck.