Rejecting initial value: Chain 1: Log probability evaluates to log(0), i.e. negative infinity

Hi all,

I am receiving the same error that I have noticed others on this list have received. I played around with different priors and still get this message. I just can’t see what might be wrong. The code is below this message. In this code, I have the lower bound of the sigmas a small positive number, but I get the same error message when I set the lower bound to zero.

Thanks

David

Chain 1: Log probability evaluates to log(0), i.e. negative infinity.
Chain 1: Stan can’t start sampling from this initial value.
Chain 1:
Chain 1: Initialization between (-2, 2) failed after 100 attempts.
Chain 1: Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] “Error in sampler$call_sampler(args_list[[i]]) : Initialization failed.”
error occurred during calling the sampler; sampling not done

Code:

Begin Stan Code. Notice that Stan uses // for comments

modelString = "
data {

  int<lower=1> n; // number of students
  int<lower=1> G; // number of schools
  int schid[n]; // school indices
  vector[n] ASRREA01; // reading outcome variable
  vector[n] ASBG04;   // number of books in the home
  vector[n] ACBG03A;  //percent of students in school g from economically disadvantaged                                    // homes

}

parameters {
  vector[G] alpha;
  vector[G] beta1;
  real gamma00;
  real gamma01;
  real mu_beta1;
  real<lower=0.1> sigma_alpha;
  real<lower=0.1> sigma_beta1;
  real<lower=0.1> sigma_read;
}

model{

   gamma00 ~ normal(400,10);       // Informative
   gamma01 ~ normal(0, 1);        // Informative
   mu_beta1 ~ normal(0, 1);       // Non-informative
   sigma_read ~  uniform(10,50);  // Weakly-informative
   sigma_alpha ~ uniform(5,20);   // Weakly-informative
   sigma_beta1 ~ uniform(1,4);    // Weakly-informative
  
  for(i in 1:n) {
    ASRREA01[i] ~ normal(alpha[schid[i]] + beta1[schid[i]] * ASBG04[i], sigma_read);
  }
  
  for(g in 1:G) {
    alpha[g] ~ normal(gamma00 + gamma01 * ACBG03A[g], sigma_alpha);
    beta1[g] ~ normal(mu_beta1, sigma_beta1);
  }

  
}
  
"

Good question. The problem (or at least one of the problems) is that you have lower=0.1 and no upper bound but are using uniform distributions that start well above 0.1 and have a finite upper bound. So you have a mismatch between the constraints declared in the parameters block and the constraints imposed by choosing the particular uniform distributions you’ve chosen. What’s happening is that Stan is initializing given the constraints in the parameters block but then when you try to evaluate the uniform distribution in the model block you can be evaluating uniform(a,b) at a point lower than a or higher than b.

If you want a uniform distribution on [a,b] then you can use lower=a,upper=b and put nothing in the model block (the default is uniform). Or if you want to explicitly use uniform() so it’s clear to anyone who doesn’t know the default is uniform, then you need to use the same constraints in the parameters block as the bounds of the uniform distribution you’re using.

Having said all that, we don’t recommend uniform(a,b) priors on standard deviations because you pretty much never know the bounds. For that reason it’s preferable to use distributions that decay fast outside the range of values you think is plausible. This amounts to softly constraining the parameters. Basically the only time we recommend hard constraints is when it’s logically necessary (e.g. correlation has to be -1 to 1, sd has to be > 0, etc.).

Hope that helps!

4 Likes

So, that was indeed the problem. Thanks. I know that the half-Cauchy has been recommended for standard deviations, but do you have any other specific recommendations?

Thanks again,

David

1 Like

Glad that helped!

Half Cauchy is ok but we’ve moved away from it a bit more recently because the tail behavior can be problematic, especially if the data isn’t super informative.

I think @andrewgelman recommends half Normal for scale parameters these days. A half t distribution could also be useful, as it’s basically a compromise between the Cauchy and the normal (depending on the value of the degrees of freedom parameter).

Another option is an exponential distribution, which is what we default to in our rstanarm package.

1 Like