Model starts with some initial values rejected


My model prints a bunch of these messages at startup:
Rejecting initial value:
Log probability evaluates to log(0), i.e. negative infinity.
Stan can’t start sampling from this initial value.

And then sampling starts. But I would like to know what is triggering those rejections. I saw old posts saying that the new (2.16) Stan should report the line, but that's not the case: I'm using rstan 2.16.2, and I've tried with one chain, but I don't get any info. Do you have any tips on how to debug this?
(I'm having a lot of trouble with chains getting stuck in a fairly complex model, the hierarchical version of the one I posted here.)


By default, Stan initializes every parameter on the unconstrained scale with a random uniform draw from (-2, 2).

The message you see, “Rejecting initial value: Log probability evaluates to log(0)”, means that the log probability (i.e., the model block) evaluates to -Inf at those parameter values. Stan then retries with new random initial values to see if it can start sampling.

A common mistake is to set up the model so that parts of the constrained parameter space have no support (i.e. probability 0, or log probability -Inf).
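
To make the mechanism concrete, here is a minimal sketch in Python (not Stan, and not Stan's actual internals). It assumes a hypothetical one-parameter model whose log density is finite only for x > 1; `toy_log_prob`, `find_initial_value`, and the retry limit are illustrative names and values.

```python
import math
import random

def toy_log_prob(x):
    """Hypothetical log density: finite only for x > 1, -inf elsewhere."""
    if x <= 1.0:
        return -math.inf  # zero-probability region, i.e. log(0)
    return -0.5 * (x - 3.0) ** 2

def find_initial_value(log_prob, radius=2.0, max_attempts=100, rng=None):
    """Draw uniform(-radius, radius) inits until log_prob is finite.

    Returns (init, rejected); init is None if every attempt was rejected.
    """
    rng = rng or random.Random(1)
    rejected = 0
    for _ in range(max_attempts):
        x = rng.uniform(-radius, radius)
        if math.isfinite(log_prob(x)):
            return x, rejected
        rejected += 1  # corresponds to one "Rejecting initial value" message
    return None, rejected

init, rejected = find_initial_value(toy_log_prob)
if init is None:
    print(f"no valid initial value after {rejected} attempts")
else:
    print(f"accepted init {init:.3f} after {rejected} rejected draws")
```

Only draws that land in the region with finite log density are accepted, so each rejected draw corresponds to one of the messages in the original post.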


My custom pdf contains some lcdfs that are very prone to over/underflow. I took care to avoid NaNs, but the log pdf is actually -Inf for some combinations of parameters. I guess that's making my target -Inf at initialization, and that's why I get these messages. Could that be?
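
As an illustration of how an lcdf can underflow to -Inf even though the true value is finite, here is a sketch in Python. It uses the standard normal only as a stand-in, since the custom density in question isn't shown; `naive_normal_lcdf` and `tail_normal_lcdf` are hypothetical helper names, and the tail formula is an asymptotic approximation that is only meant for strongly negative arguments.

```python
import math

def naive_normal_lcdf(x):
    """log(Phi(x)) computed directly; underflows to -inf in the far left tail."""
    cdf = 0.5 * math.erfc(-x / math.sqrt(2.0))
    return math.log(cdf) if cdf > 0.0 else -math.inf

def tail_normal_lcdf(x):
    """Asymptotic approximation of log(Phi(x)), intended for x << 0 only."""
    # Phi(x) ~ phi(x) / |x| * (1 - 1/x^2 + 3/x^4) as x -> -infinity
    correction = 1.0 - 1.0 / x**2 + 3.0 / x**4
    return (-0.5 * x * x
            - 0.5 * math.log(2.0 * math.pi)
            - math.log(-x)
            + math.log(correction))

for x in (-10.0, -40.0):
    print(x, naive_normal_lcdf(x), tail_normal_lcdf(x))
```

At x = -10 the two versions agree closely; at x = -40 the direct computation underflows to -Inf while the tail approximation stays finite. Whether a comparable rewrite exists for a given custom density depends on that density.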

If that's it, I have no way to calculate the exact negative value of the log pdf, so I'll live with that…


Yes, that could be it. You could start with a tighter initialization radius, but that might not actually help.
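
A rough sketch in Python of why a tighter radius may or may not help. It assumes two hypothetical one-parameter models on the unconstrained scale, one with support only for x > 1 and one with support only for |x| < 0.5, and measures what fraction of an evenly spaced grid over (-radius, radius) lands in the support. The function names are illustrative.

```python
def accepted_fraction(in_support, radius, n=1000):
    """Fraction of evenly spaced points in (-radius, radius) with nonzero density."""
    width = 2.0 * radius / n
    points = [-radius + (i + 0.5) * width for i in range(n)]
    return sum(1 for x in points if in_support(x)) / n

def far_from_zero(x):
    # Hypothetical model A: density is nonzero only for x > 1
    return x > 1.0

def near_zero(x):
    # Hypothetical model B: density is nonzero only for |x| < 0.5
    return abs(x) < 0.5

for radius in (2.0, 1.0, 0.5):
    a = accepted_fraction(far_from_zero, radius)
    b = accepted_fraction(near_zero, radius)
    print(f"radius={radius}: model A accepts {a:.2f}, model B accepts {b:.2f}")
```

For model B, shrinking the radius raises the share of usable initial values; for model A it drops to zero, because the support lies entirely outside the smaller interval.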


This can cause serious problems for HMC if those regions are anywhere near things you care about or if the probability mass of the missing region depends on parameters (because then you’d need the normalization term).


Sorry, what do you mean by depends on parameters?


For instance, if you truncate normal(mu, sigma) at lower = 2 and mu is a parameter, then how much probability mass is missing (i.e., below 2) depends on mu.
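
The truncation example can be checked numerically. This is a small Python sketch, with sigma fixed at 1 as an assumption for illustration; `normal_cdf` and `log_normalizer` are hypothetical helper names. The truncated density is normal(y | mu, sigma) divided by 1 - Phi((2 - mu) / sigma), so the log of that denominator is the normalization term that changes with mu.

```python
import math

def normal_cdf(x, mu, sigma):
    """P(Y <= x) for Y ~ normal(mu, sigma)."""
    return 0.5 * math.erfc(-(x - mu) / (sigma * math.sqrt(2.0)))

def log_normalizer(lower, mu, sigma):
    """log P(Y >= lower): the log of the term the truncated density divides by."""
    return math.log1p(-normal_cdf(lower, mu, sigma))

lower, sigma = 2.0, 1.0  # sigma = 1 is an assumption for this illustration
for mu in (0.0, 2.0, 4.0):
    missing = normal_cdf(lower, mu, sigma)  # mass below the truncation point
    print(f"mu={mu}: missing mass={missing:.4f}, "
          f"log normalizer={log_normalizer(lower, mu, sigma):.4f}")
```

Because the missing mass moves with mu, leaving the normalizer out changes the posterior for mu. In Stan, the truncation syntax `T[2, ]` on the sampling statement includes this term.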