Rejecting initial value: Log probability evaluates to log(0), i.e. negative infinity


#1

My stan model is as following,

data {

int<lower=0> N;

real<lower=0> lower_logtiter[N];

real<lower=0> upper_logtiter[N];

int<lower=0,upper=1> infection[N];

}

parameters {

real alpha; // mean titer =2

real<lower=0> sigma; // sd titer =1

real<upper=0> rate; // risk rate =-0.3

real<lower=0,upper=1> risk; // basic risk

}

model {

alpha ~ normal(0,10);

sigma ~ cauchy(0,10);

rate ~ cauchy(0,10);

risk ~ cauchy(0,10);

for (n in 1:N)

target += log(normal_cdf( upper_logtiter[n] , alpha, sigma) - (lower_logtiter[n]>=1)*normal_cdf(upper_logtiter[n] - 1 , alpha, sigma)); // interval censored titer

for (n in 1:N)

target += bernoulli_lpmf(infection[n] | fmin(fmax(risk+rate*(lower_logtiter[n] + upper_logtiter[n])/2,0),1)); // infection

}

and my data is data.csv (18.0 KB)

after I run the stan model, it always showed that reject initial value, I don’t know what’s wrong with my model.


#2

It’s almost definite either:

target += log(normal_cdf( upper_logtiter[n] , alpha, sigma) - (lower_logtiter[n]&gt;=1)*normal_cdf(upper_logtiter[n] - 1 , alpha, sigma)); // interval censored titer

or

target += bernoulli_lpmf(infection[n] | fmin(fmax(risk+rate*(lower_logtiter[n] + upper_logtiter[n])/2,0),1)); // infection

(or possibly both). Easiest way to figure something like this out is just comment out various bits of the code and see what runs.

cdfs are tricky because the numerics are hard.

Do you mean upper_logtiter[n - 1] instead of upper_logtiter[n] - 1?

You can also use print to print things and debug stuff:

print(risk, rate);

for instance.

It’s probably better to work on the logit scale (with bernoulli_logit) than trying to limit the range of the parameter to the Bernoulli distribution with fmin(fmax(risk+rate*(lower_logtiter[n] + upper_logtiter[n])/2,0),1).

The data you attached looks like it was some sort of Microsoft format? It didn’t open on my computer anyway.

Hope that helps!


#4

Thanks so much, I changed the target += bernoulli_lpmf(infection[n] | fmin(fmax(risk+rate*(lower_logtiter[n] + upper_logtiter[n])/2,0),1)); as infection[n] ~ bernoulli_(fmin(fmax(risk+rate*(upper_logtiter[n]),0),1)); it works very well now in my simulation step, but when I put my real data into the model, the error came out again like following:

Rejecting initial value:

Log probability evaluates to log(0), i.e. negative infinity.

Stan can’t start sampling from this initial value.

0.001399490.8922810.563674-2.50283

Rejecting initial value:

Log probability evaluates to log(0), i.e. negative infinity.

Stan can’t start sampling from this initial value.

Initialization between (-1, 1) failed after 100 attempts.

Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.

[1] "Error in sampler$call_sampler(args_list[[i]]) : Initialization failed.”


#5
target += bernoulli_lpmf(infection[n] | fmin(fmax(risk+rate*(lower_logtiter[n] + upper_logtiter[n])/2,0),1));

and

infection[n] ~ bernoulli(fmin(fmax(risk+rate*(upper_logtiter[n]),0),1));

are going to do the same thing so I don’t see how that changed the situation. I still think working on a logit scale is going to be better for this problem (see bernoulli_logit in the manual).

So if you put simulated data into your model it fits, but your real data doesn’t?

This seems to indicate an issue with your real data. In the worst case, just take small pieces of your real data and feed it into the model.

When you have the smallest piece that still causes an error, manually investigate it and see what’s different from your simulated data.


#7

Thanks so much, I know the problem. I change 0 to 0.000001, and 1 to 0.999999, it works now. You are right, maybe there is some minor differences between my real data and simulated data :p.

I need to estimate rate, so if I use bernoulli_logit, I think I didn’t use the rate at all, or that rate is not my interested outcome, so I didn’t use it.

Yes, my simulated data worked a little bit better with infection[n] ~ bernoulli(fmin(fmax(risk+rate*(upper_logtiter[n]),0),1)); than target += bernoulli_lpmf(infection[n] | fmin(fmax(risk+rate*(lower_logtiter[n] + upper_logtiter[n])/2,0),1));


#8

You can still get the rate on the [0, 1] scale out by computing them with inv_logit(logit_scale_rates) in generated quantities. The scales of your predictors change though, it’s true.