Rejecting initial value: Chain 1: Log probability evaluates to log(0), i.e. negative infinity

Hi all,

I am encountering the same error that I've seen others on this list report. I tried different priors and still get this message, and I can't figure out what's wrong. The code is below this message.

Appreciate it.

Ruizhe

SAMPLING FOR MODEL ‘anon_model’ NOW (CHAIN 1).
Chain 1: Rejecting initial value:
Chain 1: Log probability evaluates to log(0), i.e. negative infinity.
Chain 1: Stan can’t start sampling from this initial value.
Chain 1: Rejecting initial value:
Chain 1: Log probability evaluates to log(0), i.e. negative infinity.
Chain 1: Stan can’t start sampling from this initial value.
Chain 1: Rejecting initial value:
Chain 1: Log probability evaluates to log(0), i.e. negative infinity.
Chain 1: Stan can’t start sampling from this initial value.
Chain 1: Rejecting initial value:
Chain 1: Log probability evaluates to log(0), i.e. negative infinity.
Chain 1: Stan can’t start sampling from this initial value.

code:

stan_model <- "
data {
  int<lower=0> N;          // Number of observations
  vector[N] T1;             
  vector[N] T2;             
  vector[N] T3;            
  int<lower=0, upper=1> censor1[N]; // Censoring indicator for T1
  int<lower=0, upper=1> censor2[N]; // Censoring indicator for T2
  int<lower=0, upper=1> censor3[N]; // Censoring indicator for T3
}

parameters {
  real<lower=0> theta1;
  real<lower=0> theta2;
  real<lower=0> theta3;
  real<lower=0> xi1;
  real<lower=0> xi2;
  real<lower=0> xi3;
  real<lower=0> lambda1;
  real<lower=0> lambda2;
  real<lower=0> lambda3; 
  real<lower=0, upper=1> alpha;
}

model {
  
  // Noninformative improper priors
  xi1 ~ gamma(1, 0.01);
  xi2 ~ gamma(1, 0.01);
  xi3 ~ gamma(1, 0.01);
  lambda1 ~ normal(0, 10000);
  lambda2 ~ normal(0, 10000);
  lambda3 ~ normal(0, 10000);
  alpha ~ uniform(0, 1);  

   // Likelihood
  for (i in 1:N) {
    if (censor1[i] + censor2[i] + censor3[i]  == 0) {
      target += censor1[i]*weibull_lpdf(T1[i]|xi1,lambda1) + censor2[i]*weibull_lpdf(T2[i]|xi2,lambda2) + censor3[i]*weibull_lpdf(T3[i]|xi3,lambda3) + log(exp(-(theta1*weibull_lcdf(T1[i]|xi1,lambda1)+theta2*weibull_lcdf(T2[i]|xi2,lambda2)+theta3*weibull_lcdf(T3[i]|xi3,lambda3))^alpha));
    } 

    else if (censor1[i] + censor2[i] + censor3[i]  == 1) {
      target += censor1[i]*weibull_lpdf(T1[i]|xi1,lambda1) + censor2[i]*weibull_lpdf(T2[i]|xi2,lambda2) + censor3[i]*weibull_lpdf(T3[i]|xi3,lambda3) + log(alpha * exp(-(theta1*weibull_lcdf(T1[i]|xi1,lambda1)+theta2*weibull_lcdf(T2[i]|xi2,lambda2)+theta1*weibull_lcdf(T3[i]|xi3,lambda3))^alpha) * (theta1*weibull_lcdf(T1[i]|xi1,lambda1)+theta2*weibull_lcdf(T2[i]|xi2,lambda2)+theta3*weibull_lcdf(T3[i]|xi3,lambda3))^(alpha-1));
    } 

    else if (censor1[i] + censor2[i] + censor3[i]  == 2) {
      target += censor1[i]*weibull_lpdf(T1[i]|xi1,lambda1) + censor2[i]*weibull_lpdf(T2[i]|xi2,lambda2) + censor3[i]*weibull_lpdf(T3[i]|xi3,lambda3) + log(exp(-(theta1*weibull_lcdf(T1[i]|xi1,lambda1)+theta2*weibull_lcdf(T2[i]|xi2,lambda2)+theta3*weibull_lcdf(T3[i]|xi3,lambda3))^alpha) * (alpha * (1-alpha) * (theta1*weibull_lcdf(T1[i]|xi1,lambda1)+theta2*weibull_lcdf(T2[i]|xi2,lambda2)+theta3*weibull_lcdf(T3[i]|xi3,lambda3))^(alpha-2) + alpha^2 * (theta1*weibull_lcdf(T1[i]|xi1,lambda1)+theta2*weibull_lcdf(T2[i]|xi2,lambda2)+theta3*weibull_lcdf(T3[i]|xi3,lambda3))^(2*alpha-2)));
    } 
    else if (censor1[i] + censor2[i] + censor3[i]  == 3) {
      target += censor1[i]*weibull_lpdf(T1[i]|xi1,lambda1) + censor2[i]*weibull_lpdf(T2[i]|xi2,lambda2) + censor3[i]*weibull_lpdf(T3[i]|xi3,lambda3) + log(exp(-(theta1*weibull_lcdf(T1[i]|xi1,lambda1)+theta2*weibull_lcdf(T2[i]|xi2,lambda2)+theta3*weibull_lcdf(T3[i]|xi3,lambda3))^alpha) * (alpha * (alpha-1) * (alpha -2) * (theta1*weibull_lcdf(T1[i]|xi1,lambda1)+theta2*weibull_lcdf(T2[i]|xi2,lambda2)+theta3*weibull_lcdf(T3[i]|xi3,lambda3))^(alpha-3) + 3*alpha^2 * (1-alpha) * (theta1*weibull_lcdf(T1[i]|xi1,lambda1)+theta2*weibull_lcdf(T2[i]|xi2,lambda2)+theta3*weibull_lcdf(T3[i]|xi3,lambda3))^(2*alpha-3) + alpha^3 * (theta1*weibull_lcdf(T1[i]|xi1,lambda1)+theta2*weibull_lcdf(T2[i]|xi2,lambda2)+theta3*weibull_lcdf(T3[i]|xi3,lambda3))^(3*alpha-3)));
    }
  }
}
"

I assume you are just letting Stan initialize randomly? It would help us debug if you provided more context.

The priors all look fine, so the problem is going to be in your likelihood, which is very complicated. I would normally recommend starting with a simpler model. But if you want to try to debug your complicated model, I would suggest printing the running target value after each increment, e.g. print("target = ", target());, to see where it first becomes -infinity. I would also recommend putting the constraint <lower=0> on the declarations of T1, T2, and T3, because they have to be positive for the Weibull to work. Similarly, xi1 and lambda1 must also be positive, but you have declared them as such, so those should be OK unless they're underflowing.

Once you figure out which command is causing the -infinity increment, you can break down its terms and debug where they came from.

You don’t ever want to do log(exp(x))—just use x. The problem with log(exp(x)) is that exp(x) is likely to underflow or overflow, so is not very numerically stable. Similarly, you don’t want to do log(a * exp(b))—replace that with log(a) + b. Also, you don’t want to have log(a^x)—replace that with x * log(a)—it’s more efficient and more numerically stable.
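The underflow is easy to see outside Stan. Here's a quick Python check (the values are made up, just to illustrate the failure mode):

```python
import math

x = -800.0          # a perfectly reasonable log probability
print(math.exp(x))  # underflows to 0.0
# math.log(math.exp(x)) would then fail (log of 0); in Stan the
# analogous expression evaluates to -infinity, which is exactly
# the "log probability evaluates to log(0)" rejection.
# Using x directly keeps the exact value with no underflow.

a, b = 0.5, -800.0
# log(a * exp(b)) underflows the same way; log(a) + b does not:
stable = math.log(a) + b
print(stable)       # ≈ -800.693
```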
I couldn't understand what your likelihood is doing when it branches on the count censor1[i] + censor2[i] + censor3[i]. If you really do want that count, you can define a transformed data variable

transformed data {
  array[N] int<lower=0, upper=3> censor;
  for (n in 1:N) censor[n] = censor1[n] + censor2[n] + censor3[n];
}

P.S. The priors you document as "noninformative improper priors" are in fact proper, in the sense that they integrate to 1, and informative, in the sense that they'll have an effect on the posterior. On the latter point, see Andrew Gelman's paper urging people not to use priors like this: http://www.stat.columbia.edu/~gelman/research/published/taumain.pdf

Thanks for your help! I will try to modify the code to be more efficient and more numerically stable. Appreciate it!

Here’s a more readable version:

model {
  xi1 ~ gamma(1, 0.01);
  xi2 ~ gamma(1, 0.01);
  xi3 ~ gamma(1, 0.01);
  lambda1 ~ normal(0, 10000);
  lambda2 ~ normal(0, 10000);
  lambda3 ~ normal(0, 10000);
  alpha ~ uniform(0, 1);

   // Likelihood
  for (i in 1:N) {
    if (censor1[i] == 1) {
      T1[i] ~ weibull(xi1,lambda1);
    }
    if (censor2[i] == 1) {
      T2[i] ~ weibull(xi2,lambda2);
    }
    if (censor3[i] == 1) {
      T3[i] ~ weibull(xi3,lambda3);
    }
    real s1 = theta1*weibull_lcdf(T1[i]|xi1,lambda1)
            + theta2*weibull_lcdf(T2[i]|xi2,lambda2)
            + theta3*weibull_lcdf(T3[i]|xi3,lambda3);
    real s2 = theta1*weibull_lcdf(T1[i]|xi1,lambda1)
            + theta2*weibull_lcdf(T2[i]|xi2,lambda2)
            + theta1*weibull_lcdf(T3[i]|xi3,lambda3); // theta1 appears twice here -- typo?
    if (censor1[i] + censor2[i] + censor3[i] == 0) {
      target += log(exp(-(s1)^alpha));
    }
    else if (censor1[i] + censor2[i] + censor3[i] == 1) {
      target += log(alpha * exp(-(s2)^alpha) * (s1)^(alpha-1));
    }
    else if (censor1[i] + censor2[i] + censor3[i] == 2) {
      target += log(exp(-(s1)^alpha) * (alpha * (1-alpha) * (s1)^(alpha-2) + alpha^2 * (s1)^(2*alpha-2)));
    }
    else if (censor1[i] + censor2[i] + censor3[i] == 3) {
      target += log(exp(-(s1)^alpha) * (alpha * (alpha-1) * (alpha -2) * (s1)^(alpha-3) + 3*alpha^2 * (1-alpha) * (s1)^(2*alpha-3) + alpha^3 * (s1)^(3*alpha-3)));
    }
  }
}

Note that weibull_lcdf computes the logarithm of the cumulative probability, which is always negative (\log p < 0 whenever p < 1). You can't raise a negative number (s1) to a non-integer power (alpha) and get a real result, so those ^alpha terms evaluate to NaN.
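This is exactly the kind of operation that silently poisons the log density. In Python the same operation is an outright domain error (illustrative numbers only):

```python
import math

s1 = math.log(0.4)  # a log-CDF value, hence negative: ≈ -0.916
alpha = 0.7         # non-integer exponent in (0, 1)

try:
    math.pow(s1, alpha)  # negative base, non-integer exponent
except ValueError as e:
    print("domain error:", e)
# Stan doesn't raise; the expression evaluates to NaN instead,
# which then makes the whole log density -infinity.
```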
Did you maybe intend to multiply the target probability density by

\frac{1}{\left[\sum_{i}\theta_{i}\mathrm{CDF}\left(t_{i}\mid x_{i},\lambda_{i}\right)\right]^{\alpha}}

If so, the optimal code is

real log_s = log_sum_exp([log(theta1) + weibull_lcdf(T1[i]|xi1,lambda1),
                          log(theta2) + weibull_lcdf(T2[i]|xi2,lambda2),
                          log(theta3) + weibull_lcdf(T3[i]|xi3,lambda3)]);
// note: log_sum_exp takes a single vector/array argument, not three scalars
if (censor1[i] + censor2[i] + censor3[i] == 0) {
  target += -alpha * log_s;
}
Theta does not have a prior. That "weighted sum of CDFs" business makes me think theta should be a simplex:

parameters {
  simplex[3] theta;
  ...

(A simplex gets an implicit Dirichlet(1,1,1) prior; explicit prior declaration is not needed.)


If I were to ignore the parts of the code I don’t understand and just guess the model from the variable names, I’d write the likelihood as

  for (i in 1:N) {
    if (censor1[i] == 1) {
      T1[i] ~ weibull(xi1,lambda1);
    } else {
      target += weibull_lcdf(T1[i]|xi1,lambda1);
    }
    if (censor2[i] == 1) {
      T2[i] ~ weibull(xi2,lambda2);
    } else {
      target += weibull_lcdf(T2[i]|xi2,lambda2);
    }
    if (censor3[i] == 1) {
      T3[i] ~ weibull(xi3,lambda3);
    } else {
      target += weibull_lcdf(T3[i]|xi3,lambda3);
    }
  }

but that leaves no role for alpha.