Issues with running survival model in Stan

Hi all,
I am trying to run a survival model in Stan. I get the following error message but I cannot understand why this is the case.
Can someone help me with it?
Thank you!
Stan codes:

data {
  int <lower=0> province;
  int <lower=0> n_grid;
  int <lower=0> N;
  vector <lower=0> [province] p;
   matrix [province,n_grid] kernel;
  vector [N] age;
   vector [N] time;
  
  //vector [n_grid] x;
 
  vector[N] censor_status;
}
parameters{
  real beta0;
  real beta;
    real <lower=0> sigma;
    real <lower=0> sigma1;
 vector [n_grid] x;
}
transformed parameters{
  real time_ratio;
  vector [N] lambdaa;
  time_ratio  = exp(beta);
  for(n in 1:N) {
  lambdaa [n] = exp (beta0 + beta*age[n]);
  }
}
model{
  matrix [province,n_grid] landa;
     vector [province] z;
  vector [province] a;
  
  for (j in 1:n_grid){
   target += normal_lpdf(x[j]|0,1);
  }
  target += normal_lpdf(beta0| 0, sigma);
  target += normal_lpdf(beta| 0, sigma1);
  target += cauchy_lpdf(sigma| 0, 5);
  target += cauchy_lpdf(sigma1| 0, 5);
  for (k in 1:province) {
      for (j in 1:n_grid) {
          landa [k,j] = kernel [k,j] * exp (x[j]);
            }
            z[k] = sum(landa[k,]) * p[k];
            a[k] = z[k]/sum(z);
        }
  for (k in 1:province){
    for (i in 1:N) {
      target += (censor_status [i] * a[k] * lambdaa [i] * exp( -1 * lambdaa [i] * time[i])) + ((1 - censor_status [i])*a[k]*exp(-1 * lambdaa [i] * time[i]));
    }
  }
}

Error:

Stan can't start sampling from this initial value.
Chain 1: Rejecting initial value:
Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
Chain 1:   Stan can't start sampling from this initial value.
Chain 1: 
Chain 1: Initialization between (-2, 2) failed after 100 attempts. 
Chain 1:  Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] "Error in sampler$call_sampler(args_list[[i]]) : Initialization failed."
error occurred during calling the sampler; sampling not done

EDIT: @maxbiostat edited this post for syntax highlighting.

The initialization error is caused by incorrect assignment to a.
You cannot compute sum(z) inside the loop that initializes z; you have to wait until after every element of z has been assigned.

I see that sigma and sigma1 don’t do anything other than set the prior for beta and beta0.
Making them parameters with very wide distribution is pretty much the same as just giving betas very wide priors.

I think estimating x looks quite difficult and I recommend you start with a simpler model if possible.

data {
  int<lower=0> province;
  int<lower=0> n_grid;
  int<lower=0> N;
  vector<lower=0>[province] p;
  matrix[province, n_grid] kernel;
  vector[N] age;
  vector[N] time;
  vector[n_grid] x;
  int censor_status[N];
}
parameters {
  real beta0;
  real beta;
}
transformed parameters {
  real time_ratio;
  vector[N] lambdaa;
  time_ratio = exp(beta);
  for (n in 1 : N) {
    lambdaa[n] = exp(beta0 + beta * age[n]);
  }
}
model {
  matrix[province, n_grid] landa;
  vector[province] z;
  vector[province] a;
  for (j in 1 : n_grid) {
    target += normal_lpdf(x[j]| 0, 1);
  }
  target += cauchy_lpdf(beta0| 0, 5);
  target += cauchy_lpdf(beta| 0, 5);
  for (k in 1 : province) {
    for (j in 1 : n_grid) {
      landa[k, j] = kernel[k, j] * exp(x[j]);
    }
    z[k] = sum(landa[k,  : ]) * p[k];
  }
  a = z / sum(z);
  for (k in 1 : province) {
    for (i in 1 : N) {
      if (censor_status[i] == 1) {
        // target is the _logarithm_ of probability
        // so I think this what you were going for
        target += log(a[k]) + exponential_lpdf(time[i]| lambdaa[i]);
      } else {
        target += log(a[k]) + exponential_lccdf(time[i]| lambdaa[i]);
      }
    }
  }
}
1 Like

Thank you so much for your great help. Actually, I’m new to Stan.
Can you help me to run my model, please?
How can I compute sum(z) outside the loop that initializes z?

Here, x[j]'s (j=1,2, …, n_grid) are white noise and It is used only in the calculation of a and there is no need to estimate it.

Best regards,
Eisa

I posted a complete model above but here’s the relevant part

for (k in 1:province) {
    for (j in 1:n_grid) {
        landa [k,j] = kernel [k,j] * exp (x[j]);
    }
    z[k] = sum(landa[k,]) * p[k];
}
a = z / sum(z);
1 Like

I really appreciate your help. Now the model is running and there is no problem. I need the calculated values of a in the model, how can I get it in the output?

Thank you!

You can move z and a to the transformed parameters block. If you don’t want to save z you can place it in a nested block.

transformed parameters {
  real time_ratio;
  vector[N] lambdaa;
  vector[province] a; // `a` saved to output because it's a toplevel transformed parameter
  time_ratio = exp(beta);
  for (n in 1 : N) {
    lambdaa[n] = exp(beta0 + beta * age[n]);
  }
  { // `z` is only visible inside these braces, not part of output
    vector[province] z;
    for (k in 1 : province) {
      vector[n_grid] landa_k;
      for (j in 1 : n_grid) {
        landa_k[j] = kernel[k, j] * exp(x[j]);
      }
      z[k] = sum(landa_k) * p[k];
    }
    a = z / sum(z); // assign `a`
  }
}

I’m so grateful for your help and support. It was a challenging model for me but you made it easier. Thank you.

Sincerely yours,
Eisa