Rejecting initial value - Gradient evaluated at the initial value is not finite

I’m trying to fit a hierachical model described by this paper, A Bayesian spatio-temporal model of COVID-19 spread in England | Scientific Reports. Yet I can’t seem to figure out this message:
Chain 1: Rejecting initial value:
Chain 1: Gradient evaluated at the initial value is not finite.
Chain 1: Stan can’t start sampling from this initial value.
Chain 1:
Chain 1: Initialization between (-2, 2) failed after 100 attempts.
Chain 1: Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.

I’ve tried initialising everything to 0 at the start, and printing each of the parameters, but none of them are nan, so I’m not too sure what’s going on. Model code is below. Any help would be greatly appreciated.

data {
  int<lower=0> s; // number of spatial locations
  int<lower=0> t; // number of time points
  int<lower=0> k; // number of covariates
  
  vector<lower=0>[t*s] N; // population
  
  int<lower=0> y[t*s]; // response data
  matrix[s, k] X; // dependent data
  
  cov_matrix[s] big_sigma_w;
}

parameters {
  vector[k] beta; // covariates
  vector[s] w[t]; // spatial random effect

  vector[s] xi_start; // first value of xi
  
  real<lower=0> sigma_w; // Spatial variance parameter
  real<lower=-1, upper=1> alpha; // Temporal dependence parameter
}

transformed parameters {
  
  // Transform
  vector[s] N_vec[t];
  for (i in 1:t) {
    N_vec[i] = to_vector(N[((i - 1) * s + 1):(i * s)]);
  }
  
  vector[s] xi[t]; // spatio-temporal random effect
  xi[1] = xi_start; // Assign first value

  if (t > 1){
    for (i in 2:t){
      xi[i] = alpha * xi[i-1] + w[i];
    }
  }
  
  vector[s] log_theta[t]; // infection risk
  vector[s] mu[t]; // mean
  
  for(i in 1:t){
    log_theta[i] = X * beta + xi[i];
    mu[i] = N_vec[i] .* exp(log_theta[i]);
  }
  
  vector[t*s] mu_flat;
  
  for (i in 1:t) {
    mu_flat[((i - 1) * s + 1):(i * s)] = mu[i];
  }
  
}

model {
  
  // Penalized complexity prior for sigma_w using Half-Normal
  // This ensures p(sigma_w >1) = 0.01
  sigma_w ~ normal(0, 0.43); // Adjust the standard deviation as needed

  // // Apply Beta prior to bound_alpha, which is transformed to alpha
  // // This ensures p(alpha > 0) = 0.9
  // bound_alpha ~ beta(2, 20);
  
  xi_start ~ normal(0, sigma_w^2 / (1 - alpha^2));
  
  beta ~ normal(0, 1000);
  
  w ~ multi_normal(rep_vector(0, s), sigma_w^2*big_sigma_w);
  
  
  for (i in 1:t*s){
    y[i] ~ poisson(mu_flat[i]);
  }
  
  for (i in 1:t*s){
    print("mu_flat[", i, "] = ", mu_flat[i]);
    print("y[", i, "] = ", y[i]);
  }
  
  // for(i in 1:k){
  //   print("beta[", i, "]", beta[k]);
  // }
  
  // for( i in 1:t){
  //   print("w[", i, "]", w[i]);
  // }
  
  // print("sigma_w = ", sigma_w);
  // print("alpha = ", alpha);
  
  
}

generated quantities {
  vector[t*s] fitted_values;

  for (i in 1:t*s) {
    fitted_values[i] = poisson_rng(mu_flat[i]);
  }
}

Trying with the full dataset gives this error:
Chain 1: Rejecting initial value:
Chain 1: Log probability evaluates to log(0), i.e. negative infinity.
Chain 1: Stan can’t start sampling from this initial value.

Thanks all

1 Like

The problem is reported with the gradient, which may be problematic even if the values all look good. Unfortunately, debugging gradients is not really possible in any direct way.

I don’t see anything obviously wrong with the model. So I think your best bet is to try to simplify the model to see which part introduces the issues (e.g. by replacing some parameters with constants/moving them to data)

Best of luck with your model!

1 Like