How to debug "Gradient evaluated at the initial value is not finite."

Hi,

I’m attempting to adapt a colleagues JAGS model to Stan that estimates an infection time series across multiple sites. The Stan version has made it to the stage where it compiles, but fails with an infinite gradient error:

Rejecting initial value:
Gradient evaluated at the initial value is not finite.
Stan can’t start sampling from this initial value.

Initialization between (-2, 2) failed after 100 attempts.

What would be the best way to move forward with debugging the model? Is there a way to figure out which call is causing the infinite gradient? I tried using the test_gradient option and the diagnose method from cmdstanr, but they only give the setup options, not any of the intermediate output.

Here is the code. I apologize that it is on the longer side, I don’t know how to meaningfully simplify it. I could cut it down to a single site if that would help.

data {
  int<lower=1> J;
  int<lower=1> I;
  array[I,J] int<lower=0> attendees;
  array[I,J] int<lower=0> fever;
  array[I,J] int<lower=0> tested;
  array[I,J] int<lower=0> Pf_confirmed;
  array[J] int<lower=1> N2;
}



parameters {
  real alpha1_Pf;
  real alpha2_Pf;
  real <lower=.7,upper=.9>f_Pf;
  real <lower=0,upper=1>s;
  array[J] real <lower=0,upper=1> r;
  array[J] real beta0_test;
  array[J] real beta0_att;
  array[I,J] real <lower=0> Endemic_Pf;
  
  
  

  array[J] real<lower=0> alpha0_Pf;
  array[J] real<lower=0,upper=10> RC_Pf1; 
  array[J]  real<lower=0,upper=1> lambda_E_Pf1;

}

transformed parameters {
  array[J] real P_ATT;
  array[J] real P_TEST;
  array[I,J] real<lower=0> pconfirm_Pf;
  array[I,J] real <lower=0> lambda_attending;
  array[I,J] real <lower=0> lambda_fever;
  array[I,J] real RC_Pf;
  
  array[I,J] real <lower=0> lambda_E_Pf;
  
  for(j in 1:J){
    P_ATT[j]=inv_logit(beta0_att[j]);
    P_TEST[j]=inv_logit(beta0_test[j]);
    RC_Pf[1,j]= RC_Pf1[j];
    lambda_E_Pf[1,j]=lambda_E_Pf1[j];
  }
  
  
  for(i in 1:I){
    for(j in 1:J){
      RC_Pf[i,j] = Endemic_Pf[i,j];
      lambda_attending[i,j] = (r[j]*(N2[j]-RC_Pf[i,j]) + f_Pf*RC_Pf[i,j]) * (P_ATT[j]);
      lambda_fever[i,j] = (s*r[j]*(N2[j]-RC_Pf[i,j]) + f_Pf*RC_Pf[i,j] );
      pconfirm_Pf[i,j] = (P_TEST[j]*f_Pf*RC_Pf[i,j]);
      
    }
  }
  
  for(i in 2:I){
    for(j in 1:J){
      lambda_E_Pf[i,j] = exp(-alpha0_Pf[j] + alpha1_Pf*RC_Pf[i,j] + alpha2_Pf*RC_Pf[(i-1),j]);
    }
  }
  
}


model {

  alpha1_Pf ~ normal(0, 31.2);
  alpha2_Pf ~ normal(0, 31.2);
  
  f_Pf ~ uniform(0.7, 0.9); 
  s~ beta(1, 10);
  
  for ( j in 1:J) {
   
    r[j] ~ beta(1,10);
   
    
    beta0_test[j] ~normal(0, 10);
    beta0_att[j] ~normal(0, 10);
    
    alpha0_Pf[j] ~ gamma(1,1);
    
    // initialize values 
    Endemic_Pf[1,j]~uniform(0,10);
    Endemic_Pf[2,j]~uniform(0,10);
   
    
    
    for ( i in 1:I) {
      
   // OBSERVATION PROCESS
     
      
      
      attendees[i,j] ~ poisson(lambda_attending[i,j]);
      fever[i,j]~ poisson(lambda_fever[i,j]);                    
      tested[i,j]~ poisson(P_TEST[j]*fever[i,j]);
      Pf_confirmed[i,j]  ~ poisson(pconfirm_Pf[i,j]);
    }
    
    // MALARIA PROCESS - Pf
    for ( i in 2:(I-1)) {
      Endemic_Pf[(i+1),j] ~ normal(N2[j]*lambda_E_Pf[i,j],N2[j]*lambda_E_Pf[i,j]);
    }
  }
}


I’m sure there’s a better way to do it but I’d start putting print statements for all of your immediate containers and see if anything is -Inf or nan. That’s helped me resolve this issue too many times to count.

1 Like

Ah, that’s a good idea. I had print statements for the variables in the transformed parameters when I was debugging the initial version, but for some reason I hadn’t thought to do it on all the other parameters. Thanks!

edit:
Oh my gosh, I hadn’t entirely read the entire documentation on print statements and missed the super relevant section on debugging via print and target().

Unfortunately print() does not print gradients. When the target is finite but has nonfinite gradient, there’s not much you can do other than 1) set init=0 or some plausible values and 2) delete parts of the model until it initializes.

Looking at the code, maybe lambda_E_Pf becomes so extreme that the “malaria process” normal distribution at the end bugs out.

1 Like

I think the issue might be due to Endemic_Pf being declared as unbounded positive:

array[I,J] real <lower=0> Endemic_Pf;

But the model likelihoods only have support from 0-10:

    Endemic_Pf[1,j]~uniform(0,10);
    Endemic_Pf[2,j]~uniform(0,10);

Which means that the parameter could be initialised with a value greater than the likelihood is defined for.

If you add upper=10 to the Endemic_Pf declaration, does that change anything?

1 Like

Thanks for all the suggestions.

adding an upper bound to Endemic_Pf didn’t fix things, nor did setting init to 0.

I’ll probably try cutting down the model to start from a simplified point that will hopefully run and build back from there. I could also take some of the parameter estimates from the jags model and use those for a more fleshed out set of initial conditions.