Error in initialization due to nonfinite gradient rejection caused by -Inf values in transformed parameters

I have a model where I construct a partially fixed matrix in the transformed parameters block, so that the diagonal and parts of the upper triangle are estimated while the remaining elements are set to -Inf. However, this seems to result in nonfinite gradient errors, even though the likelihood (and the priors) are always finite. Here is a toy example that reproduces the same error:

data {
  int<lower=1> N;
  real y[N];
}

parameters {
  real mu;
}

transformed parameters {
  vector[2] x;
  x[1] = mu;
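  // fixed -Inf entry: this constant alone triggers the nonfinite gradient rejection at initialization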
  x[2] = negative_infinity();
}
model {
  mu ~ normal(0, 1);
  y ~ normal(x[1], 1);
}

This is of course a very artificial example where the issue could easily be circumvented, but in my practical application this “x” variable is an input to a custom likelihood function, where it is much easier to do the computations with the full matrix containing -Inf values than to write custom elementwise computations (the locations of the -Inf values are fixed case by case). Edit: Maybe I should be more direct about the application I have in mind: essentially I am building a constrained transition matrix for a Markov model where some of the transition probabilities are fixed to zero (which leads to -Inf on the log scale).
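
Something along these lines, just to make it concrete (the 3-state layout and the variable names here are only an illustration, not my actual model): state 1 cannot move directly to state 3, so that entry of the log-scale transition matrix is fixed to -Inf while everything else is estimated.

parameters {
  simplex[2] p1_free;  // state 1 can only stay in 1 or move to 2
  simplex[3] p2;       // full transition row for state 2
  simplex[3] p3;       // full transition row for state 3
}
transformed parameters {
  matrix[3, 3] log_P;  // log-scale transition matrix
  log_P[1, 1] = log(p1_free[1]);
  log_P[1, 2] = log(p1_free[2]);
  log_P[1, 3] = negative_infinity();  // transition 1 -> 3 has probability zero
  log_P[2] = log(p2)';
  log_P[3] = log(p3)';
}
model {
  // the actual likelihood using log_P would go here
}

The same rejection shows up as soon as log_P contains that fixed -Inf entry, just like in the toy example above.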

It seems that this issue was also raised on GitHub years ago (Get "infinite gradient" error for code with finite gradient · Issue #275 · stan-dev/rstan · GitHub). Any ideas on how to deal with this?

I figured out an ugly hack to get rid of the issue in my case:

      // tmp below is constructed as theta + x, where x may contain
      // fixed -Inf elements as well as unknown parameters
      int n_finite = 0;
      // reorder elements of tmp so that all finite values are at the start
      for (i in 1:M) {
        if (tmp[i] > negative_infinity()) {
          n_finite += 1;
          tmp[n_finite] = tmp[i];
        }
      }
      if (n_finite == 0) {
        ll = negative_infinity(); // log_sum_exp of only -Inf values is -Inf
      } else {
        ll = log_sum_exp(tmp[1:n_finite]);
      }
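
In case it helps anyone else, the same idea can be packaged as a user-defined function in the functions block; the name log_sum_exp_finite is just something I made up, not a built-in:

functions {
  // log_sum_exp over the finite elements of x, skipping fixed -Inf entries;
  // returns -Inf if every element is -Inf
  real log_sum_exp_finite(vector x) {
    int M = num_elements(x);
    vector[M] tmp = x;
    int n_finite = 0;
    for (i in 1:M) {
      if (tmp[i] > negative_infinity()) {
        n_finite += 1;
        tmp[n_finite] = tmp[i];
      }
    }
    if (n_finite == 0) {
      return negative_infinity();
    }
    return log_sum_exp(tmp[1:n_finite]);
  }
}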

I think you answered your own question, so what I’m writing here is mostly redundant; it just validates what you already said.

Your toy example shows that even when there is nothing wrong with the calculations that are actually needed, an error will still be raised, because HMC is a black box that requires gradients with no infinities regardless, so the only real option may be to exclude those values.
Excluding them directly is probably not possible in your matrix formulation, so a hacky workaround like the one you posted is likely the easiest way to go. Loops are still pretty fast in Stan, so even if the formulation doesn’t look elegant, it should work fine as long as the model isn’t too large.
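
One small refinement of your hack: since the positions of the -Inf values are fixed case by case, you could pass the indices of the finite entries as data and index with them directly, skipping the per-iteration finiteness check. A sketch of a helper along those lines (the function name is made up and the array keyword assumes a reasonably recent Stan version; nothing here is specific to your model):

functions {
  // log_sum_exp over only the entries of x listed in idx
  // (the positions that are not fixed to -Inf);
  // returns -Inf when idx is empty
  real log_sum_exp_idx(vector x, array[] int idx) {
    if (size(idx) == 0) {
      return negative_infinity();
    }
    return log_sum_exp(x[idx]);
  }
}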