"Gradient evaluated at the initial value is not finite." error

davidh · December 22, 2022, 3:28pm

I am trying to implement a straightforward Bayesian MDS model, based on the code from Gronau and Lee. The code is here and the relevant file is the demcmc_all_dim.cpp file. However, I’m running into a weird error with non-finite gradients and I’m wondering if anyone on this forum might have any ideas for how to fix it. I’ve copied my current code here – I know there are several optimizations that could be made with Stan but for now I’m hewing as close to the original C code as possible.

data {
  int nDimensions;
  int nStimuli;
  int nSubjects;
}
transformed data {
  int n_zero = nDimensions * (nDimensions + 1) / 2;
  int n_nonzero = nStimuli * nDimensions - n_zero;
  int n_pos = nDimensions;
  int n_free = n_nonzero - n_pos;
}
parameters {
  vector<lower=0, upper=1>[n_pos] xpos;
  vector<lower=-1, upper=1>[n_free] xfree;
  real<lower=0> sdS;
}
transformed parameters {
  matrix[nStimuli, nDimensions] x;
  // compute distances
  matrix[nStimuli, nStimuli] d;
  
  {
    int i_free = 1;
    int i_pos = 1;
    for (i in 1:nStimuli) {
      for (j in 1:nDimensions) {
        if (i <= j) {
          x[i,j] = 0;
        } else if (i == j + 1) {
          x[i,j] = xpos[i_pos];
          i_pos += 1;
        } else {
          x[i,j] = xfree[i_free];
          i_free += 1;
        }
      }
    }
  }
//the following block seems to be the issue
  for (i in 1:nStimuli) {
    for (j in 1:nStimuli) {
      real tmp = 0.0;
      for (k in 1:nDimensions) {
        tmp += pow(x[i,k] - x[j,k], 2);
      }
      d[i, j] = sqrt(tmp);
    }
  }
}
model {
}

and the following code should allow for reproducing the error:

library(cmdstanr)
data <- list(nDimensions = 2,
             nStimuli = 9,
             nSubjects = 20)

mds_gronau_lee_test_cmdstan_model <- cmdstan_model(file.path(model_dir, "mds_gronau_lee_test.stan"))
mdsres.fixed.gronau.lee <- mds_gronau_lee_test_cmdstan_model$sample(data = data,
                                                                    chains = 1,
                                                                    iter_warmup = 1, 
                                                                    iter_sampling = 1)

I don’t really understand where the issue with the gradient could possibly come from since there is actually no likelihood evaluation! The parameters themselves are simply constrained parameters and I’m not sure I understand why the transformed parameters block would make a difference as is. For what it’s worth, the issue seems to be the block I’ve commented in the code. If I comment it out or set d[i, j] = 1;, there is no sampling issue. Thanks in advance for the help!

caesoma · December 28, 2022, 9:14am

Can you post the actual error message you are getting? There may be additional information there on why this is happening.

I’m not completely sure, but I think you may need something in the model block, otherwise you are just specifying functions and declaring variables. I suspect Stan may always try to evaluate gradients based on the model block, and not finding anything there it may raise an error. You can test that by adding any distribution statement to the block and seeing if the error goes away, and if not we can go from there…

davidh · December 28, 2022, 2:13pm

Thanks for the response! My apologies for not including it before, the full error message is:

Chain 1 Rejecting initial value:
Chain 1 Gradient evaluated at the initial value is not finite.
Chain 1 Stan can’t start sampling from this initial value.
…
Chain 1 Rejecting initial value:
Chain 1 Gradient evaluated at the initial value is not finite.
Chain 1 Stan can’t start sampling from this initial value.
Chain 1 Initialization between (-2, 2) failed after 100 attempts.
Chain 1 Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
Warning: Chain 1 finished unexpectedly!

The … section is where the same error message gets copied several times (presumably since it tries to start sampling several times).

I don’t think the issue is that it is missing code in the model block. As I mentioned in the previous post, if you change the line d[i, j] = sqrt(tmp); to d[i, j] = 1;, the model runs without issue. However, I also tried changing the model block to

model {
  sdS ~ normal(.15, .2);
}

and got the same error I copied above. Hope that clarifies things and thanks so much for your help working through this!

caesoma · December 28, 2022, 4:49pm

Sorry, I missed that the first time around. I’m not sure what values that ends up having, but my next guess would be that if that is still zero it may be taking the log somewhere and getting an invalid value. It’s still not entirely clear to me why exactly the gradient would not be finite otherwise, but maybe you could check that there are no nonfinite values being computed in that loop.

davidh · December 29, 2022, 8:10am

Ah yes, that did it, thank you! The diagonal entries are always 0 so I guess it was trying to differentiate the sqrt through that. Thanks again for the help!
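
For anyone who finds this thread later, here is a minimal sketch of one way to avoid the problem, assuming the rest of the model stays as posted above. The derivative of sqrt(t) is 1 / (2 * sqrt(t)), which is not finite at t = 0, so differentiating through sqrt(0) on the diagonal of d is the likely source of the non-finite gradient. The sketch assigns the diagonal as a constant and only calls sqrt on pairs with i < j. It is a replacement for the commented loop in transformed parameters, not the original Gronau and Lee code.

```stan
  // Replacement for the distance loop in transformed parameters.
  for (i in 1:nStimuli) {
    // Distance from a point to itself: assign the constant directly,
    // so sqrt is never evaluated (or differentiated) at exactly 0.
    d[i, i] = 0;
    for (j in (i + 1):nStimuli) {
      real tmp = 0.0;
      for (k in 1:nDimensions) {
        tmp += square(x[i, k] - x[j, k]);
      }
      d[i, j] = sqrt(tmp);
      d[j, i] = d[i, j];  // distances are symmetric, so fill both triangles
    }
  }
```

When i equals nStimuli the inner range is empty and the loop body is skipped. An off-diagonal distance would still hit sqrt(0) if two stimuli had exactly the same coordinates, which the constraints in this model should make very unlikely but do not strictly rule out.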
