Hi everyone
I’ve got yet another example of non-finite gradients despite a finite likelihood at the initial parameter values. I’m reaching out for help because I’ve been trying to debug this for days! I have a very complex model, which I can share, but I’d like to check whether the answer is simple before asking for more involved help.
If I had to guess at my problem, it would be an underflow or overflow issue. I have about a dozen model parameters on very different scales: some best-fit values are around 1e-4 and others around 10,000, and all of them must be positive. My thinking to get around this was to transform the parameters by their prior means, i.e.
```stan
parameters {
  real param1;
  real param2;
  // ... and so on for the remaining parameters
}
transformed parameters {
  real<lower=0> param1_tr = exp(param1 - 9);
  real<lower=0> param2_tr = exp(param2 + 9);
  // ...
}
model {
  param1 ~ normal(0, 1);
  param2 ~ normal(0, 1);
  // ...
  target += normal_lpdf(Y | mu, sigma);
}
```
such that param1_tr is around 1e-4 (exp(0 - 9) ≈ 1.2e-4) and param2_tr is around 1e4 (exp(0 + 9) ≈ 8.1e3). These should be very close to the true values.
When I look at the gradients using grad_log_prob(fit, upars), where upars is a vector of zeros with a 1 in the position for sigma, I get:
```
[1] NaN NaN NaN NaN NaN NaN NaN NaN
[9] NaN NaN NaN 31295823
attr(,"log_prob")
[1] -15653355
```
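For concreteness, this is roughly how I’m computing those gradients; the length of 12 and the position of sigma are from my model, so treat them as placeholders:

```r
# Roughly how the output above was produced (rstan 2.21):
# 12 unconstrained parameters, all set to 0 except the last one,
# which corresponds to sigma and is set to 1.
upars <- rep(0, 12)
upars[12] <- 1
g <- grad_log_prob(fit, upars)  # gradient vector, with log_prob as an attribute
g
```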
There is a magic number for the initialisation values: if they are less than around -0.035, the gradients are finite. I find this quite puzzling.
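To show what I mean, here is a sketch of the kind of scan that turned up that threshold; varying all the unconstrained values together and the grid itself are just for illustration:

```r
# Apply the same initial value to every unconstrained parameter
# and record whether the gradient is finite at that point.
inits <- seq(-0.05, 0, by = 0.005)
ok <- sapply(inits, function(x) all(is.finite(grad_log_prob(fit, rep(x, 12)))))
data.frame(init = inits, finite_gradient = ok)
```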
My question is, would those transformations be the likely culprit? If so, could anyone suggest a better way to rescale the parameters?
Alternatively, would you expect this to be fine? If so, I’ll prepare a simulated dataset and upload the full model.
Many thanks for your help. I love this forum!
Chris
Operating System: Ubuntu
Interface Version: rstan 2.21
Compiler/Toolkit: gcc