Hi everyone,
I’m running into the error ‘Gradient evaluated at the initial value is not finite’ whilst trying to fit a very simple model (using cmdstanr).
The data is a single observation from a 3-category multinomial distribution, and the simplex of multinomial probabilities, \theta, is parametrised by a single parameter, p, as \theta = (p, 0, 1-p).
The issue originally arose when specifying multi-state capture-mark-recapture models, where the likelihood is a product of multinomials and the easiest way to define the multinomial probabilities is with matrix products. To understand what was going wrong, I abstracted away from the mark-recapture setting and simplified. Below is a pair of models that define exactly the same distribution, yet the first works and the second doesn’t!
The data is simply
y <- c(30,0,70)
Here’s Model 1, which works perfectly:
data {
  array[3] int<lower=0> y;      // observed counts for the 3 categories
}
parameters {
  real<lower=0,upper=1> p;
}
transformed parameters {
  simplex[3] theta;
  theta = [p, 0, 1 - p]';       // second category has probability exactly 0
}
model {
  y ~ multinomial(theta);
}
Model 2 is very artificial, but is motivated by the multi-state models where the multinomial probabilities are defined using matrix products.
data {
  array[3] int<lower=0> y;      // observed counts for the 3 categories
}
transformed data {
  matrix[3,2] A = [ [1, 1], [0, 1], [-1, 1] ];
  vector[3] b = [0, 0, 1]';
}
parameters {
  real<lower=0,upper=1> p;
}
transformed parameters {
  simplex[3] theta;
  vector[2] x = [p, 0]';
  theta = A * x + b;            // intended to equal (p, 0, 1 - p)
}
model {
  y ~ multinomial(theta);
}
As I understand it, Model 1 and Model 2 define exactly the same posterior: the value of \theta = Ax + b should equal (p, 0, 1-p), as defined directly in Model 1. So why does Model 2 throw the non-finite gradient error?
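As a sanity check on the claim that both models produce the same \theta, here is a small plain-Python sketch (no Stan involved) that evaluates A x + b with the same A, b and x as in Model 2 and compares it with (p, 0, 1-p). The function names `theta_direct` and `theta_matrix` are just my own labels for the two constructions.

```python
# Plain-Python check (not Stan) that A*x + b reproduces (p, 0, 1 - p).
A = [[1.0, 1.0], [0.0, 1.0], [-1.0, 1.0]]
b = [0.0, 0.0, 1.0]

def theta_direct(p):
    # Model 1 construction: theta written out element by element.
    return [p, 0.0, 1.0 - p]

def theta_matrix(p):
    # Model 2 construction: theta = A * x + b with x = (p, 0).
    x = [p, 0.0]
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(A, b)]

for p in (0.1, 0.3, 0.5, 0.9):
    print(p, theta_direct(p), theta_matrix(p))
```

For these values of p the two constructions agree, and the middle entry is exactly zero in both, so the difference between the models does not seem to come from the values of \theta themselves.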
I believe the error is related to the presence of the zero in the multinomial probabilities \theta = (p, 0, 1-p), but that cannot be the whole story, because it’s not an issue for Model 1. So I assume it is something to do with this zero and the way the autodiff / chain rule handles the matrix products that define \theta in Model 2?
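To make that guess concrete, here is a plain-Python sketch of the chain rule for the multinomial log-likelihood term sum_k y_k log(theta_k). It is only an illustration under my own assumptions, not Stan's actual autodiff code: `ieee_div`, `grad_p_direct` and `grad_p_matrix` are hypothetical helpers, the partial derivative with respect to theta_k is assumed to be computed as y_k / theta_k in floating point (so 0/0 gives NaN), and the logit transform on p is ignored.

```python
import math

y = [30, 0, 70]
A = [[1.0, 1.0], [0.0, 1.0], [-1.0, 1.0]]

def ieee_div(a, b):
    # Mimic IEEE floating-point division: 0/0 is NaN, nonzero/0 is infinite.
    # (Plain Python would raise ZeroDivisionError instead.)
    if b == 0.0:
        return math.nan if a == 0.0 else math.copysign(math.inf, a)
    return a / b

def dlogp_dtheta(theta):
    # Assumed partials of sum_k y_k * log(theta_k) with respect to theta_k.
    return [ieee_div(float(y_k), t_k) for y_k, t_k in zip(y, theta)]

def grad_p_direct(p):
    # Model 1 style: the middle entry of theta is a constant, so only
    # theta_1 = p and theta_3 = 1 - p pass anything back to p.
    g = dlogp_dtheta([p, 0.0, 1.0 - p])
    return g[0] * 1.0 + g[2] * (-1.0)

def grad_p_matrix(p):
    # Model 2 style: theta = A x + b with x = (p, 0). Every row k adds
    # A[k][0] * dlogp/dtheta_k, including the row whose coefficient is 0.
    g = dlogp_dtheta([p, 0.0, 1.0 - p])
    return sum(A[k][0] * g[k] for k in range(3))

print(grad_p_direct(0.5))
print(grad_p_matrix(0.5))
```

In this sketch the direct version gives a finite gradient, while the matrix version returns NaN, because the middle row contributes 0 * NaN and that is NaN rather than 0. If Stan's reverse pass behaves similarly, that would explain why only Model 2 fails, but I have not verified this against the Stan Math source.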
I’d be very grateful for any explanations and solutions!