'Gradient evaluated at the initial value is not finite' with *very* simple multinomial model!

Hi everyone,

I’m running into issues with the error ‘Gradient evaluated at the initial value is not finite’ whilst trying to fit a very simple model (using cmdstanr). The data is a single observation from a 3-category multinomial distribution, and the simplex of multinomial probabilities, \theta, is parametrised by a single parameter, p, as \theta = (p, 0, 1 - p).

The issue originally arose in trying to specify multi-state capture-mark-recapture models, where the likelihood is a product of multinomials and the easiest way to define the multinomial probabilities is via matrix products. To understand what was going wrong, I abstracted away from the mark-recapture setting and simplified. Below is a pair of models that define exactly the same distribution, but the first works and the second doesn’t!

The data is simply

y <- c(30, 0, 70)

Here’s Model 1, which works perfectly:

data {
  array[3] int<lower=0> y;
}

parameters {
  real<lower=0,upper=1> p;
}

transformed parameters {
  simplex[3] theta;
  theta = [p,0,1-p]';
}

model {
  y ~ multinomial(theta);
}

Model 2 is very artificial, but is motivated by the multi-state models where the multinomial probabilities are defined using matrix products.

data {
  array[3] int<lower=0> y;
}

transformed data {
  matrix[3,2] A = [ [1, 1], [0, 1], [-1, 1] ];
  vector[3] b = [0,0,1]';
}

parameters {
  real<lower=0,upper=1> p;
}

transformed parameters {
  simplex[3] theta;
  vector[2] x = [p,0]';
  theta = A*x + b;
}

model {
  y ~ multinomial(theta);
}
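
For reference, multiplying the product out componentwise gives exactly the probabilities I intended (the middle entry comes out as 0 \cdot p + 0):

\theta = Ax + b =
\begin{pmatrix} 1 & 1 \\ 0 & 1 \\ -1 & 1 \end{pmatrix}
\begin{pmatrix} p \\ 0 \end{pmatrix} +
\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} =
\begin{pmatrix} 1 \cdot p + 1 \cdot 0 \\ 0 \cdot p + 1 \cdot 0 \\ -1 \cdot p + 1 \cdot 0 \end{pmatrix} +
\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} =
\begin{pmatrix} p \\ 0 \\ 1 - p \end{pmatrix}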

As I understand it, then, Model 1 and Model 2 define exactly the same posterior: \theta = Ax + b equals (p, 0, 1 - p), just as defined directly in Model 1. So why does Model 2 throw the infinite gradient error?

I believe the error is related to the presence of the zero in the multinomial probabilities \theta = (p, 0, 1-p), but that cannot be the whole story, because it’s not an issue for Model 1. So I assume it is something to do with this zero and the way the autodiff / chain rule handles the matrix products that define \theta in Model 2?

I’d be very grateful for any explanations and solutions!


You put your finger on the source of the problem: the chain rule going bad. Let’s see where.

You can use slightly more compact notation here, e.g. declaring and defining theta in a single statement:
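
transformed parameters {
  simplex[3] theta = [p, 0, 1 - p]';
}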

As an aside, putting an exact zero in theta like this is super dangerous, as the likelihood is going to fail (evaluate to -infinity) whenever y[2] != 0.

Your second example is equivalent to the simpler example:

simplex[3] theta = [p, 0 * p, 1 - p]';

What’s going on is that we are going to pass 0 * p as the second entry of theta to the multinomial. Evaluating the multinomial log density involves the term log(theta[2]^y[2]) with y[2] = 0, i.e., multiply_log(0, 0 * p). The value of that term is fine (it’s 0), but its derivative with respect to theta[2] is y[2] / theta[2] = 0 / 0, which is not finite, and autodiff is going to propagate that back to p.
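
Spelled out with the chain rule, writing \theta_2 = 0 \cdot p for the second entry:

\frac{\partial}{\partial p} \, y_2 \log \theta_2 = \frac{y_2}{\theta_2} \cdot \frac{\partial \theta_2}{\partial p} = \frac{0}{0} \cdot 0,

which does not come out finite in floating point.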

Thus 0 * p will indeed behave differently than a constant 0 in this context, because with a constant there’s nothing to pass the infinite derivative on to.
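
Here’s about the smallest program I’d expect to trip the same error (an untested sketch; it just isolates the multiply_log term the multinomial evaluates internally):

parameters {
  real<lower=0, upper=1> p;
}
model {
  // value is multiply_log(0, 0) = 0, but the gradient with respect to
  // the second argument is 0 / 0, which chains back to p and produces
  // the non-finite gradient error at initialization
  target += multiply_log(0, 0 * p);
}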

I’m afraid there’s not much we can do to fix this on the autodiff side, as it would require symbolic math to reduce 0 * p to 0 without propagating derivatives down to p.
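
On the modeling side, one workaround (a sketch, assuming the zero entry really is structural) is to overwrite that entry with a literal 0 after the matrix product, so the autodiff graph has no edge from theta[2] back to p, just as in your Model 1:

transformed parameters {
  simplex[3] theta = A * [p, 0]' + b;
  // replace the parameter-dependent 0 * p entry with a constant zero,
  // so no derivative is propagated to p through this entry
  theta[2] = 0;
}

Alternatively, since the likelihood is only finite when y[2] == 0 anyway, you could drop the second category entirely and model the remaining counts with a binomial.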


Hey @Bob_Carpenter, just want to double-check that it’s totally clear to you that this couldn’t be fixed with a patch for stan-dev/math issue #2494 (multiply_log(0, 0) has infinite gradient). Not saying that it can, but it seems so closely related that I just wonder if it might.