# Gradient Pystan log_prob: constrained versus unconstrained

I’m having an issue where the gradients I calculate across models with constrained and unconstrained parameters don’t seem to match, even after I transform my variables to be on the unconstrained scale.

Suppose I specify a model:

```stan
data {
  int<lower=0> N;
  real y[N];
}
parameters {
  real mu;
  real sigma;
}
model {
  y ~ normal(mu, sigma);
}
```


then I calculate the gradient of the log probability in PyStan, assuming the data:

```python
{'N': 1, 'y': [0]}
```


then I obtain:

```python
stanfit.grad_log_prob([1, 2])  # = (-1/4, -3/8)
```


which I can match by hand, so this looks fine.
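(For reference, the by-hand check uses $\log p = -\log\sigma - \frac{(y-\mu)^2}{2\sigma^2} + \text{const}$; a minimal NumPy sketch, assuming the same data $y=0$ and the evaluation point $(\mu, \sigma) = (1, 2)$ from the call above:)

```python
import numpy as np

def grad_log_prob(mu, sigma, y=0.0):
    """Gradient of log N(y | mu, sigma) with respect to (mu, sigma).

    log p = -log(sigma) - (y - mu)**2 / (2 * sigma**2) + const
    """
    d_mu = (y - mu) / sigma**2                       # -(mu - y) / sigma^2
    d_sigma = (y - mu)**2 / sigma**3 - 1.0 / sigma   # (y - mu)^2 / sigma^3 - 1/sigma
    return np.array([d_mu, d_sigma])

print(grad_log_prob(1.0, 2.0))  # -> [-0.25  -0.375], i.e. (-1/4, -3/8)
```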

If I instead constrain sigma and repeat the exercise with the following model:

```stan
data {
  int<lower=0> N;
  real y[N];
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  y ~ normal(mu, sigma);
}
```


then, calculating the gradients, I obtain:

```python
stanfit.grad_log_prob([1, np.log(2)], adjust_transform=True)  # = (-1/4, 1/4)
```


So neither version of the gradient for sigma matches what I had with the unconstrained model, which seems odd.

Am I misinterpreting something here?

The gradient is not with respect to $\sigma$ but with respect to the unconstrained $u$ (where $\sigma = \exp(u)$). By the chain rule,

$$\frac{d}{du}\log p = \frac{d\sigma}{du}\times\frac{d}{d\sigma}\log p = \sigma\times\frac{d}{d\sigma}\log p$$

If I'm reading this right, your `adjust_transform=False` result should be $2\times\left(-\frac{3}{8}\right) = -\frac{3}{4}$.
Double-check the sign; other than that it looks as expected.

`adjust_transform=True` adds the log-determinant of the Jacobian of the constraining transform, which in this case is $\log \sigma\,(=u)$, and that should add $+1$ to the gradient.
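Both steps can be checked numerically; a small sketch, assuming the same evaluation point as above ($\mu = 1$, $\sigma = 2$, so $u = \log 2$, with $y = 0$):

```python
mu, sigma, y = 1.0, 2.0, 0.0

# Constrained-scale gradient d/dsigma log p = (y - mu)^2 / sigma^3 - 1/sigma
d_sigma = (y - mu)**2 / sigma**3 - 1.0 / sigma   # -3/8

# Chain rule: d/du log p = sigma * d/dsigma log p, since sigma = exp(u)
d_u = sigma * d_sigma                             # -3/4, the adjust_transform=False value

# Jacobian adjustment: log|d sigma / du| = log sigma = u, whose derivative in u is 1
d_u_adjusted = d_u + 1.0                          # 1/4, the adjust_transform=True value

print(d_u, d_u_adjusted)  # -> -0.75 0.25
```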


Thank you very much. You’re quite right: it was -3/4.

I feel like adding this to the function's documentation would help quite a few users: namely, that the result of `grad_log_prob` is on the unconstrained scale. Of course, users should probably know this already, but I feel it would avoid confusion like mine. Anyway, just a thought!