How exactly is lp__ being calculated?

Sorry if this is not the right category.
I want to understand how lp__ is calculated for a very simple model. Say I have a model like this:

data {
  int<lower = 1> N;  
  vector[N] y;  
}
parameters {
  real mu;
  real<lower = 0> sigma;
}
model {
  target += normal_lpdf(mu | 0, 20); 
  target += lognormal_lpdf(sigma | 3, 1); 
  target += normal_lpdf(y | mu, sigma); 
}

with data:

Y <- rnorm(n = 100, mean = 3, sd = 10)

I would assume I can get the same value of lp__ for a given sample by doing the following in R:

mu <- extract(fit_score)$mu
sigma <- extract(fit_score)$sigma
addmu <- dnorm(x = mu[1], mean = 0, sd = 20, log = TRUE)
addsigma <- dlnorm(x = sigma[1], meanlog = 3, sdlog = 1, log = TRUE)
addY <- sum(dnorm(Y,  mu[1], sigma[1], log = TRUE))
lp <- addmu + addsigma + addY

But I don’t: I get similar but not identical values, -371 by hand and -369 from extract(fit_score)$lp__[1]. Is this because of differences between dnorm/dlnorm and Stan’s lpdf functions? Or is there something else that Stan adds or drops compared with my R code?

3 Likes

(I don’t know; Jacobian?)

If you change lognormal to normal, can you then reproduce the numbers?


4 Likes

Yes, the answer is always the Jacobian :)
The difference is not because of the lognormal, but because of the <lower = 0> constraint on sigma. If I remove it, I get the same results.
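To make that concrete, here is a minimal sketch in R (reusing the mu, sigma, addmu, addsigma, addY, and Y objects from the original post): with <lower = 0>, Stan samples sigma on the log scale, so lp__ also includes the log Jacobian of the exp() transform, which is just log(sigma).

# d sigma / d sigma_unconstrained = exp(sigma_unconstrained) = sigma,
# so the log Jacobian term missing from the hand calculation is log(sigma).
lp_by_hand <- addmu + addsigma + addY + log(sigma[1])
# lp_by_hand should now agree with extract(fit_score)$lp__[1]; all three
# target += statements keep their normalizing constants, so nothing is dropped.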

5 Likes

The variable lp__ starts at 0.

The parameters block increments it with the log absolute Jacobian determinant of the transform for any constrained variables.

The model block increments lp__ with the value of the right-hand side of any target += statement.

The model block also increments lp__ for each sampling statement, adding the log density but dropping constants that do not depend on parameters. If you’d rather keep the proportionality constants, use target +=.

That’s it. None of the other blocks touch it.

Its final value is the log density on the unconstrained parameter scale up to a constant that doesn’t depend on the parameters (and stays fixed iteration to iteration).

The value of lp__ at any given point is what target() returns, so you can track it by printing. For example, you can print target() at the top of the model block to see that the starting value is the change-of-variables adjustment, or compare the effect of ~ vs. target +=.
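As a concrete illustration of the dropped constants (my own sketch in R, not Stan code, using the Y, mu, and sigma objects from the question, and assuming the only term a normal sampling statement can drop is the usual -0.5 * log(2 * pi) per observation):

# Contribution of target += normal_lpdf(y | mu, sigma): the full log density.
with_constants <- sum(dnorm(Y, mean = mu[1], sd = sigma[1], log = TRUE))
# Contribution of y ~ normal(mu, sigma): the same, minus the additive constant
# -0.5 * log(2 * pi) per observation, which does not depend on mu or sigma.
without_constants <- with_constants + length(Y) * 0.5 * log(2 * pi)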

4 Likes

@Bob_Carpenter great explanation. I think this information is spread over various documents, and nowhere is it laid out this concisely, so I’m not surprised that there are questions about it. I like @bnicenboim’s post title “How exactly is lp__ being calculated?” as an entry in a hypothetical FAQ or FAQ-like resource, so @martinmodrak may be interested.

3 Likes

Thanks, @Bob_Carpenter, that was really clear. Parameters with <lower = X> are transformed back to the constrained space with exp(auxiliary parameter) + X, right? How are the other parameters back-transformed? Is it written somewhere?
And another small thing: maybe it would be a good idea to report target rather than lp__ in the summary?

1 Like

Chapter 10 has all the transformations + Jacobian determinants: 10 Constraint Transforms | Stan Reference Manual
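For the lower-bound case asked about above, here is a small R sketch of the transforms as the reference manual describes them (my own code, not Stan’s implementation):

# real<lower = L>: theta = L + exp(u), so log |d theta / d u| = u = log(theta - L)
lower_constrain <- function(u, L = 0) L + exp(u)
lower_log_jacobian <- function(u) u

# real<lower = L, upper = U>: theta = L + (U - L) * inv_logit(u)
inv_logit <- function(u) 1 / (1 + exp(-u))
interval_constrain <- function(u, L, U) L + (U - L) * inv_logit(u)
interval_log_jacobian <- function(u, L, U) {
  p <- inv_logit(u)
  log(U - L) + log(p) + log1p(-p)   # log |d theta / d u|
}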

2 Likes

Thanks! This is new (and great!)

I agree that it’s weird to use target in the Stan program but then see lp__ in the output. Before target += we had increment_log_prob(), which has a more transparent connection to the name lp__. Not replacing lp__ with target in the output may have just been to avoid breaking backwards compatibility, I forget. We deprecated increment_log_prob() but it still works. It’s harder to deprecate the name of a variable in the output.

Once upon a time, I did a few talks around this. Slides and examples here: Slides from "How Stan computes the posterior distribution" · syclik

(unfortunately, my slides aren’t really super useful without context… I could probably record a video.)

I’ve brought it up before, but I’d really love to see our current lp__ broken into two parts:

  • the Jacobian term
  • the model… essentially the part that uses target +=

I think it would help with understanding the effect of the transformation (whether the constraining operation is pushing us into numerically bad regions) and with tracking down problems in the model. Right now, if there’s a bug in the model, I have to trap it by placing print statements before and after the offending lines. If the two parts were split out, I could run the likelihood in a separate language, like R, and compare the results.

3 Likes

It’s actually been in the manual since version 1. I think the reference manual is probably getting lost in the shuffle these days after being split out into a separate volume.

And back in the days when we used <- for assignment, we had

lp__ <- lp__ + alpha;

Removing lp__ as a user-modifiable variable was the only backwards-compatibility breaking change we’ve made to the language.

I got tired of suggesting this because I was the only one speaking up for it. Specifically, I wanted to be able to have

transformed parameters {
   real alpha = exp(alpha_unc);

   jacobian += alpha_unc;  // log Jacobian: log |d alpha / d alpha_unc| = alpha_unc = log(alpha)
}

so that we could define Jacobians for transformed parameters right in the transformed parameters block. Then lp__ would be a combination of the constrained model log density plus the log Jacobian determinant of the constraining transform.

Without this, we can’t actually implement Stan’s transforms in Stan itself. For optimization, we turn the Jacobian off, and there’s no way to write a Jacobian that gets turned off for optimization in the Stan language as it stands.

But, when I implemented the first version of the Stan language, I forgot to allow lp__-modifying statements in transformed parameters. Then everyone got fixated on all lp__ modification happening in the model block. That’s also why we don’t allow sampling statements in the parameters block for priors.

But I still like it, and if I were king of Stan, we’d have the above syntax.

3 Likes

So, to make sure, if I have declared a parameter real<lower=0> theta and want its square to have a positive-truncated standard normal prior, is it correct to have

target += normal_lpdf(square(theta) | 0, 1) + log(fabs(2 * theta));

in the model block? So I don’t have to account for the <lower = 0> truncation myself, only the squaring? And if I instead had a transformed parameter real theta2 = square(theta), then I would not need + log(fabs(2 * theta))?

Either I missed that or, more likely, I didn’t understand the impact until I made those slides. So… now I’m speaking up with you!

I remember the idea being floated a while ago, but I don’t remember what reasons were put forward against it. To me, splitting it up like this seems like a nice idea.

Maybe it’s worth bringing up again. There could be more support for it now.

1 Like

Right. We apply the Jacobian for the lower bound so it’s (improper) uniform on positive values. Same for lower and upper bounds, where it’s uniform in that range.
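If a numerical check of the squaring adjustment helps, here is a small R sketch (my own, nothing Stan-specific; the value 0.7 is just an arbitrary positive test point):

theta <- 0.7                      # arbitrary positive value
theta2 <- theta^2
# Log density implied on theta by a standard normal on theta2 = theta^2,
# up to the constant log(2) from truncating to positive values:
log_p_theta <- dnorm(theta2, mean = 0, sd = 1, log = TRUE) + log(abs(2 * theta))
# This is exactly what the target += line in the question contributes.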