How exactly is lp__ being calculated?

Sorry if this is not the right category.
I want to understand how lp__ is calculated for a very simple model. Say I have a model like this:

data {
  int<lower = 1> N;  
  vector[N] y;  
}
parameters {
  real mu;
  real<lower = 0> sigma;
}
model {
  target += normal_lpdf(mu | 0, 20); 
  target += lognormal_lpdf(sigma | 3, 1); 
  target += normal_lpdf(y | mu, sigma); 
}

with data:

Y <- rnorm(n = 100, mean = 3, sd = 10)

I would assume I can get the same value of lp__ for a given sample by doing the following in R:

mu <- extract(fit_score)$mu          # extract() is from rstan
sigma <- extract(fit_score)$sigma
addmu <- dnorm(x = mu[1], mean = 0, sd = 20, log = TRUE)
addsigma <- dlnorm(x = sigma[1], meanlog = 3, sdlog = 1, log = TRUE)
addY <- sum(dnorm(Y, mu[1], sigma[1], log = TRUE))
lp <- addmu + addsigma + addY

But I don't: I get similar values, -371 by hand and -369 from extract(fit_score)$lp__[1]. Is this because of differences between dnorm/dlnorm and Stan's lpdf functions? Or is there something else that Stan adds or leaves out compared with my R code?

(I don’t know; Jacobian?)

If you change lognormal to normal, can you then reproduce the numbers?

Yes, the answer is always the Jacobian :)
The difference is not because of the lognormal but because of the <lower = 0>. If I remove it, I get the same results.
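
To make that concrete, here's a minimal R sketch, reusing mu, sigma, and the add* terms from the first post and assuming fit_score is the fitted rstan object. With <lower = 0>, Stan samples on the unconstrained scale u = log(sigma), and the log absolute Jacobian of sigma = exp(u) is just u = log(sigma):

# Jacobian adjustment for the <lower = 0> constraint on sigma:
# Stan samples u = log(sigma), and |d exp(u) / du| = exp(u) = sigma,
# so the log Jacobian term is log(sigma[1]).
lp_by_hand <- addmu + addsigma + addY + log(sigma[1])
lp_by_hand                   # should now match ...
extract(fit_score)$lp__[1]   # ... Stan's reported lp__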

The variable lp__ starts at 0.

The parameters block increments it with the log absolute Jacobian determinant of the transform for any constrained variables.

The model block increments lp__ with the value of the right-hand side of any target += statement.

The model block also increments lp__ for each sampling statement, adding the log density but dropping constants that do not depend on parameters. If you'd rather keep the proportionality constants, use target +=.
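
To see concretely what gets dropped, here's a rough R illustration for a single normal observation (y, mu, and sigma here stand for placeholder scalars, not the fit above):

# What target += normal_lpdf(y | mu, sigma) adds: the full log density.
full <- dnorm(y, mu, sigma, log = TRUE)
# What y ~ normal(mu, sigma) adds: the same value minus the additive
# constant -0.5 * log(2 * pi), which involves no parameters.
# (The -log(sigma) term is kept, because sigma is a parameter.)
without_constants <- full + 0.5 * log(2 * pi)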

That’s it. None of the other blocks touch it.

Its final value is the log density on the unconstrained parameter scale up to a constant that doesn’t depend on the parameters (and stays fixed iteration to iteration).

The value of lp__ at any given point is what you get from target(), so you can track it by printing. For example, you can print at the top of the model block to see that the starting value is the change-of-variables adjustment, or to compare the effect of ~ vs. target +=.

@Bob_Carpenter great explanation. I think this information is spread over various documents, so there's nowhere it's stated this concisely. I'm not surprised that there are questions about it. I like @bnicenboim's post title "How exactly is lp__ being calculated?" as an entry in a hypothetical FAQ or FAQ-like resource, so @martinmodrak may be interested.

Thanks, @Bob_Carpenter, that was really clear. Parameters with <lower = X> are back-transformed to the constrained space with exp(auxiliary parameter) + X, right? How are the other parameters back-transformed? Is it written down somewhere?
And another small thing: maybe it would be a good idea to report target rather than lp__ in the summary?

Chapter 10 of the reference manual has all the transformations and their Jacobian determinants: https://mc-stan.org/docs/2_24/reference-manual/variable-transforms-chapter.html
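
For quick reference, here's a hedged R sketch of the interval transforms from that chapter together with their log absolute Jacobian terms (u is the unconstrained value Stan actually samples; the function names are made up for illustration, not a Stan API):

inv_logit <- function(u) 1 / (1 + exp(-u))

# <lower = a>:  x = a + exp(u),  log |J| = u
lb_constrain <- function(u, a) a + exp(u)
lb_log_jac   <- function(u) u

# <upper = b>:  x = b - exp(u),  log |J| = u
ub_constrain <- function(u, b) b - exp(u)
ub_log_jac   <- function(u) u

# <lower = a, upper = b>:  x = a + (b - a) * inv_logit(u)
lub_constrain <- function(u, a, b) a + (b - a) * inv_logit(u)
lub_log_jac   <- function(u, a, b)
  log(b - a) + log(inv_logit(u)) + log(1 - inv_logit(u))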

Thanks! This is new (and great!)

I agree that it's weird to use target in the Stan program but then see lp__ in the output. Before target += we had increment_log_prob(), which has a more transparent connection to the name lp__. Not renaming lp__ to target in the output may have just been to avoid breaking backwards compatibility; I forget. We deprecated increment_log_prob(), but it still works; it's harder to deprecate the name of a variable in the output.

Once upon a time, I did a few talks around this. Slides and examples here: https://syclik.com/2018/06/19/meetup-slides/

(unfortunately, my slides aren’t really super useful without context… I could probably record a video.)

I’ve brought it up before, but I’d really love to see our current lp__ broken into two parts:

  • the Jacobian term
  • the model… essentially the part that uses target +=

I think it would help with understanding the effect of the transformation (whether the constraining operation is pushing us into numerically bad regions) and with tracking down problems in the model. Right now, if there's a bug in the model, I have to trap it by placing print statements before and after the offending lines. If it were split out, I could run the likelihood in a separate language, like R, and compare the results.
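
For this particular model you can already approximate that split by hand. A sketch in R, reusing mu, sigma, Y, and fit_score from earlier in the thread: recompute the model part for every draw, and whatever is left over in lp__ is the Jacobian part, which here should be log(sigma).

# Model part of lp__, recomputed per posterior draw
# (dnorm/dlnorm keep the same constants as the *_lpdf calls).
model_part <- sapply(seq_along(mu), function(i)
  dnorm(mu[i], mean = 0, sd = 20, log = TRUE) +
    dlnorm(sigma[i], meanlog = 3, sdlog = 1, log = TRUE) +
    sum(dnorm(Y, mu[i], sigma[i], log = TRUE)))

# The leftover is the Jacobian part from the <lower = 0> constraint.
jacobian_part <- extract(fit_score)$lp__ - model_part
all.equal(as.numeric(jacobian_part), log(sigma))  # TRUE up to rounding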
