How exactly is lp__ being calculated?

Sorry if this is not the right category.
I want to understand how lp__ is calculated for a very simple model. Say I have a model like this:

data {
  int<lower = 1> N;  
  vector[N] y;  
}
parameters {
  real mu;
  real<lower = 0> sigma;
}
model {
  target += normal_lpdf(mu | 0, 20); 
  target += lognormal_lpdf(sigma | 3, 1); 
  target += normal_lpdf(y | mu, sigma); 
}

with data:

Y <- rnorm(n = 100, mean = 3, sd = 10)

I would assume I can get the same value of lp__ for a given sample by doing the following in R:

mu <- extract(fit_score)$mu
sigma <- extract(fit_score)$sigma
addmu <- dnorm(x = mu[1], mean = 0, sd = 20, log = TRUE)
addsigma <- dlnorm(x = sigma[1], meanlog = 3, sdlog = 1, log = TRUE)
addY <- sum(dnorm(Y,  mu[1], sigma[1], log = TRUE))
lp <- addmu + addsigma + addY

But I don’t: I get similar but not identical values, -371 by hand and -369 from extract(fit_score)$lp__[1]. Is this because of differences between dnorm/dlnorm and Stan’s lpdf functions? Or is there something else that Stan adds or drops compared with my R code?

3 Likes

(I don’t know; Jacobian?)

If you change lognormal to normal, can you then reproduce the numbers?


4 Likes

Yes, the answer is always the Jacobian :)
The difference is not because of the lognormal, but because of the <lower = 0> constraint on sigma. If I remove it, I get the same results.
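To make that concrete, here is a minimal sketch in R (reusing the mu, sigma, addmu, addsigma, addY, and Y objects from the original post): with <lower = 0>, Stan samples sigma on the log scale, so lp__ also includes the log Jacobian of the exp() transform, which is just log(sigma).

# d sigma / d sigma_unconstrained = exp(sigma_unconstrained) = sigma,
# so the log Jacobian term missing from the hand calculation is log(sigma).
lp_by_hand <- addmu + addsigma + addY + log(sigma[1])
# lp_by_hand should now agree with extract(fit_score)$lp__[1]; all three
# target += statements keep their normalizing constants, so nothing is dropped.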

5 Likes

The variable lp__ starts at 0.

The parameters block increments it with the log absolute Jacobian determinant of the transform for any constrained variables.

The model block increments lp__ with the value of the right-hand side of any target += statement.

The model block also increments lp__ for each sampling statement, adding the log density but dropping constants that do not depend on parameters. If you’d rather keep the proportionality constants, use target +=.

That’s it. None of the other blocks touch it.

Its final value is the log density on the unconstrained parameter scale up to a constant that doesn’t depend on the parameters (and stays fixed iteration to iteration).

The value of lp__ at any given point is what target() returns, so you can track it by printing. For example, you can print target() at the top of the model block to see that the starting value is the change-of-variables adjustment, or compare the effect of ~ vs. target +=.
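As a concrete illustration of the dropped constants (my own sketch in R, not Stan code, using the Y, mu, and sigma objects from the question, and assuming the only term a normal sampling statement can drop is the usual -0.5 * log(2 * pi) per observation):

# Contribution of target += normal_lpdf(y | mu, sigma): the full log density.
with_constants <- sum(dnorm(Y, mean = mu[1], sd = sigma[1], log = TRUE))
# Contribution of y ~ normal(mu, sigma): the same, minus the additive constant
# -0.5 * log(2 * pi) per observation, which does not depend on mu or sigma.
without_constants <- with_constants + length(Y) * 0.5 * log(2 * pi)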

4 Likes

@Bob_Carpenter great explanation. I think this information is spread over various documents, and nowhere is it laid out this concisely, so I’m not surprised that there are questions about it. I like @bnicenboim’s post title “How exactly is lp__ being calculated?” as an entry in a hypothetical FAQ or FAQ-like resource, so @martinmodrak may be interested.

3 Likes

Thanks, @Bob_Carpenter, that was really clear. Parameters with <lower = X> are transformed back to the constrained space with exp(auxiliary parameter) + X, right? How are the other parameters back-transformed? Is it written somewhere?
And another small thing: maybe it would be a good idea to report target rather than lp__ in the summary?

1 Like

Chapter 10 has all the transformations + Jacobian determinants: 10 Constraint Transforms | Stan Reference Manual
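For the lower-bound case asked about above, here is a small R sketch of the transforms as the reference manual describes them (my own code, not Stan’s implementation):

# real<lower = L>: theta = L + exp(u), so log |d theta / d u| = u = log(theta - L)
lower_constrain <- function(u, L = 0) L + exp(u)
lower_log_jacobian <- function(u) u

# real<lower = L, upper = U>: theta = L + (U - L) * inv_logit(u)
inv_logit <- function(u) 1 / (1 + exp(-u))
interval_constrain <- function(u, L, U) L + (U - L) * inv_logit(u)
interval_log_jacobian <- function(u, L, U) {
  p <- inv_logit(u)
  log(U - L) + log(p) + log1p(-p)   # log |d theta / d u|
}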

2 Likes

Thanks! This is new (and great!)

I agree that it’s weird to use target in the Stan program but then see lp__ in the output. Before target += we had increment_log_prob(), which has a more transparent connection to the name lp__. Not replacing lp__ with target in the output may have just been to avoid breaking backwards compatibility, I forget. We deprecated increment_log_prob() but it still works. It’s harder to deprecate the name of a variable in the output.

Once upon a time, I did a few talks around this. Slides and examples here: Slides from "How Stan computes the posterior distribution" · syclik

(unfortunately, my slides aren’t really super useful without context… I could probably record a video.)

I’ve brought it up before, but I’d really love to see our current lp__ broken into two parts:

  • the Jacobian term
  • the model… essentially the part that uses target +=

I think it would help with understanding the effect of the transformation (whether the constraining operation is pushing us into numerically bad regions) and with tracking down problems in the model. Right now, if there’s a bug in the model, I have to trap it by placing print statements before and after the offending lines. If the two parts were split out, I could run the likelihood in a separate language, like R, and compare the results.

3 Likes

It’s actually been in the manual since version 1. I think the reference manual is probably getting lost in the shuffle these days after being split out into a separate volume.

And back in the days when we used <- for assignment, we had

lp__ <- lp__ + alpha;

Removing lp__ as a user-modifiable variable was the only backwards-compatibility breaking change we’ve made to the language.

I got tired of suggesting this because I was the only one speaking up for it. Specifically, I wanted to be able to have

transformed parameters {
   real alpha = exp(alpha_unc);

   jacobian += alpha_unc;  // log Jacobian: log |d alpha / d alpha_unc| = alpha_unc = log(alpha)
}

so that we could define Jacobians for transformed parameters right in the transformed parameters block. Then lp__ would be a combination of the constrained model log density plus the log Jacobian determinant of the constraining transform.

Without this, we can’t actually implement Stan’s transforms in Stan itself. For optimization, we turn the Jacobian off, and there’s no way to write a Jacobian that gets turned off for optimization in the Stan language as it stands.

But, when I implemented the first version of the Stan language, I forgot to allow lp__-modifying statements in transformed parameters. Then everyone got fixated on all lp__ modification happening in the model block. That’s also why we don’t allow sampling statements in the parameters block for priors.

But I still like it, and if I were king of Stan, we’d have the above syntax.

3 Likes

So, to make sure, if I have declared a parameter real<lower=0> theta and want its square to have a positive-truncated standard normal prior, is it correct to have

target += normal_lpdf(square(theta) | 0, 1) + log(fabs(2 * theta));

in the model block? So I don’t have to account for the <lower = 0> truncation myself, only the squaring? And if I instead had a transformed parameter real theta2 = square(theta), then I would not need + log(fabs(2 * theta))?

Either I missed that or, more likely, I didn’t understand the impact until I made those slides. So… now I’m speaking up with you!

I remember the idea being floated a while ago, but I don’t remember what reasons were put forward against it. To me, splitting it up like this seems like a nice idea.

Maybe it’s worth bringing up again. There could be more support for it now.

1 Like

Right. We apply the Jacobian for the lower bound so it’s (improper) uniform on positive values. Same for lower and upper bounds, where it’s uniform in that range.
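If a numerical check of the squaring adjustment helps, here is a small R sketch (my own, nothing Stan-specific; the value 0.7 is just an arbitrary positive test point):

theta <- 0.7                      # arbitrary positive value
theta2 <- theta^2
# Log density implied on theta by a standard normal on theta2 = theta^2,
# up to the constant log(2) from truncating to positive values:
log_p_theta <- dnorm(theta2, mean = 0, sd = 1, log = TRUE) + log(abs(2 * theta))
# This is exactly what the target += line in the question contributes.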