Log_lik for LOO vs lp__


Here is what may sound like a stupid question (even to me), but I am trying to double-check my understanding.

If I have defined a variable called log_lik in the generated quantities block for the purposes of LOO computation, my expectation was that if I summed the values of log_lik over all the observations, I would get the same value as lp__, which Stan returns directly.
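
For concreteness, the pattern I am following looks roughly like this (a minimal sketch with placeholder names N, y, mu, and sigma, not my actual model):

    generated quantities {
      vector[N] log_lik;
      for (n in 1:N) {
        // same density as the likelihood term in the model block,
        // evaluated separately for each observation
        log_lik[n] = normal_lpdf(y[n] | mu[n], sigma);
      }
    }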

As a quick check, I just printed the stanfit object, which gives means, SEs, percentiles, etc. for all the parameters, including each observation-specific log_lik value and lp__, and compared the sum of the means of the log_lik values to the mean lp__ value. I expected that they would be very close.

In some cases that was true, but in others it was not. My bottom-line question is: should I take that as an indicator that I have coded something wrong (probably in the generated quantities block, though I essentially just copied the calculations leading to the “target += …” statement)? Or is there another reason I could be seeing differences in the quick check I just described?

Thanks in advance for assistance on this problem,


No. The lp__ includes contributions from the priors and from the Jacobians of the transformations from the constrained space to the unconstrained space used by NUTS. The log-likelihood terms should not include either of those.
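
As a toy illustration (placeholder names, not your model):

    data {
      int<lower=0> N;
      vector[N] x;
      vector[N] y;
    }
    parameters {
      real beta;
      real<lower=0> sigma;  // constrained: sampled on log(sigma), which adds a Jacobian term
    }
    model {
      beta ~ normal(0, 1);          // prior: contributes to lp__ but not to log_lik
      sigma ~ cauchy(0, 1);         // prior (half-Cauchy via the lower=0 constraint)
      y ~ normal(x * beta, sigma);  // likelihood (constants dropped by ~)
    }
    generated quantities {
      vector[N] log_lik;
      for (n in 1:N)
        log_lik[n] = normal_lpdf(y[n] | x[n] * beta, sigma);
    }

Here lp__ is, up to the constants dropped by the ~ statements, sum(log_lik) plus the two prior terms plus log(sigma) from the change of variables, so summing log_lik alone will not reproduce it.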


Also, the ~ notation drops normalizing constants, which are not dropped when using the _lpmf/_lpdf functions.
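
Schematically (not a complete program):

    model {
      // these two statements differ only by an additive constant in target:
      y[n] ~ normal(mu, sigma);                 // drops constants such as -0.5 * log(2 * pi)
      target += normal_lpdf(y[n] | mu, sigma);  // keeps the full normalized log density
    }
    generated quantities {
      // use the _lpdf form here so each log_lik term includes the normalizing constants
      log_lik[n] = normal_lpdf(y[n] | mu, sigma);
    }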


I had asked about the relationship between the log_lik values computed (for the purpose of LOO computation, for example) and the Stan-generated lp__. Dr. Goodrich pointed out that lp__ takes into account the priors (which I should have realized straight off) and the transformations Stan makes from constrained variables to their unconstrained counterparts (which I had not picked up on). Dr. Vehtari added that constant terms are not included with the “~” notation but are included when “_lpdf” is used. I understand that, and having played around with different formulations, I know exactly how it affects log_lik-type computations. Thank you.

What I am really interested in, and why I was asking the questions in the first place, is that I am trying to figure out how to deal with PSIS-LOO computations where some of the Pareto k diagnostic values are bad.

As a simple example (and I would be happy to send the data or summary outputs if someone wants to pursue this), I have a data set for which I am starting with a simple linear regression (even simpler in the sense that it has a zero intercept), where I am regressing summary data (means and variances) for a response on the explanatory variable x. The summary data (observations) come from different publications, for which we only have the summaries, not the individual values. The slope parameter b has a normal(0, s) prior, and for sigma I have the prior set to half-Cauchy(0, scale). Truly basic. But for one of the 85 observations, the Pareto k diagnostic is in the “very bad” range (> 1). The only interesting thing I can tell you about that observation is that it has a much bigger variance than the other observations, and the Stan-estimated SD of its log_lik contribution is more than an order of magnitude greater than those of the other observations.
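
Schematically, and leaving aside exactly how the reported variances enter, the model is along these lines (placeholder prior scales, not my actual code):

    data {
      int<lower=0> N;       // 85 observations, one per publication
      vector[N] x;          // explanatory variable
      vector[N] y;          // reported summary means of the response
    }
    parameters {
      real b;               // slope; no intercept
      real<lower=0> sigma;  // assumed-constant SD
    }
    model {
      b ~ normal(0, 1);        // placeholder for normal(0, s)
      sigma ~ cauchy(0, 2.5);  // half-Cauchy via the lower=0 constraint; placeholder scale
      y ~ normal(b * x, sigma);
    }
    generated quantities {
      vector[N] log_lik;
      for (n in 1:N)
        log_lik[n] = normal_lpdf(y[n] | b * x[n], sigma);
    }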

So I am wondering: is this something that, in general, will be problematic for the PSIS-LOO calculations? And what should I be aware of, and possibly need to address, when computing PSIS-LOO? I was not previously familiar with PSIS smoothing (or, honestly, with importance sampling in any detail), so I am trying to understand the basics of the diagnostics and some of the root causes or known problems.

I have read the papers cited in the loo package documentation, so I have at least that much familiarity. If I recall correctly, one thing you suggest when the diagnostics are bad is trying a more robust model. It occurs to me that assuming a constant sigma for these data, in the presence of this one observation with a much greater degree of variability, might fall under the heading of a “not-robust-enough” model. Does that make sense to you? Is that the kind of thinking you were alluding to in the discussion of problematic cases?
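
For instance, is something like a Student-t observation model with unknown degrees of freedom the kind of “more robust” alternative you have in mind? A sketch only, with b, sigma, x, and y as in the regression described above:

    parameters {
      // ... b and sigma as before ...
      real<lower=1> nu;   // degrees of freedom for the Student-t
    }
    model {
      // ... priors on b and sigma as before ...
      nu ~ gamma(2, 0.1);               // one commonly used prior for nu
      y ~ student_t(nu, b * x, sigma);  // heavier tails than the normal
    }
    generated quantities {
      vector[N] log_lik;
      for (n in 1:N)
        log_lik[n] = student_t_lpdf(y[n] | nu, b * x[n], sigma);
    }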

Ultimately, I want to apply some more complex models and do a hierarchical analysis. That is why I am very keen to get LOOIC values for comparison. If you are interested and willing to assist, I can share more details about that plan, too. But for now, any insight you can offer at this basic level would be greatly appreciated.

Thanks in advance,


I’m not certain I yet understand the model. Do you have one linear model for means and one linear model for variances? Or one linear model? I have guesses for both cases, but I prefer to understand the model first.


Yes, re-reading my post, it is unclear. I meant to say that I have modeled the mean response with a linear model. There is no regression on the variances; I am assuming (at this point, for investigation) a constant variance for the underlying individual responses.


Constant variance…

…but one has a much bigger variance? I don’t understand.

Could you post your code?