Log_lik for LOO vs lp__

Here is what may sound like a stupid question (even to me), but I am trying to double-check my understanding.

If I have defined a variable called log_lik in the generated quantities block for the purposes of LOO computation, my expectation was that summing the log_lik values over all observations would give the same value as lp__, which Stan returns directly.

As a quick check, I printed the stanfit object, getting means, SEs, percentiles, etc. for all the parameters, including each observation-specific log_lik value and lp__, and compared the sum of the means of the log_lik values to the mean of lp__. I expected them to be very close.

In some cases that was true; in others it was not. My bottom-line question is: should I take this as an indicator that I have coded something wrong (probably in the generated quantities block, though I essentially just copied the calculations leading to the “target += …” statement)? Or is there another reason I could be seeing differences in the quick check I just described?

Thanks in advance for assistance on this problem,
Bruce


No. The lp__ includes contributions from the priors and from the Jacobians of the transformations from the constrained space to the unconstrained space used by NUTS. The log-likelihood terms should not include either of those.
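
For example, in a toy regression like the one below (just a sketch, not your model), lp__ picks up the two prior statements and the Jacobian term for the <lower=0> constraint on sigma, while the log_lik values in generated quantities contain only the per-observation likelihood terms:

```stan
data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real b;
  real<lower=0> sigma;      // constraint => a Jacobian term goes into lp__
}
model {
  b ~ normal(0, 1);         // prior: contributes to lp__, not to log_lik
  sigma ~ cauchy(0, 1);     // prior: contributes to lp__, not to log_lik
  y ~ normal(b * x, sigma); // likelihood
}
generated quantities {
  vector[N] log_lik;        // per-observation log likelihood, e.g. for loo
  for (n in 1:N)
    log_lik[n] = normal_lpdf(y[n] | b * x[n], sigma);
}
```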


And the ~ notation also drops constants, which are not dropped when using the _lpmf/_lpdf functions.
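
For example (a sketch), these two statements imply the same posterior but add different amounts to target (and hence to lp__):

```stan
y[n] ~ normal(mu[n], sigma);                  // drops constants such as -0.5 * log(2 * pi)
target += normal_lpdf(y[n] | mu[n], sigma);   // keeps them
```

So if the model block uses the ~ form while log_lik is computed with normal_lpdf in generated quantities, even the data terms in lp__ and in sum(log_lik) will differ, by N times the dropped constant in this case.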


I had asked about the relationship between the log_lik values computed (for the purpose of LOO computation, for example) and the Stan-generated lp__. Dr. Goodrich pointed out that the lp__ values take into account the priors (which I should have realized straight off) and the transformations Stan applies to map constrained variables to their unconstrained counterparts (which I had not picked up on). Dr. Vehtari added that constant terms are not included with the “~” notation, but are included when “_lpdf” is used. I understand that, and having played around with different formulations, I know exactly how it affects log_lik-type computations. Thank you.

What I am really interested in, and why I was asking the questions in the first place, is figuring out how to deal with PSIS-LOO computations where some of the Pareto k diagnostic values are bad.

As a simple example (and I would be happy to send the data or summary output if someone wants to pursue this), I have a data set for which I am starting with a simple linear regression (even simpler in the sense that it has a zero intercept), in which I regress summary data (means and variances) for a response on the explanatory variable x; the summary data (observations) come from different publications, for which we have only the summaries, not the individual values. The slope parameter, b, has a normal(0, s) prior, and for sigma I use a half-Cauchy(0, scale) prior. Truly basic. But for one of the 85 observations, the Pareto k diagnostic is in the “very bad” range (> 1). The only interesting thing I can tell you about that observation is that it has a much bigger variance than the others, and the Stan-estimated SD of its log_lik contribution is more than an order of magnitude greater than that of the other observations.

So I am wondering: is this something that will, in general, be problematic for the PSIS-LOO calculations? What should I be aware of, and perhaps need to address, when computing PSIS-LOO? I was not previously familiar with PSIS smoothing (or with importance sampling in any detail, honestly), so I am trying to understand the basics of the diagnostics and some root causes or known problems.

I have read the papers cited in the loo package documentation, so I have at least that much familiarity. If I recall correctly, one thing you suggest when the diagnostics are bad is trying a more robust model. It occurs to me that assuming a constant sigma for these data, in the presence of this one observation with a much greater degree of variability, might fall under the heading of a “not-robust-enough” model. Does that make sense to you? Is that the kind of thinking you were alluding to in the discussion of problematic cases?
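
For instance, would switching the observation model from a normal to something heavier-tailed, such as a Student-t, count as “more robust” in this sense? Roughly along these lines (just a sketch of the change to the model block, with nu declared as a real<lower=1> parameter, and with the corresponding student_t_lpdf used for log_lik in generated quantities):

```stan
// current likelihood:
// y ~ normal(b * x, sigma);

// heavier-tailed alternative:
nu ~ gamma(2, 0.1);               // weakly informative prior on the degrees of freedom
y ~ student_t(nu, b * x, sigma);  // same location and scale, heavier tails
```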

Ultimately, I want to apply some more complex models and do a hierarchical analysis, which is why I am very keen to get LOOIC values for comparison. If you are interested and willing to assist, I can share more details about that plan, too. But for now, any insight you can offer at this basic level would be greatly appreciated.

Thanks in advance,
Bruce

I’m not certain I understand the model yet. Do you have one linear model for the means and one linear model for the variances? Or just one linear model? I have guesses for both cases, but I would prefer to understand the model first.

Yes, re-reading my post, it is unclear. I meant to say that I have modeled the mean response with a linear model. There is no regression on the variances; I am assuming (at this point, for investigation) a constant variance for the underlying individual responses.

Constant variance…

…but one has a much bigger variance? I don’t understand.

Could you post your code?