What are lp__, log_p__ and log_g__?

I am trying to undersatnd what are lp__, log_p__ and log_g__. lp__ appears in both HMC and VB results while log_p__ and log_g__ appear only in VB result.

After I read How exactly is lp__ being calculated , I understood it is the log joint density of the model likelihood and the prior.

\mathrm{log} P(X, \theta) = \mathrm{log} P(X | \theta) + \mathrm{log} P(\theta)

Then I also read this post Getting KL Divergence for Variational Bayes fit in RStan - #4 by dzl and understood that

  • log_p__ is also log joint density \mathrm{log}P(X, \theta)
  • log_g__ is log density of the variational posterior \mathrm{log}Q(\theta).

Now here are my questions.

  1. Is my understanding correct?
  2. In VB result csv file, why are there both lp__ and log_p__ if they are the same?
  3. I tried to calculate ELBO by taking mean of (log_p__ - log_g__) as the samples in the VB result csv are drawn from Q(\theta). However its result was different from the log file’s last part. What am I missing?

The ELBO output was taken from the last converged value of this part.

Begin stochastic gradient ascent.
  iter             ELBO   delta_ELBO_mean   delta_ELBO_med   notes
   100         -145.045             1.000            1.000
   200         -145.134             0.500            1.000
   300         -144.550             0.335            0.004   MEDIAN ELBO CONVERGED

Thank you.

1 Like
  1. That’s not quite right, but probably in line with what our docs say. Lots of people, including the VI devs, mistakenly assume that a Stan model writes down the joint log density \log p(y, \theta), but all that is required is that it’s equal to \log p(\theta \mid y) + \textrm{const}.. There’s a more nuanced discussion in the optimization/efficiency chapter of the user’s guide where I explain how you can exploit conjugacy and/or sufficient stats to accelerate models.

  2. I honestly don’t know why they’re both reported. Probably because whoever added log_p__ didn’t realize it was the same as lp__.

  3. The ELBO isn’t the diff of log_p__ and log_g__. You can find a lot of discussion online, but it’s essentially negative KL divergence from approximation to true posterior minus a normalizing constant. Given that involves an integral, we have to evaluate it with Monte Carlo draws from the approximating distribution.

ADVI did write lp__ column to csv, but wrote all zeroes (unlike HMC), so it’s not the same. This issue has the discussion leading to the decision. I first proposed that we would write the desired values to lp__, but after discussion of backwards compatibility it was decided to add log_p__ and log_g__. The relevant PR. The discussion also assume there would be I/O refactoring coming.

As the draws are from g, this is in principle the right approach. The results can differ already, as the default number of draws used every 100th iteration is to compute ELBO is much less than the default number of draws in the csv. It is also quite possible that there is difference whether constant normalization terms are included as they are not needed for importance sampling and Pareto-k diagnostic. The PR states

compute log_p__ (unnormalized log density of the exact model) and log_g__ (unnormalized log density under either the meanfield or fullrank gaussian approximation). The latter one is obtained from normal-reparametrization and drops all constants.

I don’t remember whether ELBO shown during the optimization uses constant for gaussian approximation.