I am trying to modify some source code in ADVI.
I know I can use
template <bool propto, bool jacobian_adjust_transforms, typename T> T log_prob
to calculate the log density. As I understand it, setting the first template argument propto=true drops all constant terms.
But when I compare the result of log_prob = model_.template log_prob<false, true>
and log_prob_2 = model_.template log_prob<true, true>
I find that the difference between the two is not a constant. The former matches what I get when I manually compute the log density plus the Jacobian.
For instance, I am playing with the comparison in this line:
Am I misunderstanding something about the propto option?
Yes, I understand that. So I would expect model_.template log_prob<false, true> - model_.template log_prob<true, true> = constant
for every draw in a posterior sample. But I find this is not the case, at least in the ADVI code.
ADVI does not drop constants, because its selling point is that the average log density can be interpreted as a lower bound on the marginal likelihood. model_.template log_prob<false, true> does return that value correctly. I checked this by manually computing the normalized log density.
Since neither the computation nor the diagnostics need that constant, I was trying to modify the code to use model_.template log_prob<true, true>, since it should save some computation. It would also help avoid numerical issues when the data size is large.
My question is this: I find that model_.template log_prob<true, true> - model_.template log_prob<false, true> is not constant across posterior draws. Conceptually, it is the value of the normalizing constants, such as the (n/2) log(2π) term in a Gaussian density, so it should be the same for all draws in one posterior sample.
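To make that expectation concrete, here is a small worked case. It assumes a model with a single Gaussian likelihood over n observations y_i with mean μ and scale σ, chosen only for illustration and not taken from the model discussed in this thread:

$$
\log p(y \mid \mu, \sigma) = -\frac{n}{2}\log(2\pi) - n\log\sigma - \frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(y_i - \mu)^{2}
$$

If μ and σ are both parameters, the only term that depends on neither is the first one. The Jacobian adjustment appears in both versions and cancels, so the expected difference would be

$$
\log p_{\text{propto}} - \log p_{\text{full}} = \frac{n}{2}\log(2\pi),
$$

which is the same for every draw of (μ, σ).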
Had a look into this (https://github.com/stan-dev/math/issues/1020#issuecomment-418511094). It’s a design decision. The propto template parameter relies on autodiff template arguments to tell it which terms are constant, so instantiating the lpdfs/lpmfs with only doubles might seem to behave strangely (but this is the correct behavior).
That’s right. The C++ is as efficient as it is because we can drop terms that depend only on constants. We treat C++ primitives as constants and stan::math::var as variables through which we need to propagate the chain rule. So by design, with propto=true, all probability distributions return 0.0 (log 1) if every argument is constant. We assume that those values never change between iterations.
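The type-based dropping described above can be sketched with a toy example. This is not Stan Math code: toy_var, any_var, keep_term and toy_normal_lpdf are hypothetical names made up for illustration, only loosely modeled on how the real lpdfs decide which summands to include, and toy_var carries no gradient information.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// Hypothetical stand-in for stan::math::var: it only tags a value as
// "variable" and does no autodiff.
struct toy_var {
  double val;
};

inline double value_of(double x) { return x; }
inline double value_of(toy_var x) { return x.val; }

// True if at least one of the types is toy_var.
template <typename... Ts>
struct any_var : std::false_type {};
template <typename T, typename... Ts>
struct any_var<T, Ts...>
    : std::integral_constant<bool, std::is_same<T, toy_var>::value ||
                                       any_var<Ts...>::value> {};

// A term is kept if propto is false, or if it depends on some variable.
template <bool propto, typename... Ts>
struct keep_term
    : std::integral_constant<bool, !propto || any_var<Ts...>::value> {};

// Toy normal log density for a single observation.
template <bool propto, typename T_y, typename T_mu, typename T_sigma>
double toy_normal_lpdf(T_y y, T_mu mu, T_sigma sigma) {
  const double two_pi = 6.283185307179586;
  const double yv = value_of(y);
  const double m = value_of(mu);
  const double s = value_of(sigma);
  assert(s > 0.0);
  const double z = (yv - m) / s;
  double lp = 0.0;
  // -0.5 * log(2*pi) depends on no argument: kept only if propto is false.
  if (keep_term<propto>::value) lp -= 0.5 * std::log(two_pi);
  // -log(sigma) is dropped under propto when sigma is a plain double.
  if (keep_term<propto, T_sigma>::value) lp -= std::log(s);
  // The quadratic term is dropped under propto only if nothing is a variable.
  if (keep_term<propto, T_y, T_mu, T_sigma>::value) lp -= 0.5 * z * z;
  return lp;
}
```

In this sketch, toy_normal_lpdf<true>(1.0, 0.5, 2.0) with plain doubles returns 0, so the gap to the propto=false value is the whole log density and changes with the mean. Passing toy_var{0.5} as the mean keeps the quadratic term, and the gap then no longer depends on the mean.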