tl;dr: What does `stan::model::gradient(m, zeta, tmp_lp, tmp_mu_grad, &ss);` return: $\nabla_{\theta} \log p(\mathbf{X}, \theta)$ with the value of $\zeta$ plugged in for $\theta$, or $\nabla_{\zeta} \log g(\mathbf{X}, \zeta)$?
–
It seems this ADVI code,
`stan::model::gradient(m, zeta, tmp_lp, tmp_mu_grad, &ss);`
is calculating the following formula (equation 5 of the paper, below):

$$\nabla_{\theta} \log p(\mathbf{X}, \theta) \nabla_{\zeta} T^{-1}(\zeta)+\nabla_{\zeta} \log \left|\operatorname{det} J_{T^{-1}}(\zeta)\right| = \nabla_{\zeta} \log g(\mathbf{X}, \zeta)$$

where $g(\mathbf{X}, \zeta) = p\left(\mathbf{X}, T^{-1}(\zeta)\right)\left|\operatorname{det} J_{T^{-1}}(\zeta)\right|$, since `mu_grad` equals `tmp_mu_grad` here.
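For reference, a short sketch of why the two sides of that formula agree. It uses only the symbols above and assumes $\theta = T^{-1}(\zeta)$: take the log of the definition of $g$, then apply the chain rule.

$$
\begin{aligned}
\log g(\mathbf{X}, \zeta) &= \log p\left(\mathbf{X}, T^{-1}(\zeta)\right) + \log \left|\operatorname{det} J_{T^{-1}}(\zeta)\right| \\
\nabla_{\zeta} \log g(\mathbf{X}, \zeta) &= \nabla_{\theta} \log p(\mathbf{X}, \theta)\Big|_{\theta = T^{-1}(\zeta)} \nabla_{\zeta} T^{-1}(\zeta) + \nabla_{\zeta} \log \left|\operatorname{det} J_{T^{-1}}(\zeta)\right|
\end{aligned}
$$

In this reading, $\nabla_{\theta} \log p$ is evaluated at $\theta = T^{-1}(\zeta)$, which is not the same as plugging the value of $\zeta$ in directly for $\theta$.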
According to the documentation of `stan::model::gradient(f, x, fx, grad_fx)`, it calculates the value and the gradient of the specified function at the specified argument.
So, from the above, the gradient of `tmp_lp` calculated from the model `m` would be $\nabla_{\theta} \log p(\mathbf{X}, \theta)$ with the value of $\zeta$ plugged in, not $\nabla_{\zeta} \log g(\mathbf{X}, \zeta)$.
From here, I assumed that `m` would return the lp function of the original parameters ($\theta$, `cont_params_`), not of $\zeta$.
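To make the two candidate answers concrete, here is a minimal standalone sketch that does not call Stan at all. Everything in it is an assumption made for illustration: a one-dimensional toy density $\log p(\theta) = -\theta$ on $\theta > 0$, the transform $T(\theta) = \log \theta$, and the hypothetical function names `candidate_a`, `candidate_b`, and `check_equation5`. It only shows that the two quantities in the question differ numerically, so they can be told apart. It does not show which one `stan::model::gradient` returns.

```cpp
#include <cassert>
#include <cmath>

// Toy model (an assumption for illustration, not taken from Stan):
// theta > 0 with log p(theta) = -theta, i.e. an Exponential(1) density.
double log_p(double theta) { return -theta; }
double grad_theta_log_p(double /*theta*/) { return -1.0; }

// Assumed transform T(theta) = log(theta), so T^{-1}(zeta) = exp(zeta)
// and log|det J_{T^{-1}}(zeta)| = zeta.
double t_inv(double zeta) { return std::exp(zeta); }

// log g(zeta) = log p(T^{-1}(zeta)) + log|det J_{T^{-1}}(zeta)|
double log_g(double zeta) { return log_p(t_inv(zeta)) + zeta; }

// Candidate A: the value of zeta plugged directly into grad_theta log p.
double candidate_a(double zeta) { return grad_theta_log_p(zeta); }

// Candidate B: the left-hand side of equation 5, assembled by the chain rule.
double candidate_b(double zeta) {
  double theta = t_inv(zeta);            // theta = T^{-1}(zeta)
  double dtheta_dzeta = std::exp(zeta);  // d/dzeta of T^{-1}(zeta)
  double dlogdet_dzeta = 1.0;            // d/dzeta of log|det J| = zeta
  return grad_theta_log_p(theta) * dtheta_dzeta + dlogdet_dzeta;
}

// Central finite difference of log g, used as an independent reference.
double fd_grad_log_g(double zeta, double h = 1e-6) {
  return (log_g(zeta + h) - log_g(zeta - h)) / (2.0 * h);
}

// Sanity check: the chain-rule expression matches the numerical
// gradient of log g with respect to zeta.
void check_equation5(double zeta) {
  assert(std::fabs(candidate_b(zeta) - fd_grad_log_g(zeta)) < 1e-6);
}
```

In this toy case `check_equation5(0.5)` passes, while `candidate_a(0.5)` and `candidate_b(0.5)` give different values. Comparing `tmp_mu_grad` against hand-computed values of both kinds for a similarly small Stan model would be one way to see which quantity the call produces.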