tl;dr

What is the result of `stan::model::gradient(m, zeta, tmp_lp, tmp_mu_grad, &ss);`?

Is it $\nabla_{\theta} \log p(\mathbf{X}, \theta)$ with the value of $\zeta$ plugged in, or is it $\nabla_{\zeta} \log g(\mathbf{X}, \zeta)$?

–

It seems that this ADVI code,

`stan::model::gradient(m, zeta, tmp_lp, tmp_mu_grad, &ss);`

is calculating the following formula (equation 5, below) from the paper,

$$\nabla_{\theta} \log p(\mathbf{X}, \theta) \nabla_{\zeta} T^{-1}(\zeta)+\nabla_{\zeta} \log \left|\operatorname{det} J_{T^{-1}}(\zeta)\right| = \nabla_{\zeta} \log g(\mathbf{X}, \zeta)$$

where $g(\mathbf{X}, \zeta) = p\left(\mathbf{X}, T^{-1}(\zeta)\right)\left|\operatorname{det} J_{T^{-1}}(\zeta)\right|$, as `mu_grad` equals `tmp_mu_grad` here.

According to the documentation for `stan::model::gradient(f, x, fx, grad_fx)`, it calculates the value and the gradient of the specified function at the specified argument.

So, from the above, the gradient of `tmp_lp` calculated from the model `m` would be $\nabla_{\theta} \log p(\mathbf{X}, \theta)$ with the value of $\zeta$ plugged in, not $\nabla_{\zeta} \log g(\mathbf{X}, \zeta)$.
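To make the two candidates concrete, here is a small standalone sketch that does not use Stan at all. Everything in it is a hypothetical toy of my own: the density $\log p(\theta) = a \log \theta - b\theta$ on $\theta > 0$, the transform $T^{-1}(\zeta) = \exp(\zeta)$ (so $\log\left|\operatorname{det} J_{T^{-1}}(\zeta)\right| = \zeta$), and all function names. It only shows that the two quantities are different numbers and that the chain-rule form of equation 5 agrees with a finite-difference derivative of $\log g$. It says nothing about which one `stan::model::gradient` actually returns.

```cpp
#include <cmath>
#include <cstdio>

// Hypothetical toy density on theta > 0 (not taken from Stan):
// log p(theta) = a * log(theta) - b * theta, up to an additive constant.
const double a = 2.0;
const double b = 3.0;

double log_p(double theta) { return a * std::log(theta) - b * theta; }

// Gradient of log p with respect to theta.
double grad_theta_log_p(double theta) { return a / theta - b; }

// Assumed transform T^{-1}(zeta) = exp(zeta), so log|det J_{T^{-1}}(zeta)| = zeta
// and log g(zeta) = log p(exp(zeta)) + zeta.
double log_g(double zeta) { return log_p(std::exp(zeta)) + zeta; }

// Equation 5 for this toy case:
// grad_theta log p(theta) * d/dzeta exp(zeta) + d/dzeta zeta, with theta = exp(zeta).
double grad_zeta_log_g(double zeta) {
  const double theta = std::exp(zeta);
  return grad_theta_log_p(theta) * theta + 1.0;
}

// Central finite difference, used only as an independent check.
double finite_diff(double (*f)(double), double x, double h = 1e-6) {
  return (f(x + h) - f(x - h)) / (2.0 * h);
}

// Prints both candidates and the numerical check for one value of zeta.
void print_comparison(double zeta) {
  std::printf("zeta plugged into grad_theta log p : %f\n", grad_theta_log_p(zeta));
  std::printf("grad_zeta log g (equation 5)       : %f\n", grad_zeta_log_g(zeta));
  std::printf("finite difference of log g         : %f\n", finite_diff(log_g, zeta));
}
```

With these toy values, calling `print_comparison(0.5)` from `main` gives $2/0.5 - 3 = 1$ for the first candidate and $3 - 3e^{0.5} \approx -1.946$ for the second, so comparing `tmp_mu_grad` against both on a small model should be enough to tell them apart.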

From this, I assumed that `m` would return the lp function of the original parameters ($\theta$, `cont_params_`), not of $\zeta$.