Stan gradient w.r.t ? in advi code

hyunji.moon · December 24, 2020, 12:06pm

tl;dr
What does stan::model::gradient(m, zeta, tmp_lp, tmp_mu_grad, &ss); return?
Is it \nabla_{\theta} \log p(\mathbf{X}, \theta) evaluated at the value \zeta, OR \nabla_{\zeta} \log g(\mathbf{X}, \zeta)?

–
It seems this line of the ADVI code,
stan::model::gradient(m, zeta, tmp_lp, tmp_mu_grad, &ss);
is computing the following expression (equation 5 in the paper):
\nabla_{\theta} \log p(\mathbf{X}, \theta) \nabla_{\zeta} T^{-1}(\zeta)+\nabla_{\zeta} \log \left|\operatorname{det} J_{T^{-1}}(\zeta)\right|
= \nabla_{\zeta} \log g(\mathbf{X}, \zeta)
where g(\mathbf{X}, \zeta) = p\left(\mathbf{X}, T^{-1}(\zeta)\right)\left|\operatorname{det} J_{T^{-1}}(\zeta)\right|

I say this because mu_grad equals tmp_mu_grad here.
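
For context, as far as I can tell from the ADVI paper, the model gradient can be accumulated straight into mu_grad because the ELBO gradient with respect to the variational mean is an expectation of exactly this quantity. Written out for the mean-field case (the symbols \mathcal{L}, \mu, \omega, \eta and the number of Monte Carlo draws M follow my reading of the paper's notation and are not defined elsewhere in this thread, so treat them as assumptions):

```latex
\nabla_{\mu} \mathcal{L}
  = \mathbb{E}_{\mathcal{N}(\eta)}\left[ \nabla_{\zeta} \log g(\mathbf{X}, \zeta) \right]
  \approx \frac{1}{M} \sum_{m=1}^{M} \nabla_{\zeta} \log g(\mathbf{X}, \zeta_m),
\qquad
\zeta_m = \mu + \exp(\omega) \odot \eta_m, \quad \eta_m \sim \mathcal{N}(0, I)
```

If that reading is right, each call to stan::model::gradient would supply one term of the sum, which only makes sense if the returned gradient is taken with respect to \zeta.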

According to the documentation for stan::model::gradient(f, x, fx, grad_fx), it calculates the value and the gradient of the specified function at the specified argument.

So, from the above, the gradient computed from the model m would be \nabla_{\theta} \log p(\mathbf{X}, \theta) with the value \zeta plugged in for \theta, not \nabla_{\zeta} \log g(\mathbf{X}, \zeta).

This assumes that m returns lp as a function of the original parameters (\theta, cont_params_), not of \zeta.

hyunji.moon · December 24, 2020, 2:35pm

This Stan manual says lp__ is not just \log p(\theta, x) but the log density of the model evaluated on the unconstrained scale.

So is it correct that the lp__ returned by models in Stan C++ is always a function of the unconstrained parameters, and that \nabla_{\zeta} \log g(\mathbf{X}, \zeta) is therefore the correct answer?
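
To make the two candidates concrete, here is a minimal standalone sketch that does not call Stan at all. It assumes a toy model with a single positive parameter (x | \theta ~ Exponential(\theta), \theta ~ Exponential(1)) and T = \log, so \theta = \exp(\zeta) and the log-Jacobian term is just \zeta. All function names are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Toy joint log density, NOT taken from Stan:
// x | theta ~ Exponential(rate = theta), theta ~ Exponential(1), theta > 0.
double log_p(double x, double theta) {
  assert(theta > 0.0);
  return std::log(theta) - theta * x - theta;
}

// Gradient of log p(x, theta) with respect to the constrained parameter theta.
double grad_theta_log_p(double x, double theta) {
  return 1.0 / theta - x - 1.0;
}

// Unconstrained log density: log g(x, zeta) = log p(x, exp(zeta)) + zeta.
// The trailing "+ zeta" is log|d exp(zeta) / d zeta|, the log-Jacobian term.
double log_g(double x, double zeta) {
  return log_p(x, std::exp(zeta)) + zeta;
}

// Equation 5 for this toy model: chain-rule term plus log-Jacobian gradient.
double grad_zeta_log_g(double x, double zeta) {
  const double theta = std::exp(zeta);
  return grad_theta_log_p(x, theta) * theta + 1.0;
}

// Central finite difference of log g in zeta, used only as a reference value.
double finite_diff_log_g(double x, double zeta, double h = 1e-6) {
  return (log_g(x, zeta + h) - log_g(x, zeta - h)) / (2.0 * h);
}
```

With x = 2 and \zeta = 0.5, grad_zeta_log_g agrees with finite_diff_log_g (about -2.946), while grad_theta_log_p evaluated at \theta = \exp(0.5) gives a different number (about -2.393). So the two candidates are distinguishable numerically, and the same kind of finite-difference comparison could be run against a real Stan model to see which one stan::model::gradient returns.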

bbbales2 · December 24, 2020, 7:52pm

I think the gradients here will be of the log density with respect to the unconstrained space. So that’s \nabla_{\zeta} \log \left[ p(x | \theta) p(\theta) \left|\operatorname{det} J_{T^{-1}}(\zeta)\right| \right] where \theta = T^{-1}(\zeta)

I think in this notation, \theta are the model parameters, \zeta are the unconstrained parameters, and \phi are the parameters of the variational distribution, q(\zeta | \phi).
