Can someone please confirm which of the following is correct about how Stan works?

Does Stan use the derivatives of p(\theta) (the prior for \theta) to generate transitions, or does it use the derivatives of p(\theta \mid y) (the posterior for \theta)?

To run HMC, you need to compute the gradient of the log posterior. By Bayes' rule,

\begin{eqnarray*}
\nabla_\theta \log p(\theta \mid y) & = &\nabla_\theta [ \log p(y \mid \theta) + \log p(\theta) - \log p(y)] \\
& = & \nabla_\theta [ \log p(y \mid \theta) + \log p(\theta)].
\end{eqnarray*}

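The \log p(y) term drops out because it does not depend on \theta. So the desired gradient is obtained by differentiating the sum of the log likelihood and the log prior: Stan uses derivatives of the (unnormalized) log posterior, not of the prior alone.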
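
Here is a small numerical check of that identity, as a sketch rather than anything Stan actually runs. It assumes a toy model that is not from the question: \theta ~ Normal(0, 1) and y_i ~ Normal(\theta, 1), for which the normalized posterior is known in closed form. The data values, function names, and the finite-difference helper are all made up for illustration; Stan itself computes gradients by automatic differentiation, not finite differences.

```python
import math

def log_prior(theta):
    # Assumed prior: theta ~ Normal(0, 1)
    return -0.5 * math.log(2 * math.pi) - 0.5 * theta ** 2

def log_lik(theta, y):
    # Assumed likelihood: y_i ~ Normal(theta, 1), independent
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (yi - theta) ** 2 for yi in y)

def log_posterior_exact(theta, y):
    # Conjugate result for this toy model, including the normalizing constant:
    # theta | y ~ Normal(sum(y) / (n + 1), 1 / (n + 1))
    n = len(y)
    mean = sum(y) / (n + 1)
    var = 1.0 / (n + 1)
    return -0.5 * math.log(2 * math.pi * var) - 0.5 * (theta - mean) ** 2 / var

def grad(f, theta, h=1e-5):
    # Central finite difference (illustration only)
    return (f(theta + h) - f(theta - h)) / (2 * h)

y = [0.3, -1.2, 2.5, 0.8]  # made-up data
theta = 0.7

# Gradient of log likelihood + log prior (no log p(y) term)
g_unnorm = grad(lambda t: log_lik(t, y) + log_prior(t), theta)
# Gradient of the fully normalized log posterior
g_exact = grad(lambda t: log_posterior_exact(t, y), theta)
# Gradient of the log prior alone, for comparison
g_prior = grad(log_prior, theta)

print(g_unnorm, g_exact, g_prior)
```

The first two printed values agree (analytically, both equal sum(y) - (n + 1) * theta = -1.1 here), while the prior-only gradient is -0.7. Dropping \log p(y) changes nothing, but dropping the likelihood does.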
