I recently dived into Stan, and there is a small detail in the manual that I do not understand:
The HMC sampler works in an unconstrained space. To do that, all constrained parameters are transformed to an unconstrained space, and the log absolute determinant of the corresponding Jacobian is added to log P(\theta, y). During that process, some parameters may be mapped to a lower-dimensional space (e.g. when \theta has a Dirichlet prior), hence the space where HMC operates might have fewer dimensions than the original one.
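To make the change of variables concrete, here is a minimal sketch of how I picture it for a single positive parameter. The Exponential(1) target, the log transform, and all function names are my own toy assumptions, not taken from the Stan manual or from Stan's code.

```python
import math

def log_density_constrained(sigma):
    # Toy target (assumption): Exponential(1) log density on sigma > 0.
    return -sigma

def constrain(y):
    # Inverse transform: unconstrained y in R -> constrained sigma > 0.
    return math.exp(y)

def log_abs_jacobian(y):
    # d sigma / d y = exp(y), so log |J| = y.
    return y

def log_density_unconstrained(y):
    # Log density the sampler would see: constrained log density
    # evaluated at the inverse transform, plus the log-Jacobian term.
    return log_density_constrained(constrain(y)) + log_abs_jacobian(y)
```

In this one-dimensional case the constrained and unconstrained spaces have the same dimension, so the question below only arises for transforms such as the simplex one.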
Within the HMC sampler, during one leapfrog step, one has to inverse-transform the current sample back to the original space in order to calculate the log-probability and get the gradients of log P(\theta, y) plus adjustments. The dimension of the gradient vector should then equal the dimension of the constrained space, not that of the unconstrained space of \theta.
Question: As far as I understand, you get gradient information for all parameters in the original space, but sample (and update the momentum) in the unconstrained space. Shouldn't there be a mismatch in dimensionality between the gradient with respect to the original parameters and the space in which HMC operates?
For instance, \theta_i might be Dirichlet distributed with K elements; then I can transform \theta_i into a (K-1)-dimensional unconstrained space, but I would get gradient information for all K elements. Hence, to update the momentum variable, which is defined in the unconstrained (K-1)-dimensional space, I would only need K-1, and not K, gradient elements?
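Working through a toy version by hand, the only way I can see the dimensions lining up is through the chain rule: the K constrained gradient entries get multiplied by the K x (K-1) Jacobian of the inverse transform, which leaves K-1 entries. The sketch below is my own illustration under assumptions: it uses an additive log-ratio map onto the simplex (not necessarily the transform Stan uses), it leaves out the log-Jacobian adjustment term, and all function names are hypothetical.

```python
import math

def constrain_simplex(y):
    # Assumed inverse transform: K-1 unconstrained reals -> point on the K-simplex.
    expy = [math.exp(v) for v in y]
    denom = 1.0 + sum(expy)
    return [e / denom for e in expy] + [1.0 / denom]

def jacobian(y):
    # J[k][j] = d theta_k / d y_j, with K rows and K-1 columns.
    theta = constrain_simplex(y)
    K = len(theta)
    return [[theta[k] * ((1.0 if k == j else 0.0) - theta[j])
             for j in range(K - 1)]
            for k in range(K)]

def grad_log_dirichlet(theta, alpha):
    # Gradient of sum_k (alpha_k - 1) * log(theta_k) w.r.t. all K entries of theta.
    return [(a - 1.0) / t for a, t in zip(alpha, theta)]

def grad_unconstrained(y, alpha):
    # Chain rule: (length-K gradient) times (K x (K-1) Jacobian) -> length K-1.
    # The log-Jacobian adjustment is omitted here to keep the sketch short.
    theta = constrain_simplex(y)
    g_theta = grad_log_dirichlet(theta, alpha)
    J = jacobian(y)
    return [sum(g_theta[k] * J[k][j] for k in range(len(theta)))
            for j in range(len(y))]

g = grad_unconstrained([0.1, -0.3], [2.0, 3.0, 4.0])
print(len(g))  # 2, i.e. K-1 for K = 3
```

If this picture is right, the K-dimensional gradient would never be handed to the momentum update directly; only its (K-1)-dimensional pullback would be.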