(please read the next few lines under the premise that I’m no expert)
HMC jumps much farther in each iteration than conventional sampling procedures do. (See this or this for some visual intuition for why that is.) Each step takes a little longer to compute, but the autocorrelation between samples is much lower, so HMC is very efficient overall. This also means that thinning is almost always a bad idea with HMC, because you throw away a lot of information. I've found that the default of 2000 iterations with 1000 warmup iterations and no thinning is fine for most models; if there are problems, they probably have more to do with the model itself. That said, it is always a good idea to sample with multiple chains from diffuse starting points.
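To see why thinning throws away information, here's a rough sketch in Python (not Stan output, just a simulated AR(1) chain standing in for MCMC draws, with a crude effective-sample-size estimate; real HMC chains typically have much lower autocorrelation than this):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate a correlated chain (AR(1)) as a stand-in for MCMC draws.
rho, n = 0.5, 5000
x = np.empty(n)
x[0] = rng.normal()
for t in range(1, n):
    x[t] = rho * x[t - 1] + rng.normal() * np.sqrt(1 - rho**2)

def ess(draws):
    """Crude effective sample size from summed autocorrelations."""
    d = draws - draws.mean()
    acf = np.correlate(d, d, mode="full")[len(d) - 1:] / np.dot(d, d)
    # Sum positive-lag autocorrelations until the first negative one.
    pos = np.argmax(acf < 0) or len(acf)
    return len(draws) / (1 + 2 * acf[1:pos].sum())

full = ess(x)
thinned = ess(x[::10])  # keep every 10th draw
print(full, thinned)
```

The thinned chain has lower per-draw autocorrelation, but its effective sample size is smaller than that of the full chain: you paid for all the draws, so keep them.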
In my experience, almost none of the problems that come up when sampling with HMC are solved by running excessively long chains and thinning.
But please keep in mind that I’m no expert. :)
edit: I just saw that your model looks like it was “translated” from JAGS (looking at the priors). Usually priors like
inv_gamma(.001, .001) and
normal(0, 1/.001) are discouraged in Stan. If you ran prior predictive checks (run your model without data), you would see how ridiculous the implications of such broad priors are. (That's something I was stunned by when I started doing it.) Also, I don't see where the "hierarchical" part of your model actually is. It looks like you're estimating different slopes for each group, all with similar priors, but there is no "hierarchical" parameter in there, i.e. no group-level parameter that is itself given a prior and estimated from the data. (I have to say that I didn't run your code, though.)
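A quick way to see what those priors imply is to simply draw from them. A sketch in Python (assuming the Stan parameterization, where normal takes a standard deviation, so normal(0, 1/.001) means sd = 1000):

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims = 10_000

# beta ~ normal(0, sd = 1000)
beta = rng.normal(0.0, 1000.0, size=n_sims)

# sigma^2 ~ inv_gamma(0.001, 0.001):
# if G ~ Gamma(shape=a, rate=b), then 1/G ~ InvGamma(a, b);
# numpy's gamma takes scale = 1/rate.
with np.errstate(divide="ignore"):  # tiny gamma draws underflow to 0
    sigma2 = 1.0 / rng.gamma(shape=0.001, scale=1.0 / 0.001, size=n_sims)

print(np.median(np.abs(beta)))  # "typical" slope magnitude in the hundreds
print(np.mean(sigma2 > 1e10))   # most variance draws are astronomically large
```

Before seeing any data, this prior says a slope of several hundred is perfectly typical and the noise variance is usually beyond 1e10, which is rarely what anyone actually believes about their data.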
This post appears to have been edited down from a real question to one that just says "0". It's helpful for everyone else to leave questions in place after they get answered; otherwise, replies like this one are left hanging.