Previously I had a problem with improper termination in Stan (link) and I still have not solved it. To make a more thorough check, I am wondering if I can see more detailed information on the Stan execution, in particular every leapfrog step and gradient. The leapfrog positions can be seen by adding a print statement in the model, but I have not figured out how to see the gradient.
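For concreteness, this is the kind of print statement I mean (a minimal sketch; the parameter `theta` and the normal likelihood are placeholders, not my actual model). A `print` in the model block runs at every log-density evaluation, which includes every leapfrog step:

```stan
parameters {
  real theta;
}
model {
  // printed once per log-density evaluation, i.e. at each leapfrog step
  print("theta = ", theta);
  theta ~ normal(0, 1);
}
```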
Do you mean printing each position in parameter space to see what the actual step was? If so, by the same token the log-density difference between successive positions would approximate the gradient along the trajectory, since that is the direction in which the sampler flows over the parameter surface. Otherwise you would need access to the actual gradient and leapfrog integration calculations, which I don’t think are directly accessible and would probably be too costly to reproduce within the model.
The update is the gradient times the discretization time \epsilon, so it seems this could only give the direction of the gradient, not the gradient itself.
For your particular use case, you might be able to see exactly what you need by setting the max treedepth to 1 (and requesting a very large number of iterations), and then using save_latent_dynamics = TRUE in a call to cmdstanr::sample to see the gradients. Of course setting the max treedepth to 1 would yield different (and far less efficient) dynamics, but if the sampler eventually encounters a problematic region you would have the position and gradient right when it happens.
Parenthetically, the update is not the gradient times the discretization time. The gradient gives the time derivative of momentum, not the time derivative of position.
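To spell that out, one leapfrog step with step size \epsilon and mass matrix M updates the momentum p and position \theta as

```latex
p_{t+\epsilon/2} = p_t + \tfrac{\epsilon}{2}\,\nabla_\theta \log p(\theta_t)
\theta_{t+\epsilon} = \theta_t + \epsilon\, M^{-1} p_{t+\epsilon/2}
p_{t+\epsilon} = p_{t+\epsilon/2} + \tfrac{\epsilon}{2}\,\nabla_\theta \log p(\theta_{t+\epsilon})
```

so the gradient of the log density drives the momentum half-steps, while the position moves by \epsilon M^{-1} p, not by \epsilon times the gradient.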
The gradient used in the leapfrog integrator is the derivative of the potential with respect to position (i.e. the derivative of the negative log posterior with respect to each parameter), which is what can actually be computed from the posterior.
I don’t know of a way to directly access the autodiff-computed gradient, but with the step size and the parameter values you could reconstruct the actual exploration dynamics. If what you suspect is some error in the gradient, this should give you just that.
Thanks! Interestingly, I found that some chains did not crash into extreme problematic regions after I set max_treedepth = 1, though they did reach problematic regions with the default max_treedepth. Would this suggest possible reasons why my code crashed?
By the way, it seems the code
model$sample(data = list(N = 100, y = y), chains = 4, max_treedepth = 1, save_latent_dynamics = TRUE, output_dir = getwd())
gives two CSV files for each chain, each containing 1000 draws of the parameters. So it seems only each iteration is saved, not each leapfrog step?
Yes, that’s right: this saves each iteration, not each leapfrog step. That’s why I suggested setting max treedepth to 1, so that each iteration is exactly one leapfrog step.
I’ve seen something like this before, and I have a guess (but it’s just a guess) about the behavior. My guess is that when the likelihood surface, far from the typical set, has a steep hill and then a large flat region, the sampler’s initial exploration can send it hurtling down the hill and then out a long way onto the flat region, sort of like riding a sled down a hill and then across a flat field. If the problematic region resides way out on that flat region somewhere, then a large treedepth allows the chain to coast across the flat region far enough to reach the problematic area, whereas a small max treedepth causes the momentum to get resampled before the exploration has a chance to coast too far out onto the flat region.
More discussion of this (hypothetical) phenomenon here:
Thanks! I will read the link.