Ah! Yes, thanks for pinging. I ran a couple experiments recently too. Summoning @avehtari as well.
Yeah, that’s my experience too. The fixed point iterations are rough. Making them more stable is probably gonna be pretty problem/metric dependent.
I think there are mechanisms to make fixed point methods work better, like Anderson acceleration: https://epubs.siam.org/doi/10.1137/10078356X. Since it's called acceleration, though, I don't know that it does much for the stability of the solve. I haven't tried it. If you guys get the time and inclination I'd be curious if it does anything.
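If it helps, the basic shape of Anderson acceleration is pretty simple. Here's a toy numpy sketch (my own version, not the paper's reference implementation): keep a short history of iterates and residuals and extrapolate with least-squares mixing weights.

```python
import numpy as np

def anderson_fp(g, x0, m=5, tol=1e-10, max_iter=100):
    """Anderson-accelerated fixed-point iteration for x = g(x).

    Keeps a depth-m history of iterates X and residuals F = g(x) - x
    and extrapolates with least-squares mixing weights."""
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    X, F = [], []
    for _ in range(max_iter):
        gx = np.atleast_1d(g(x))
        f = gx - x
        X.append(x.copy()); F.append(f.copy())
        if len(X) > m + 1:
            X.pop(0); F.pop(0)
        if np.linalg.norm(f) < tol:
            return x
        if len(X) == 1:
            x = gx  # nothing to mix yet: plain fixed-point step
        else:
            # least-squares weights from differences of residuals/iterates
            dF = np.column_stack([F[i + 1] - F[i] for i in range(len(F) - 1)])
            dX = np.column_stack([X[i + 1] - X[i] for i in range(len(X) - 1)])
            gamma, *_ = np.linalg.lstsq(dF, f, rcond=None)
            x = gx - (dX + dF) @ gamma
    return x
```

On a nice contraction like g(x) = cos(x) this converges quickly, but like I said, acceleration isn't the same thing as stabilization.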
I was reading some stuff about these implicit solves (Hairer). When the solve fails (because of a bad initial guess or a questionable method), the book calls it a divergence. The fun thing here, which I think @betanalpha pointed out, is that the implications of one of these solves failing are different from the sort of thing we usually call a divergence in HMC. The things we call divergences can happen even when the solves are going well. So there are divergences messing up our divergences? Hooray!
Anyway I ended up changing the metric around a bit so I could do Newton solves easily. I’m surprised both that you figured out how to do Newtons with softabs and that they didn’t help. How’d you do the Newtons?
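Just so we're talking about the same thing, by "Newton solves" I mean the plain undamped version; a generic numpy sketch (names are mine, nothing Stan-specific):

```python
import numpy as np

def newton_solve(F, J, x0, tol=1e-12, max_iter=20):
    """Newton iteration for F(x) = 0 given an analytic Jacobian J(x).
    Returns (x, converged)."""
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    for _ in range(max_iter):
        f = F(x)
        if np.linalg.norm(f) < tol:
            return x, True
        # Newton step: solve J(x) dx = f, then x <- x - dx
        x = x - np.linalg.solve(J(x), f)
    return x, False
```

For the integrator, F would be the residual of the implicit update equation and J its Jacobian in the unknown state.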
There are also line search and trust region methods that are used to stabilize these sorts of solves. My buddy @arya did some stuff with implicit integrators in HMC and these were important for him.
In these solves, you don’t try to do the full implicit hop; you break it down into smaller hops. It’s usually pretty easy to tell when things are blowing up, so you can detect that and keep decreasing your hop size.
You’re still solving the same equations in the end, it just takes longer. The issue is that even allowing super tiny steps (2^{-10} \epsilon), sometimes things still just explode anyway.
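Concretely, the hop-halving loop I mean looks something like this (numpy sketch; `step(z, h)` is a hypothetical callback that attempts one implicit substep of size h and reports whether its inner solve converged):

```python
import numpy as np

def substepped_solve(step, z0, eps, min_frac=2.0 ** -10):
    """Cover one integrator step of size eps with smaller substeps,
    halving the substep whenever the inner implicit solve blows up."""
    z, t, h = z0, 0.0, eps
    while t < eps:
        h = min(h, eps - t)          # don't overshoot the full step
        z_try, ok = step(z, h)
        if ok and np.all(np.isfinite(z_try)):
            z, t = z_try, t + h      # accept and advance
        else:
            h *= 0.5                 # shrink the hop and retry
            if h < min_frac * eps:
                raise RuntimeError("substeps got tiny and it still exploded")
    return z
```

Same equations in the end, just more solves; and as I said, sometimes you bottom out at min_frac and it explodes anyway.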
I want to swap a few around and see what they do. The difficulty for me now is it’s been hard to understand the nuts and bolts of what the integrator is doing when it’s wrapped up in Stan’s stuff. @arya wrote a package in R for doing HMC/NUTS outside of Stan that he used to investigate the behavior of implicit midpoint that should be useful here.
There might be some retooling to do – I dunno. I need to look into this. I’ll keep you (and @henine) posted. Of course if you want to dig into it yourself, hit up @arya directly I’m sure he’ll point you to where it’s at.
The book on symplectic integrators is https://www.cambridge.org/us/academic/subjects/mathematics/computational-science/simulating-hamiltonian-dynamics?format=HB. It’s really accessible and concise. You should definitely grab that.
Back to the metric: the thing I’ve been trying is a metric whose orientation is fixed and whose scales change according to the curvature of the log density (its Hessian) in each direction. So kinda writing it out like an eigendecomposition (for a 2x2):
M^{-1}(q) = \begin{bmatrix} v_1 & v_2 \end{bmatrix} \begin{bmatrix} -v_1^T \nabla^2 L(q) v_1 & 0 \\ 0 & -v_2^T \nabla^2 L(q) v_2 \end{bmatrix} \begin{bmatrix} v_1 & v_2 \end{bmatrix}^T
There’s no math justifying that; I’m just enforcing it. In the odd case that the posterior really did have an eigendecomposition where the directions of changing curvature were fixed, it’d be fine as long as you can identify those directions (in adaptation I get them by computing the actual Hessian at a random point).
Anyway, you can compute the gradients of this pretty easily because the [v_1, v_2] matrix is constant, and each term like v_1^T \nabla^2 L(q) v_1 can be computed with two fvars or a finite difference, so you can get the gradients with a reverse mode pass. You hit each of those terms with a softabs first to keep ’em pointed in the right direction, though.
For P parameters it takes P reverse modes, so it’s as expensive as the softabs metric. The cost is you don’t change directions; the advantage is you can work with the terms that involve derivatives of the metric very easily and do things like Newton solves.
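In case the construction isn’t clear, here’s a numpy sketch of evaluating the inverse metric itself (not its gradients), using central differences of the gradient for the directional curvatures instead of fvars. The function names and the alpha default are made up for illustration:

```python
import numpy as np

def softabs(lam, alpha=1e6):
    """SoftAbs map: a smooth |lam| (limit 1/alpha as lam -> 0)."""
    return 1.0 / alpha if lam == 0.0 else lam / np.tanh(alpha * lam)

def fixed_direction_inv_metric(grad_L, q, V, h=1e-5, alpha=1e6):
    """Inverse metric M^{-1}(q) with fixed eigenvectors (columns of V),
    scaled by the directional curvature of the log density along each
    column, estimated from central differences of the gradient."""
    lams = []
    for i in range(V.shape[1]):
        v = V[:, i]
        # directional second derivative v^T (nabla^2 L(q)) v
        d2 = v @ (grad_L(q + h * v) - grad_L(q - h * v)) / (2.0 * h)
        # -d2 is positive where L is concave; softabs guards the rest
        lams.append(float(softabs(-d2, alpha)))
    return V @ np.diag(lams) @ V.T
```

On a Gaussian L(q) = -q^T A q / 2 with V the identity, this just recovers the diagonal of A, which is the sanity check I’d run first.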
Anyway, not sure how useful that assumption is. In the current state of things I can get it to run centered eight-schools and all, but the results are not encouraging in terms of Neff/sample, so I probably have buggy wuggies somewhere. There are some other models I’d like to try, but the integrator has been such a hassle that I want to break it out and examine it more closely.
I’m not even sure there isn’t a bug in my current implementation of the integrator. Just need to look at these things step by step and think and maybe switch solve types and such.
I’m also glad someone else hit the same issues with softabs lol. Makes me feel better about not missing something, though I 'spose we both could be.
I can try to push my code up to a repo later if you guys want to see it, but it’s kind of a mess.