Today I got a traceplot looking like this (only a few parameters are shown):
Some parameters don’t move.
Some parameters look jumpy (they don’t move in every iteration), and they move so little that the y-axis labels don’t have enough digits to show the difference.
Which parameter does what varies between model runs.
Sometimes there are parameters that look more OK (they move in every iteration).
What I think is happening:
Since some parameters do move, some iterations are obviously non-divergent. Therefore the non-movement of the other parameters can’t be caused by divergences.
This model forces Stan to decrease the stepsize massively during the early parts of the adaptation (checked: stepsize < 2^-50). I suspect this leads to a point where some parameters jump so little that they run into a numerical problem (presumably the update falls below floating-point resolution at the parameter’s current value), leading to a jump size of zero. This might result in a zero-variance estimate for that parameter, which could then be used as the basis for the mass matrix (I didn’t check the mass matrix). Consequently, these parameters can never recover and continue to have zero-length jumps even as the stepsize increases during later parts of the adaptation. (The stepsize was still very low at the end of the adaptation, though, so all the parameters, even those that do move, move very little; a longer warmup would have been required.)
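To illustrate the kind of numerical problem I have in mind, here is a minimal Python sketch. It is not Stan code: the `drift` helper, the unit momentum, and the two parameter values are assumptions made up for illustration. It only shows that repeatedly adding a 2^-50 step to a double can leave a value around 1000 exactly unchanged while a value around 0.001 still moves.

```python
import math
import statistics

STEP = 2.0 ** -50  # step size of the magnitude reported above


def drift(start, step, n_updates):
    """Repeatedly apply a position-style update x <- x + step * momentum.

    Hypothetical helper: momentum is fixed at 1.0 purely for illustration.
    """
    x = start
    draws = []
    for _ in range(n_updates):
        x = x + step * 1.0
        draws.append(x)
    return draws


# Hypothetical parameter values: one far from zero, one close to zero.
large_draws = drift(1.0e3, STEP, 1000)
small_draws = drift(1.0e-3, STEP, 1000)

# The spacing between adjacent doubles near 1000 is larger than the step,
# so every update rounds back to the starting value.
print(math.ulp(1.0e3) > STEP)
print(large_draws[-1] == 1.0e3)
print(small_draws[-1] != 1.0e-3)

# Draws that never move have exactly zero variance.
print(statistics.pvariance(large_draws))
```

Under these assumptions the stuck parameter produces identical draws, so any variance estimate computed from them alone is exactly zero, while the other parameter keeps moving. Whether this is what actually happens inside Stan is something I did not verify.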
Question 1: Does this interpretation seem good?
Question 2: If so, Stan might benefit from some mechanism that disallows zero-variance mass matrices or prevents this from happening in some other way. This might allow it to recover from such a situation, given a long enough warmup. (The point of this post was to bring up this suggestion.)
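A rough sketch of the kind of safeguard I mean, in Python rather than Stan's own code. The function name `floored_variances` and the floor value `1e-8` are hypothetical, and I have not checked what Stan's adaptation already does to its variance estimates, so treat this as an illustration of the idea only.

```python
import statistics


def floored_variances(draws_by_param, floor=1e-8):
    """Per-parameter sample variances, clamped from below by `floor`.

    Hypothetical helper: the floor keeps a parameter whose draws never
    moved from getting an exactly zero entry in a diagonal mass matrix.
    """
    return {
        name: max(statistics.variance(draws), floor)
        for name, draws in draws_by_param.items()
    }


# Made-up warmup draws: one parameter is stuck, one moves.
warmup = {
    "stuck": [1000.0, 1000.0, 1000.0],
    "moving": [0.0, 1.0, 2.0],
}

variances = floored_variances(warmup)
print(variances)
```

With a floor like this the stuck parameter keeps a small but nonzero variance, so later adaptation windows at least have a chance to move it again once the stepsize grows.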
Edit: In practice, recovery from such a situation might take too long, even if it is possible. This might improve with the campfire warmup routine.
Edit2: Feel free to strike this down if you think it’s not relevant in practice. It was more of a spontaneous thought than a well-reasoned suggestion.