# All masses below 1 and max treedepth being hit

I am running into an issue as I scale up a model on simulated data. As I go from 4,000 to 8,000 parameters, the treedepth goes from 9 to the maximum of 14, and the step size goes from 0.03 to 0.06. In the smaller case the masses seem to be centered around 1, but in the larger model they are all less than 1. My understanding was that if all the masses were small, HMC would just pick a smaller step size, but that does not seem to be happening.

What is going on here? Are there HMC parameters that I should change to allow my model to scale?

Here are the two mass matrices, plotted on the log scale.

In general, we’ve had a lot of problems fitting non-diagonal mass matrices, if that’s what you’re talking about here. I didn’t understand the axis labels in the plots.

The step size is going to depend heavily on the target acceptance rate and how much curvature there is in the model. Sometimes more data can make models a lot easier to fit. For instance, if you have a centered hierarchical model, more data makes the model more like an isotropic normal, whereas if you have a non-centered parameterization then the exact opposite happens.

If the data’s the same, then yes, the mass matrix (inverse covariance matrix) and step size will be correlated across chains.

Thanks. This is a diagonal mass matrix. The y axis is the log of the inverse mass matrix, the x axis is the parameter index, and each color is a different vector in the parameters block.

I think there may have been a bug in my model that only showed up with more data - so it could just be the ‘folk theorem’ at work.

Formally, for a leapfrog integrator the scale of the elements of the inverse metric (i.e. inverse mass matrix) is not identified relative to the step size. If we set the inverse metric to the posterior covariance (on the unconstrained space), however, then that sets an absolute scale that identifies both. This is the approach our adaptation takes.
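To make the non-identifiability concrete, here is a minimal NumPy sketch (not Stan's implementation). It assumes a diagonal inverse metric and a toy target of independent normals; the function `leapfrog` and the constant `c` are made up for illustration. Multiplying the inverse metric by `c**2` while dividing the step size by `c` (with the momentum rescaled to match its new distribution) should give the same position trajectory.

```python
import numpy as np

def leapfrog(q, p, grad_U, step_size, inv_metric, n_steps):
    """Run n_steps leapfrog steps with a diagonal inverse metric (a vector)."""
    q, p = q.copy(), p.copy()
    for _ in range(n_steps):
        p = p - 0.5 * step_size * grad_U(q)   # half step in momentum
        q = q + step_size * inv_metric * p    # full step in position
        p = p - 0.5 * step_size * grad_U(q)   # half step in momentum
    return q, p

# Toy target (an assumption): independent normals with standard deviations sd.
sd = np.array([0.1, 1.0, 10.0])
grad_U = lambda q: q / sd**2                  # gradient of the negative log density

rng = np.random.default_rng(0)
q0 = rng.normal(size=3) * sd
inv_metric = sd**2                            # inverse metric set to the variances
p0 = rng.normal(size=3) / np.sqrt(inv_metric) # p ~ N(0, M) with M = 1 / inv_metric

c = 4.0                                       # arbitrary rescaling factor
q_a, _ = leapfrog(q0, p0, grad_U, 0.1, inv_metric, 20)
# Inverse metric scaled by c**2, step size by 1/c, momentum by 1/c.
q_b, _ = leapfrog(q0, p0 / c, grad_U, 0.1 / c, c**2 * inv_metric, 20)

same_positions = np.allclose(q_a, q_b)
print(same_positions)
```

Because the two settings trace the same positions, the sampler cannot tell them apart; only fixing the inverse metric to something like the posterior variances pins down the step size.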

In particular, the elements of the inverse metric can be arbitrarily small if your posterior variances are arbitrarily small.
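As a rough illustration, the sketch below uses a toy conjugate example (a normal mean with known `sigma` and a flat prior, all assumed here) where the posterior variance is `sigma**2 / n_obs`. It estimates a one-element diagonal inverse metric as the sample variance of posterior draws; the helper `inv_metric_estimate` is hypothetical, and the actual adaptation may also regularize the estimate, which is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 1.0  # known observation scale (an assumption of this toy example)

def inv_metric_estimate(n_obs, n_draws=2000):
    """Sample variance of posterior draws for a normal mean with a flat prior."""
    y = rng.normal(0.0, sigma, size=n_obs)
    # The exact posterior is N(mean(y), sigma^2 / n_obs); draw from it directly.
    draws = rng.normal(y.mean(), sigma / np.sqrt(n_obs), size=n_draws)
    return draws.var(ddof=1)

estimates = {n: inv_metric_estimate(n) for n in (10, 1000, 100000)}
for n, v in estimates.items():
    print(n, v, np.log(v))  # the log inverse metric falls further below 0 as n grows
```

So inverse metric elements well below 1 are not a problem in themselves; they just reflect a tightly concentrated posterior.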