Parameter scaling and hitting the maximum tree depth

Dear Stan Community,

I’m having some trouble fitting a large model. The model has many moving parts, so I will try to give the gist of it without having to copy all of it here.

In essence, I’m trying to model several dynamical processes on longitudinal data in a hierarchical model. These dynamical processes are weighted by four weights that sum to 1.

Two of the terms in my model are modeled as a scaled fraction of some baseline value called mu_pr:

pow_a_frac[1] = 5.0*Phi_approx(pow_a_pr[1]).*mu_pr[2];
lrn_intrcpt[1] = 5.0*Phi_approx(lrn_intrcpt_pr[1]).*abs(mu_pr[2]);

Here’s the tricky part. If I use a scaling of 5 as in this example, I get 100% of the samples hitting the maximum tree depth, and all(?) the model parameters are stuck tightly around their prior.

If, however, I use a scaling of 2, the model converges with little problem, and only 6% of the samples hit the maximum tree depth.

I’m not sure what to make of it. Is the scaling interfering with the MCMC posterior sampling? Is this related to the step size I’m using? Or am I missing something else completely?

Any help is greatly appreciated! (even just some sympathy).

Thanks,

Roey

Given that the constant scaling should be equivalent to scaling mu_pr[2], my guess is that the constant scaling of 5 makes adaptation during warmup more difficult. In particular, Stan's default initialization might lead to more extreme behavior with the larger scaling, frustrating early exploration enough that the sampler adaptation ends up in a poor state, which then compromises performance in the main sampling phase.
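To make the initialization point concrete, here is a minimal Python sketch (not part of the original model) of how far the multiplier scale * Phi_approx(raw) can spread when raw is drawn from Stan's default initialization range of uniform(-2, 2) on the unconstrained scale. The helper name `multiplier_range` is hypothetical, and the sketch assumes the raw parameters are unconstrained so that the default range applies to them directly.

```python
import math


def phi_approx(x):
    """Logistic approximation to the normal CDF, as in Stan's Phi_approx:
    inv_logit(0.07056 * x^3 + 1.5976 * x)."""
    return 1.0 / (1.0 + math.exp(-(0.07056 * x**3 + 1.5976 * x)))


def multiplier_range(scale, init_radius=2.0):
    """Smallest and largest value of scale * Phi_approx(raw) for raw in
    [-init_radius, init_radius]. Phi_approx is monotone increasing, so the
    extremes sit at the endpoints of the interval."""
    return scale * phi_approx(-init_radius), scale * phi_approx(init_radius)


# Compare the two scalings discussed in the question.
for scale in (2.0, 5.0):
    lo, hi = multiplier_range(scale)
    print(f"scale={scale}: multiplier spans [{lo:.3f}, {hi:.3f}] at initialization")
```

Because the multiplier is linear in the scale, the initial spread of pow_a_frac and lrn_intrcpt is 2.5 times wider with a scaling of 5 than with a scaling of 2. Whether that is enough to derail warmup in this particular model is something only the adaptation output can confirm.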

You can confirm this by comparing the Hamiltonian Monte Carlo adaptation configuration, in particular the individual step sizes and inverse metric elements, between the two fits.
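One way to pull those numbers out is sketched below in Python. It assumes the fits were written to CmdStan-style CSV files, where the adaptation results appear as comment lines ("# Step size = ..." followed by "# Diagonal elements of inverse mass matrix:" and a line of comma-separated values), and that a diagonal metric was used. The function name `read_adaptation` and the example values are made up for illustration.

```python
def read_adaptation(csv_text):
    """Return (step_size, inverse_metric_diagonal) parsed from the comment
    lines of a Stan CSV file, or None for anything that is not found."""
    step_size = None
    inv_metric = None
    lines = csv_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("# Step size"):
            step_size = float(line.split("=")[1])
        elif line.startswith("# Diagonal elements of inverse mass matrix"):
            if i + 1 < len(lines):
                # The values sit on the next comment line, comma-separated.
                values = lines[i + 1].lstrip("# ").split(",")
                inv_metric = [float(v) for v in values]
    return step_size, inv_metric


# Illustrative input only; these numbers do not come from the model above.
example = """# Adaptation terminated
# Step size = 0.25
# Diagonal elements of inverse mass matrix:
# 1.5, 0.5, 2.0
"""

step_size, inv_metric = read_adaptation(example)
print("step size:", step_size)
print("inverse metric diagonal:", inv_metric)
```

Running this on one chain from each fit and comparing the results side by side should show whether the scaling-of-5 fit ended warmup with a much smaller step size or with inverse metric elements that differ by orders of magnitude from the scaling-of-2 fit.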

Thanks for this clear explanation, Michael!
