A brief (edit: lie) update on this: so I wound up thinking a bit more on sakrejda’s original comment to question the statistical model before trying to speed up the physical model (the physical model being the slow bit that gives me the mean function). One thing I had been ignoring in the model diagnostics, were the estimated fraction of bayesian missing information warnings that this model was throwing about quite frequently. The prior I’m using is actually relatively weak while introducing a healthy bit of between-parameter correlations (I use N-1 gaussian distributions that are centered on the first differences of the N parameters).
I sat down and thought about the model a bit more, and realized that in the literature, it’s known that for this specific problem, assuming infinite, perfect/noise-free data, and without regularizing, there’s actually a closed form solution for the ML parameter values, which consists of a finite number of delta functions, superimposed on a background with a value approaching infinity. This seemed to jive with the shape of the posterior distributions I was getting, since the marginal posteriors I observe (with a “regularizing” prior like the one mentioned above) tend to be very heavily positively skewed, with only a handful of marginal distributions looking “nicely constrained”. This led me to believe that I was seeing something similar to the “delta functions on an infinite background” scenario from the closed form solution (where the well constrained params correspond to the delta functions, the healthy positive tails for the rest of the params are related to the infinite background, and whatever modes do appear in the skewed/heavy tailed distributions are primarily due to the prior). On top of this, I’m actually purposefully “overparameterizing” the model, to where I expect a significant portion of the parameters will be very poorly constrained by the data, which I don’t think the
So, I imposed an “additional” gaussian prior on the parameter values (i.e. N mean zero gaussians for the N parameters). This allowed me to bring the treedepth and the stepsize back to their default values, while still getting the model to estimate 100 parameters relatively quickly. The low EBFMI errors still appear from time to time, but for the most part only seem to consistently appear when I try to fit to “garbage” data (low SNR). Otherwise, they will still appear sporadically when I am fitting to “good” data, but very infrequently (which i’m tempted to call “good enough”).
I don’t know that I’m necessarily super fond of this “two prior” approach, since it feels like this is bordering on choosing a prior for computational convenience, but I’m starting to feel also like the problem is just intrinsically “poorly constrained”. I guess the tl;dr then is that I think I’m still going to take a crack implementing the analytic gradient in C++ (since then I could try just running really long sampling and warmup periods). I’m wondering if the quickest route would be to implement the gradient computation directly in the Stan, then take the C++ generated by Stan for the mean function and its gradient, and then try to combine those using the tut for adding a function and its gradient to Stan.
Sorry, that devolved into more of a thought dump than anything else!