Hi all,
I’m fitting a large hierarchical nonlinear model on ~18k data points across a good number of groups: about 12k parameters, with p_loo = 1879. It seems to be doing a great job: the fits look excellent and the parameter estimates seem reasonable. I’m currently testing it against simulated data, but based on the fits and the posterior predictive distribution, I’m not feeling concerned. I typically get 0 or 1 divergences over 6000 iterations (with adapt_delta = 0.90), so that isn’t flagging anything either. Also, I only really care about my population-level fixed effects, so some bias at the level of individual data points or individual random effects matters little to me, as long as it doesn’t greatly affect the fixed-effects estimates.
However, I’m saturating max_treedepth: 5999 of 6000 iterations hit the cap. The model also takes about 4 or 5 days to fit. That’s doable, but it’s quite annoying for model building to realise a week later that I need an extra variable. I tried raising max_treedepth to 15, and sampling became much, much slower: after 24 hours, I hadn’t even reached 100 iterations. I don’t even know how to start investigating what’s happening at the higher treedepth, because it takes so long to sample anything at all.
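For reference, this is roughly how I’m counting the saturated iterations (a minimal sketch; `fit` is a placeholder for my cmdstanr fit object, and I’m assuming the default cap of 10 in the comparison):

```r
library(posterior)

# Pull the per-iteration sampler diagnostics (treedepth__, divergent__, ...)
diag <- fit$sampler_diagnostics()

# Proportion of post-warmup iterations that hit the treedepth cap
td <- extract_variable_matrix(diag, "treedepth__")
mean(td >= 10)

# And, for comparison, the handful of divergences
dv <- extract_variable_matrix(diag, "divergent__")
sum(dv)
```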
Preliminary questions:
- Is saturating the treedepth something I should even be concerned about? I understand it’s a matter of efficiency, but sampling for a month hardly seems like an efficient use of compute either.
- Are there other ways I should be attacking this problem besides increasing the treedepth? Reparameterisation (e.g. non-centring; see the sketch after this list)? Tightening priors? Other control parameters?
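To make the reparameterisation question concrete, here’s a minimal sketch of the kind of non-centred parameterisation I have in mind (a toy one-level normal hierarchy, not my actual model), written the cmdstanr way:

```r
library(cmdstanr)

# Toy model: group offsets are expressed as standardised z-scores, so the
# funnel between the offsets and their scale tau is removed a priori.
stan_file <- write_stan_file("
data {
  int<lower=1> N;
  int<lower=1> J;
  int<lower=1, upper=J> group[N];   // group index per observation
  vector[N] y;
}
parameters {
  real mu;
  real<lower=0> tau;
  vector[J] z;                      // standardised group offsets
}
transformed parameters {
  vector[J] theta = mu + tau * z;   // implies theta ~ normal(mu, tau)
}
model {
  z ~ std_normal();                 // non-centred: geometry no longer depends on tau
  mu ~ normal(0, 5);
  tau ~ normal(0, 1);
  y ~ normal(theta[group], 1);
}
")
mod_nc <- cmdstan_model(stan_file)
```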
I read through the discussion here (Setting Max Treedepth in difficult high-dimensional models), which seems relevant. @dlakelan was artificially reducing the max_treedepth and seeing significant speed-ups, with the option of later running the full model at a higher treedepth, or even settling on a compromise value that optimises effective samples per unit of computing time. I suspect my model shares some of the same characteristics: several very tightly identified parameters with small variance, for which leaving the typical set takes us to badlands where probability goes to die. The parameters are also quite highly correlated in some cases, which probably complicates the geometry further. I’m running with init = 0 to be more confident that the sampler starts within the typical set, as I had issues with an earlier (smaller) model where one chain would become completely lost out in the wilderness with 100% divergent transitions, while the others had none.
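If it helps to be concrete, this is roughly the experiment I’m imagining, following that thread (a sketch only; `mod` and `data_list` are placeholders for my compiled model and data):

```r
# Deliberately cap the treedepth below the default of 10 and see what it
# costs in effective sample size versus what it saves in wall time.
fit_capped <- mod$sample(
  data = data_list,
  chains = 4,
  parallel_chains = 4,
  iter_warmup = 1000,
  iter_sampling = 500,
  init = 0,                # start at zero, as in my current runs
  adapt_delta = 0.90,
  max_treedepth = 8        # artificially low, per the linked discussion
)

fit_capped$time()          # compare wall time against the full run
fit_capped$summary()       # eyeball ess_bulk before committing to long runs
```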
The discussion above is from ~3 years ago, and there was some talk of saving warmup results for reuse, or a “run for 8 hours” mode. It’s also near the upper limit of my understanding of how to reason about these kinds of issues, so maybe I’m misreading something there. Does artificially lowering the max_treedepth, or just leaving it where it is, come with any other serious downsides that I should be considering? And, per the point above, are my saturated treedepths really so problematic if everything else looks fine?
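For what it’s worth, my (possibly naive) understanding of the warmup-reuse idea is something like the following sketch. I’m assuming the `inv_metric`, `step_size`, and `adapt_engaged` arguments behave as documented, and I don’t know whether this all works in v2.24, so please correct me:

```r
# Run warmup once (again, `mod` and `data_list` are placeholders)...
fit_warm <- mod$sample(
  data = data_list,
  chains = 4,
  iter_warmup = 1000,
  iter_sampling = 25       # just a few draws; we mainly want the adaptation
)

# ...pull out the adapted quantities...
inv_metric <- fit_warm$inv_metric(matrix = FALSE)[[1]]        # diagonal metric from chain 1
step_size  <- mean(fit_warm$metadata()$step_size_adaptation)  # adapted step sizes

# ...and feed them into later runs, skipping adaptation entirely.
# (Ideally the inits would also come from the end of warmup.)
fit_reuse <- mod$sample(
  data = data_list,
  chains = 4,
  iter_warmup = 0,
  adapt_engaged = FALSE,
  inv_metric = inv_metric,
  step_size = step_size,
  iter_sampling = 1000
)
```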
Thanks so much in advance for any help!
By the way, I’m using cmdstanr with CmdStan v2.24.1.