Problems converging using custom gamma2 distribution

Hi! I’ve spent over a year on this problem and need help!

I am trying to model RMSE values produced by external algorithms.
I have 4 factors (for simplicity, being modeled as unordered):
size (3 levels)
noise (3 levels)
shape (2 levels)
interps (9 levels; the repeated measure)

Also, I have:
g_subj (unique id)
g_rep (20 levels; unique id; subjects are nested within replicates; each rep is a replication of the entire experiment)

see Data:
_RMSE_data_unord.csv (379.2 KB)

The RMSE values are y, and I am using the custom gamma2() distribution from @paul.buerkner's notebook to model the heteroscedasticity.
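For context, the way I understand that parameterization: a Gamma with mean mu and variance v has shape = mu^2 / v and rate = mu / v. A minimal sketch of such a mean/variance custom family (the names and exact code here are my own illustration, not necessarily identical to the notebook's):

library(brms)

gamma2_sketch <- custom_family(
  "gamma2_sketch", dpars = c("mu", "v"),
  links = c("log", "log"), lb = c(NA, 0), type = "real"
)

stan_funs_sketch <- "
  real gamma2_sketch_lpdf(real y, real mu, real v) {
    // mean mu and variance v imply shape = mu^2 / v and rate = mu / v
    return gamma_lpdf(y | square(mu) / v, mu / v);
  }
  real gamma2_sketch_rng(real mu, real v) {
    return gamma_rng(square(mu) / v, mu / v);
  }
"
stanvars_sketch <- stanvar(scode = stan_funs_sketch, block = "functions")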

  1. In fitting the variance for the Gamma, are we estimating different shape parameters (essentially, different gamma distributions)? Does this completely eliminate the variance-mean relation?

The model I would like is:

y ~ g_size * g_noise * g_shape * g_interps + (g_size * g_noise * g_shape * g_interps | g_rep) + (g_interps | g_subj)
v ~ g_size * g_noise * g_interps

I am starting simpler with:

y ~ g_size * g_noise * g_shape * g_interps + (1| g_rep) + (1| g_subj)
v ~ g_size * g_noise * g_interps

but I keep getting some combination of max treedepth and divergence warnings for every post-warmup iteration.

I was able to successfully implement:

y ~ g_size * g_noise * g_shape * g_interps
v ~ g_size * g_noise * g_interps

I originally started with nlme but it had trouble modeling 0 variances and 0 random effects. I then moved to lme4 but it could not model the heteroscedasticity, so now I’m using brms.
Run times for the models range from 4 hours to 3 days depending on the complexity.

This is the current model that works:

fit0g16_4a_wPriors20sub3_95a <- brm(
  bf(y ~ g_size * g_noise * g_shape * g_interps,
     v ~ g_size * g_noise * g_interps),
  prior = c(
    # y ~ Intercept + b
    prior(normal(4, 3), class = "Intercept"),
    prior(normal(2, 3), class = "b"),
    # v ~ Intercept + b
    prior(normal(0, 3), class = "Intercept", dpar = "v"),
    prior(normal(0, 5), class = "b", dpar = "v")),
  family = gamma2, stanvars = stanvars_gamma2,
  sample_prior = "yes",
  seed = 2011631714,
  data = t_long_CNSubset20red1_sub3_unord,
  warmup = 1000, iter = 2000, chains = 4, cores = 4,
  control = list(adapt_delta = 0.90, max_treedepth = 10))

I’ve tried different priors for the SDs, but I still get some combination of max treedepth and divergence warnings for all post-warmup iterations.

This model does not converge and hits max treedepth on all 4000 post-warmup iterations:

fit0g16_4a_wPriors20sub3_27x10b <- brm(
  bf(y ~ g_size * g_noise * g_shape * g_interps + (1 | g_rep) + (1 | g_subj),
     v ~ g_size * g_noise * g_interps),
  prior = c(
    # y ~ Intercept + b
    prior(normal(4, 3), class = "Intercept"),
    prior(normal(2, 3), class = "b"),
    # v ~ Intercept + b
    prior(normal(0, 4), class = "Intercept", dpar = "v"),
    prior(normal(0, 5), class = "b", dpar = "v"),
    # y ~ sd
    prior(normal(0, 0.2), class = "sd")),
  family = gamma2, stanvars = stanvars_gamma2,
  sample_prior = "yes",
  seed = 2011631714,
  data = t_long_CNSubset20red1_sub3_unord,
  warmup = 1000, iter = 2000, chains = 4, cores = 4,
  control = list(adapt_delta = 0.90, max_treedepth = 10))
  1. Would modeling correlations improve convergence?
  2. Help!

As a first step, I recommend you start with some simple models to better understand where the convergence issues come from.

Thank you so much @paul.buerkner for responding. I’ve pored over so many of your and @martinmodrak's docs and blogs! They are by far the most helpful resources on Bayesian modeling.

I have tried simpler models. Following my LMM books, the modeling approach I am taking is the top-down strategy: first fit the loaded mean-structure model with all the fixed effects, then add the varying group-specific effects.

I can successfully model without any group effects:

y ~ g_size * g_noise * g_shape *g_interps
v ~ g_size * g_noise * g_interps

It is when I add the varying intercepts for g_rep and g_subj on y, that I have convergence issues.

y ~ g_size * g_noise * g_shape * g_interps + (1 | g_rep) + (1 | g_subj)
v ~ g_size * g_noise * g_interps

From the glmer and lme models I ran, I know I need to model at least + (g_noise * g_interps | g_rep) and, if possible, + (g_interps | g_subj), as both have certain levels with a significant effect.

I think the problem stems from some levels of the factors having rather large varying-effect SDs while several others are zero and several more are extremely small, say 0.00025. I am having trouble specifying priors across these different ranges. Since all my predictors are factors, there is no standardizing to do. Also, I’ve read on this forum that one should never set priors on an SD smaller than 0.02.
Additionally, I don’t know that I have the prior knowledge to justify such a strong prior.
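In case it helps show what I mean, brms does let me set separate SD priors per grouping factor; the scales below are purely illustrative, not values I have any justification for:

priors_sd <- c(
  prior(normal(0, 0.5), class = "sd", group = "g_rep"),
  prior(normal(0, 0.1), class = "sd", group = "g_subj")
)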

My first concern would be whether you have enough data to inform all the parameters - it seems you have well over 300 fixed effects plus all the varying intercepts/effects. Data not being informative about all parameters can result in all sorts of pathologies, as discussed in https://betanalpha.github.io/assets/case_studies/underdetermined_linear_regression.html

A good way to see if you can add additional parameters to your model and still learn something is via posterior predictive checks (see docs for pp_check, other posts here or https://arxiv.org/abs/1709.01449). If you can devise a posterior predictive check that shows a systematic inability of the model to capture some aspect of the data - e.g. if you do grouped PP check for your replicates and some replicates are “above” what your model expects and some are “below” you might be able to add replicates as a predictor. If on the other hand the posterior variability encompasses the between-replicate variability quite well, it is much less likely that your model is able to inform the per-replicate varying intercept.
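For example, assuming a fitted brmsfit object called fit, such a grouped check could look like:

pp_check(fit, type = "stat_grouped", stat = "mean", group = "g_rep")
pp_check(fit, type = "stat_grouped", stat = "sd", group = "g_rep")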

I’ve done a model selection process a bit like this in here: https://lab126.mbu.cas.cz/images/papers/Niederlova_2019_Supplemental_Statistical_Analysis.pdf (starting around page 22), but it is for a bernoulli response which is a bit easier to do PP checks for.

Also note that you can use triple backticks (```) to mark code in your post (I’ve edited the post to show that).

Such good information @martinmodrak. Thanks for pointing me in a direction.

Several points of confusion:

  1. I thought this was a perk of Bayesian models: that they allow for modeling even when you don’t have enough data.

  2. Number of obs: 3240; groups: g_subj, 360; g_rep, 20. I can scale up to include more reps, but this costs some time. Each new rep adds 18 subjects with 9 repeated measures. I limited it to 20 reps because the models take a long time to run. Should I add more reps and see what happens?

  3. I’ve read the underdetermined blog post, and while I understand it on the surface, I don’t think I fully grasp what it is saying; I am having trouble connecting the dots with what I know and cementing it in my head. Is there an additional reference, maybe one that clarifies the hyperplane of degenerate model configurations? Is this not the same as non-identifiability and multimodality issues?

  4. I have done pp_checks on models with and without group effects, and in both cases the model fails to capture many regions. Are pp_checks valid when the model doesn’t converge?

  5. Here is the grouped pp_check for the non-converging model with only the g_rep varying-intercept group effect.
    pp_check(model, type = "stat_grouped", stat = "mean", group = "g_rep")

pp_check_nogroups.pdf (10.3 KB)

  1. However, if I model with gamma instead of gamma2 (estimating a single shape parameter for gamma instead of multiple v parameters in gamma2), then the pp_check of a non-converging model with all g_rep group effects and the pp_check of a non-converging model with varying intercepts and interps for g_subj both show seemingly perfect fits, with outrageously large estimates for the gamma shape parameter, and the DHARMa plots exhibit the heteroscedasticity I am expecting. I would assume this perfect-fit behavior comes from not having enough data. But again, I thought this was the whole point of Bayesian modeling: that with priors you can overcome a lack of data. Perhaps this is the Bayesian way of overfitting, in that you have to make ridiculously strong priors?

  2. I thought that when modeling one should always include the predictors that convey the structure in the data, regardless of whether they are significant. How can adding the varying intercepts cause there not to be enough data? I’ve read in posts here that the amount of data needed for additional group-effect estimates is irrelevant for Bayesian models.

  3. When you say "Data not being informative about all parameters", what exactly do you mean? Are you saying that there is not enough information in the data to estimate a particular effect and more data would NOT be helpful unless it conveys new information? Or just that not enough data was collected to distinguish every effect from the others?

  4. I tried running a model with g_size and g_noise as numerical instead of factors and I get “Rejecting initial values… Shape parameter is 0, but must be > 0! … Shape parameter is inf, but must be finite!”,
    which comes from divide-by-zero and zero-numerator operations in the gamma2 reparameterization of the shape parameter. Why was this not a problem when using factors, and how would I fix this? @paul.buerkner

Looking at this a bit more, I think model mismatch is a slightly more likely cause of your problem than underdetermined regression. Details below.

Well the Bayes approach helps when there is little data, but there are some limits for computing the posteriors. A simple example is fitting a normal distribution with unknown mean and unknown variance. When we have at least 2 observations, both mean and variance are constrained mostly independently, making the posterior more or less a “blob” in 2D space. But when we only have 1 observation the situation changes qualitatively - either we can have small variance and the mean is constrained close to the single observation or large variance and then the mean is allowed a wide range of values. This dependency structure results in a funnel-like posterior (the extent of the funnel is constrained only by the prior). This is not a problem in theory, but in practice such dependencies and structures make it hard to explore the posterior with an algorithm (including Stan). The problem is similar to the “Neal’s funnel”, but cannot be easily resolved by reparametrization.
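A tiny toy illustration of that dependency (made-up numbers, nothing to do with your data): the log-likelihood of a single observation over a grid of mean and log standard deviation is funnel shaped.

y_obs <- 0
grid <- expand.grid(mu = seq(-5, 5, length.out = 200),
                    log_sigma = seq(-4, 3, length.out = 200))
grid$loglik <- dnorm(y_obs, mean = grid$mu, sd = exp(grid$log_sigma), log = TRUE)
# Plotting loglik over (mu, log_sigma), e.g. with lattice::levelplot(), shows the
# funnel: almost any mu is plausible once log_sigma is large, but only mu close
# to y_obs works when log_sigma is small.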

Similarly, if you try to fit a linear regression (even with known variance) to a single point, the coefficients become unconstrained individually - any value of intercept is consistent with the data if you choose an appropriate slope. The data only carve out a subspace of possible pairs.

In both cases the problem is related to non-identifiability - you end up with many very different model configurations consistent with the data.

While you probably have quite a lot of data to fit the mean, I don’t have a good intuition for how this interacts with your also having predictors for the variance of the gamma, as variances require more data to learn than means. I would guess it is most likely not a problem.

I would also note that if you are constrained by runtime, you might want to try INLA - it is a bit more cumbersome to use, but it scales well beyond what you can do with brms (but it doesn’t let you put predictors on variance)

Interpreting pp_checks for non converging models is sometimes still helpful, but you cannot rely on it too much.

This might be an important clue. Just to be clear: using the gamma family with a fixed shape still doesn’t converge? Could you share the pp_checks? I am unfortunately not familiar with DHARMa plots, so I can’t comment on those. Also, what do you mean by “outrageously large” estimates? Is the posterior narrow around a large estimate, or is it very wide, including both small and large shapes?

Once again the amount of data is irrelevant for Bayesian theory, but might be a problem for Bayesian computation. I meant “not informative” in the sense I mentioned above, that data only limit combinations of parameters but not individual parameter values.

Anyway, I think you are most likely struggling with your model not being a good match for the data. Your current model is too large to debug anything, so I would suggest you start with modelling only a small subset - say a single size, single noise, single shape and single interp (i.e. eliminating all the fixed effects) and only a few subjects and replicates. Does the model converge? Do the pp_checks look healthy? I would guess that maybe the problem is that the gamma2 response is not a good model and you need a more heavy-tailed distribution (say log-normal). pp_checks should be able to show if this is the case. I would definitely look at the dens_overlay, and stat checks (for sd and min / max stats).
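Concretely, assuming the model fitted to such a subset is called fit_small, the checks I have in mind would be along these lines:

pp_check(fit_small, type = "dens_overlay")
pp_check(fit_small, type = "stat", stat = "sd")
pp_check(fit_small, type = "stat_2d", stat = c("min", "max"))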

Best of luck with your model!


OOOOh… it took me reading this a few times to understand. With 2 points, we can average to estimate the mean and average to estimate the variance; the mean and variance are independent in the posterior.
But if we have 1 point, we can no longer average: we may either have an exact fit through the data point, which gives a variance of 0, or a smoother fit with a larger variance. This is the dependency between the mean and variance that arises when there is not enough data. I believe I read this in one of my books; I’ll have to go back and find those sections.

Please correct me: we can see the same phenomenon (this qualitative change) in the underdetermined blog post. The hyperplane of degenerate configurations has this same dependency, with the measurement variance either decreasing to 0 for an exact fit or increasing for a smoother fit. This change in measurement variance and its effect on the fit shows up as the concentration of the likelihood function (and of the posterior density) varying around the hyperplane (the exact fit).
As the variance decreases toward 0, the likelihood concentrates tightly around the hyperplane and the posterior density becomes more focused there. As the variance increases, the likelihood spreads away from the hyperplane and the posterior density around it weakens. The configurations on the hyperplane are all potential solutions: when the variance is 0 there is an exact fit through one of them, and as the variance increases, more configurations become possible solutions. It is this fluctuation of the likelihood function and posterior density that makes sampling difficult; it creates the funnel, whereby a large step size is needed when the variance is large and the posterior is spread out (too small a step size may lead to hitting max treedepth), and a small step size is needed when the variance is near 0 (too large a step size may produce divergences).

So now I wonder: how does this relate to multiple measurement variances in heterogeneous-variance situations, or to non-normal variance situations?

Yes: more coefficients to estimate than data points, an underdetermined, underconstrained system with many solutions defined by the hyperplane of degenerate configurations.

I am still confused as to how priors fit into all of this and why adding stronger priors would not alleviate the problem. Wouldn’t the priors serve to further constrain the problem so that it would no longer be considered underdetermined?

Why would you guess it is not a problem? Is there no way to tell whether you have enough data, or whether additional, stronger priors will help? I thought heterogeneous variances were a commonly encountered problem, but material explaining this clearly in a Bayesian setting is sparse. From the examples I’ve seen, I believed I had enough data.

I believe modeling the variances is crucial; otherwise nlme with log(y) or glmer would have sufficed :(. But sadly, the heterogeneous variances are plainly visible.

The plots for this severely non-converging model in brms look great, if I can just get it to converge:

# model 10b: gamma2 with varying intercepts for g_rep and g_subj
brm(bf(y ~ g_size * g_noise * g_shape * g_interps + (1 | g_rep) + (1 | g_subj),
       v ~ g_size * g_noise * g_interps),
    prior = c(
      # y ~ Intercept + b
      prior(normal(4, 3), class = "Intercept"),
      prior(normal(2, 3), class = "b"),
      # v ~ Intercept + b
      prior(normal(0, 4), class = "Intercept", dpar = "v"),
      prior(normal(0, 5), class = "b", dpar = "v"),
      # y ~ sd
      prior(normal(0, 0.2), class = "sd")),
    family = gamma2, stanvars = stanvars_gamma2,
    sample_prior = "yes",
    seed = seed3,
    data = t_long_CNSubset20red1_sub3_unord,
    warmup = 1000, iter = 2000, chains = 4, cores = 4,
    control = list(adapt_delta = 0.90, max_treedepth = 10))

has the following
results: results_10b.txt (25.7 KB)
pp_check: pp_check_10b.pdf (50.3 KB)
and DHARMa: dharma_10b.pdf (274.1 KB)

Here’s the converging model using gamma instead of gamma2 (modeling shape instead of v):

									family = brmsfamily("Gamma",link = "log",link_shape = "log"),
									sample_prior = "yes",
									seed = seed3,# inits = inits_list,
									data = t_long_CNSubset20red1_sub3_unord, warmup = 1000, iter = 2000, chains = 4, cores = 4,
									control = list(adapt_delta = 0.90,max_treedepth=10))

results: results_100a3.txt (17.5 KB)
pp_check: pp_check_100a3.pdf (50.3 KB)
DHARMa: dharma_100a3.pdf (282.3 KB)

Much more complex non-converging model with gamma instead of gamma2:

fit0g16_4a_wPriors20sub3_101 <- brm(bf(y ~ g_size * g_noise * g_shape * g_interps + (1 + g_size*g_noise*g_shape*g_interps || g_rep) + (1|g_subj) ), #silent = FALSE,
									family = brmsfamily("Gamma",link = "log",link_shape = "log"),
									sample_prior = "yes",
									seed = seed3,
									data = t_long_CNSubset20red1_sub3_unord, warmup = 1000, iter = 2000, chains = 4, cores = 4,
									control = list(adapt_delta = 0.90,max_treedepth=10))

results: results_101.txt (34.6 KB)
pp_check: pp_check_101.pdf (45.9 KB)
DHARMa: dharma_101.pdf (278.0 KB)

Another non-converging model with gamma instead of gamma2:

								family = brmsfamily("Gamma",link = "log",link_shape = "log"),
								sample_prior = "yes",
								seed = seed3,
								data = t_long_CNSubset20red1_sub3_unord, warmup = 1000, iter = 2000, chains = 4, cores = 4,
								control = list(adapt_delta = 0.90, max_treedepth = 10))

results: results_103.txt (18.3 KB)
pp_check: pp_check_103.pdf (46.5 KB)
DHARMa: dharma_103.pdf (266.1 KB)

In both of these more complex models, the shape parameter is estimated in the tens of thousands.

Not a problem for Bayesian theory because it works in the limit of infinity, while for computation we are finite and must contend with pathological geometries?

But aren’t these the same thing? Doesn’t the data limiting combinations of parameters also limit individual parameter values; as you constrain enough combinations, won’t you eventually arrive at a unique solution for the parameter values? Oh, but wait, I shouldn’t be switching between frequentist and Bayesian thinking. In the Bayesian view the data define the likelihood; once you have enough data to properly represent the likelihood, more data is irrelevant, so the data define the likelihood, which defines the range of values for the parameters?

How can the pp_checks look healthy when the model is so blatantly wrong?

How can gamma2 be wrong when the data are gamma? RMSE follows a chi distribution. I tried log-normal and it did not go well, but I will take your advice and try again. How would the pp_check be able to show whether a more heavy-tailed distribution is appropriate versus the priors being poor, something I’ve seen in the brms workflow example?

This looks correct to me!

Sometimes they do. But sometimes you would need unreasonably strong priors. And I think (or more precisely: I guess) that in high-dimensional settings things become much more difficult, as there are many more ways for the geometry to become problematic. And the higher the dimension, the stronger the priors you would likely need. But I am just guessing. As I said, it is quite possible this is not at all the issue with your model.

Yes, in the sense that you could get the posterior right if you had infinite computational resources (the limit with respect to infinite data is not interesting for this case). With finite resources you might be unable to explore the posterior fully.

I am sorry if I am not very clear - the point I was trying to make is similar to the case of fitting a line to a single point.

Note: I am not suggesting you fit your whole dataset with a simple model, but to filter your data to only contain values that were generated with the same single size/noise/shape/interp. In this setting, the larger model is correct only if this small model is correct for each individual subset.
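In code, something along these lines (assuming those columns are stored as factors; the chosen levels are arbitrary, any single combination will do):

library(dplyr)
dat_small <- t_long_CNSubset20red1_sub3_unord %>%
  filter(g_size    == levels(g_size)[1],
         g_noise   == levels(g_noise)[1],
         g_shape   == levels(g_shape)[1],
         g_interps == levels(g_interps)[1])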

The PP checks you sent look a bit as if you might be overfitting; I would expect there to be a bit more variability in the densities, but it is far from clear. The main problem I see with your model is that it is very big and complex and thus hard to debug.

If you absolutely know it is gamma, then stay with gamma. But I don’t see why RMSE would be chi distributed unless you have some guarantees on the process that generated those RMSE values. I am really just guessing; the model is very big and the data complex.

One way to show this is with the stat check, using the maximum or a high quantile as the statistic. If the maximum in the data is systematically below the maxima from your model, you might need a heavier tail. But this is not a hard rule. A failing PP check just tells you something is wrong. It is then often a bit of detective work to figure out what is wrong.
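For example, again assuming a fitted brmsfit called fit (q99 is just an illustrative helper for a high quantile):

q99 <- function(x) quantile(x, probs = 0.99)
pp_check(fit, type = "stat", stat = "max")
pp_check(fit, type = "stat", stat = "q99")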

OMG! I can’t believe I didn’t think to do this!! frump!

That’s what I thought, but this is for the complex models with many group effects. However, when working with model 10b (varying intercepts for g_rep and g_subj with gamma2), the results are closer to what I would expect (it reasonably captures the data without overfitting and the residuals look good), except that it doesn’t converge.

I goofed in posting the results for model 10b. You should be able to see them now.

The data are a synthetic data set. I have latitude, longitude, and depth measurements for a 51x51 grid. Depths are computed from 2 different shape functions (g_shape). These are my truth data sets. From these truth data sets I create test cases: I apply 3 levels of down-sampling to make the data set sparser (g_size), and a random sample of noise is added to the depths, to which 3 levels of noise magnification are then applied (g_noise). This constitutes 18 g_subj for 1 g_rep. This is replicated by drawing a new sample of noise; the only thing that differs between g_reps is the sampled noise, and the only thing that differs between the 18 g_subj within the same g_rep is the combination of experimental factors applied. Essentially, each g_rep is a remeasuring of depths at specific/precise locations. Thus, g_rep and g_subj are blocking factors. For each test case, software is run to reconstruct the 51x51 grid via various interpolators. From the results, I compute the RMSE between the interpolated surface and the truth. From the definition of RMSE, I believe it to follow a chi distribution. According to https://en.wikipedia.org/wiki/Chi_distribution:
It [chi distribution] is the distribution of the positive square root of the sum of squares of a set of independent random variables each following a standard normal distribution
Interestingly, I could not find any example where a model was fit to RMSE values.
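For reference, the definition I am working from, where $z_i$ are the true depths at the $N$ grid nodes and $\hat{z}_i$ the interpolated ones:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{z}_i - z_i\right)^2}$$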

I expect the RMSE to be small for low noise and dense test cases using the standard interpolator and to increase as sparsity and noise increase depending on the interpolator being used.

Ok… so I followed your advice @martinmodrak and reduced the data set to include only the g_subj for the intercept (g_size = 1, g_noise = 1, g_shape = 0, g_interps = EM031). This gives 20 g_subj, which are all the g_reps for one experimental factor combination. And I ran into a slew of problems.

With nlme:

model.1.fit_6000b1 <- lme(log(y) ~ 1, random = list(g_rep = pdDiag(form = ~ 1),g_subj = pdDiag(form = ~ 1)), data = t_long_CNSubset20red1_sub3_unord_test1, method = "REML", control=ctrl)

lme_6000b1.pdf (9.1 KB)
lme_6000b1.txt (487 Bytes)

The model runs successfully, and from the plot we see that the residuals are heteroscedastic: they increase with the fitted values.

nlme with varPower(fitted):

model.1.fit_6000b1_HRV1 <- lme(log(y) ~ 1, random = list(g_rep = pdDiag(form = ~ 1),g_subj = pdDiag(form = ~ 1)), weight = varPower(form = ~fitted(.)),data = t_long_CNSubset20red1_sub3_unord_test1, method = "REML", control=ctrl)

lme_6000b1_HRV1.pdf (8.6 KB)
lme_6000b1_HRV1.txt (1010 Bytes)

In the plot, the residuals are better, but we see the problem with the estimates as nlme fails to handle 0 residuals.

I tried to replicate these results in brms but ran into convergence problems even for the simplest model. I remembered you saying that a mis-specified model, such as the wrong distribution, could cause these issues, so I tried the gamma distribution and did have somewhat better luck converging.

This brings me back to the confusion regarding gamma2 in @paul.buerkner's notebook.

  1. Is the variance of gamma2 still a function of the mean even though it is estimated independently of the mean?
  2. What is the difference between gamma and shifted_gamma? Is this a scenario for shifted_gamma?
  3. How would I model each factor combination as having a different variance-mean function? The variance increases with the mean, but at a different rate depending on the factor combination. For example, larger fitted values might vary less than smaller fitted values; or, as another example, two estimated means that are equal may have different variances.
  4. I believe I should be able to get brms to match nlme HRV1. How do I implement varPower? I’ve tried different variations of
bf(y ~ yhat, nl = TRUE) +                       # nonlinear formula for the mean
   lf(yhat ~ 1 + (1 | g_rep) + (1 | g_subj)) +  # linear predictor for yhat
   nlf(v ~ v1 * sqrt(yhat)) +                   # variance as a function of the fitted mean
   lf(v1 ~ 1)                                   # intercept-only coefficient v1

and while I don’t get convergence, the plots are not encouraging that I’ve modeled it correctly.

Originally, I tried modeling gamma with a separate shape parameter for each factor combination. But this did not work well.

Here is the brms gamma model for the small subset:

fit0g16_4a_wPriors20sub3_2000b1 <- brm(
  bf(y ~ 1 + (1 | g_rep) + (1 | g_subj)),
  family = brmsfamily("Gamma", link = "log", link_shape = "log"),
  sample_prior = "yes",
  seed = seed3,
  data = t_long_CNSubset20red1_sub3_unord_test1,
  warmup = 1000, iter = 2000, chains = 4, cores = 4,
  control = list(adapt_delta = 0.80, max_treedepth = 10))

While this converges fine, I get a large estimate for the shape parameter:

shape   454.64    166.20   189.24   842.70 1.00     5466     2458

I would expect this to capture the variance-mean structure since it is one factor combination but it does not. I don’t know where to go from here… :’(

When I try to model shape ~ 1, there are convergence issues. Do you know why explicitly modeling shape in the formula would cause convergence problems? Aren’t the two the same thing?

bf(y ~ 1 + (1 | g_rep) + (1 | g_subj),
   shape ~ 1),
  1. Is |subset() in the brmsformula the same as saving the subsetted data as a separate dataframe? I tried both and got different convergence messages.

I don’t really understand nlme, but to me the residual plot doesn’t necessarily suggest heteroscedasticity - you overestimate small values and underestimate big values, so it might simply be that you need a more heavy-tailed distribution (like the log-normal). Still, the regularity of the residuals is puzzling and might suggest other issues.

Good. I would then suggest you repeat the process (e.g. by focusing only on one or a few subjects/replicates) to find the smallest model/dataset that reliably has convergence problems and the biggest model/dataset that reliably fits. You can then use a combination of posterior predictive checks on the biggest converging model (to suggest changes to the response) and visualisations (as in the "Visual MCMC diagnostics using the bayesplot package" vignette) to pinpoint problems in the diverging models (which is much easier to do with fewer parameters).
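For instance (parameter names depend on the model; these are just the first ones I would look at):

library(bayesplot)
np <- nuts_params(fit)                 # per-draw divergence / treedepth information
mcmc_parcoord(as.array(fit), np = np)  # highlights divergent iterations
mcmc_pairs(as.array(fit), np = np,
           pars = c("b_Intercept", "sd_g_rep__Intercept"))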

I am not sure a large fitted shape necessarily means that you failed to capture the variance-mean structure. Importantly, the varying intercepts for subject and replicate can effectively act somewhat similarly to increasing variance in the response distribution.

That should IMHO really be the same… Weird. But note that convergence issues are often a bit stochastic; do you get the same results when you repeat the fits? Also, you can use make_stancode to show the generated Stan code for each model and inspect where they differ.
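For example, with the two specifications from your post:

code_default <- make_stancode(
  bf(y ~ 1 + (1 | g_rep) + (1 | g_subj)),
  data = t_long_CNSubset20red1_sub3_unord_test1,
  family = brmsfamily("Gamma", link = "log", link_shape = "log")
)
code_shape <- make_stancode(
  bf(y ~ 1 + (1 | g_rep) + (1 | g_subj), shape ~ 1),
  data = t_long_CNSubset20red1_sub3_unord_test1,
  family = brmsfamily("Gamma", link = "log", link_shape = "log")
)
# then compare the two, e.g. cat(code_default) vs cat(code_shape)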

I think they should be roughly the same, but as I noted above, convergence (and especially the specific way non-convergence shows) can be a bit stochastic, so I would rerun the fits and only worry if the results are always different.

I think that you are quite unlikely to have the RMSE be derived from “independent random variables each following a standard normal distribution” (unless this is guaranteed by your interpolator, but I don’t see how you could do this - in particular, RMSE between nearby points on the grid are unlikely to be independent), so this is a useful starting point, but not a strong guarantee.

That’s right…I’m so deep in this problem nothing makes sense anymore…Thank you for the correction.

OK, I’m so confused about whether a Bayesian model can handle group effects with 0 variance, perfect correlations, and 0 residual variance. Could you clarify this for me? What if it’s truly 0 or perfect? Does a truly 0 variance in the model create the same rank-deficiency problems? Does a converging Bayesian model with 0 variance and perfect correlations mean the model is overparameterized and untrustworthy?

No, brms generally expects everything to be non-zero. However, I’d be hugely skeptical of 0 variance or perfect correlation in data for anything except the most well-done physics experiments (and even there, I would pause and recheck).

Hallelujah! It worked! I’ve got a model!!!
I finally got a working model without errors or warnings, and the pp_check and DHARMa plots are beautiful!
All thanks to the parallelization merged into master on GitHub two days ago! It appears I was too impatient and assumed that the long run times and failures to converge meant there had to be something wrong with the model, when it seems it just needed a larger max treedepth of about 17. So yesterday I got the parallelization up and running, and here is what I got for the first model I ran; it completed in roughly 13 hours! Let me know if you see anything I’ve overlooked, or any problems with the model or results; I’d appreciate any feedback.

brms_20_lognormal_mdl4_2 <- brm(
  bf(y ~ g_size*g_noise*g_shape*g_interps + (1 | g_rep),
     sigma ~ g_size*g_noise*g_interps),
  family = lognormal(), # lognormal(link = "identity", link_sigma = "log")
  prior = c(
    # y ~ Intercept + b
    prior(normal(0, 5), class = "Intercept"),
    prior(normal(0, 5), class = "b"),
    # sigma ~ Intercept + b
    prior(normal(0, 5), class = "Intercept", dpar = "sigma"),
    prior(normal(0, 5), class = "b", dpar = "sigma"),
    # y ~ sd
    prior(student_t(3, 0, 0.25), class = "sd")),
  sample_prior = "yes",
  seed = seed3,
  data = t_long_CNSubset20red1_sub3_unord,
  warmup = 1000, iter = 2000, chains = 4,
  threads = threading(8, grainsize = 100),
  backend = "cmdstanr", cores = 4,
  control = list(adapt_delta = 0.80, max_treedepth = 20))
 Family: lognormal 
  Links: mu = identity; sigma = log 
Formula: y ~ g_size * g_noise * g_shape * g_interps + (1 | g_rep) 
         sigma ~ g_size * g_noise * g_interps
   Data: t_long_CNSubset20red1_sub3_unord (Number of observations: 3240) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Group-Level Effects: 
~g_rep (Number of levels: 20) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.04      0.01     0.03     0.06 1.00     1301     2250

Population-Level Effects: 
                                          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept                                     5.30      0.01     5.27     5.32 1.01      920     1824
sigma_Intercept                              -4.19      0.11    -4.40    -3.95 1.01      897     1389
g_size4                                       2.63      0.00     2.63     2.64 1.00     1469     1988
g_size8                                       3.49      0.00     3.48     3.50 1.00     1508     2099
g_noise19                                    -0.02      0.03    -0.09     0.05 1.00     1681     2485
g_noise38                                     0.62      0.07     0.49     0.76 1.00     1250     2110
g_shape1                                     -0.01      0.00    -0.01     0.00 1.00     1216     1679
g_interpsEM001                               -0.34      0.00    -0.35    -0.33 1.00     1609     2183
g_interpsEM003                               -0.70      0.01    -0.72    -0.69 1.00     2855     2751
g_interpsEM006                               -0.27      0.00    -0.28    -0.26 1.00     1678     2579
g_interpsEM008                               -0.35      0.00    -0.36    -0.34 1.00     1693     2132
g_interpsEM016                               -0.27      0.00    -0.28    -0.26 1.00     1661     2154
g_interpsEM018                               -0.35      0.00    -0.36    -0.34 1.00     1714     2310
g_interpsEM026                               -0.35      0.00    -0.36    -0.34 1.00     1379     1773
g_interpsEM028                               -0.72      0.01    -0.74    -0.71 1.00     2582     3249
g_size4:g_noise19                            -0.12      0.05    -0.23    -0.01 1.00     1909     2647
g_size8:g_noise19                            -0.13      0.05    -0.23    -0.03 1.00     1789     2666
g_size4:g_noise38                            -0.63      0.12    -0.86    -0.40 1.00     1434     2453
g_size8:g_noise38                            -0.67      0.13    -0.91    -0.43 1.00     1484     2220
g_size4:g_shape1                              0.01      0.01    -0.01     0.02 1.00     1402     2019
g_size8:g_shape1                              0.01      0.01    -0.01     0.02 1.00     1388     1876
g_noise19:g_shape1                           -0.08      0.05    -0.17     0.02 1.00     1630     2372
g_noise38:g_shape1                           -0.31      0.10    -0.50    -0.12 1.00     1198     1775
g_size4:g_interpsEM001                       -0.00      0.01    -0.01     0.01 1.00     1788     2678
g_size8:g_interpsEM001                       -0.00      0.01    -0.01     0.01 1.00     1742     2333
g_size4:g_interpsEM003                       -0.03      0.01    -0.05    -0.01 1.00     2883     3180
g_size8:g_interpsEM003                       -0.03      0.01    -0.05    -0.01 1.00     3331     3447
g_size4:g_interpsEM006                       -0.00      0.01    -0.01     0.01 1.00     1883     2501
g_size8:g_interpsEM006                       -0.00      0.01    -0.02     0.01 1.00     1866     2443
g_size4:g_interpsEM008                       -0.01      0.01    -0.02     0.00 1.00     1859     1962
g_size8:g_interpsEM008                       -0.01      0.01    -0.02     0.00 1.00     1741     2535
g_size4:g_interpsEM016                       -0.00      0.01    -0.01     0.01 1.00     1827     2333
g_size8:g_interpsEM016                       -0.00      0.01    -0.02     0.01 1.00     1805     2871
g_size4:g_interpsEM018                       -0.01      0.01    -0.02     0.00 1.00     1880     2475
g_size8:g_interpsEM018                       -0.01      0.01    -0.02     0.00 1.00     1980     2950
g_size4:g_interpsEM026                       -0.00      0.00    -0.01     0.01 1.00     1469     2084
g_size8:g_interpsEM026                       -0.00      0.00    -0.01     0.01 1.00     1526     2115
g_size4:g_interpsEM028                       -0.03      0.01    -0.05    -0.02 1.00     2773     2991
g_size8:g_interpsEM028                       -0.03      0.01    -0.05    -0.01 1.00     2616     3045
g_noise19:g_interpsEM001                      0.27      0.07     0.13     0.40 1.00     2700     2266
g_noise38:g_interpsEM001                      0.67      0.08     0.52     0.82 1.00     1364     2183
g_noise19:g_interpsEM003                      0.74      0.07     0.61     0.87 1.00     2943     3190
g_noise38:g_interpsEM003                      1.06      0.07     0.91     1.21 1.00     1456     2031
g_noise19:g_interpsEM006                      0.27      0.05     0.18     0.36 1.00     2227     3211
g_noise38:g_interpsEM006                      0.27      0.10     0.08     0.46 1.00     1760     2440
g_noise19:g_interpsEM008                      0.34      0.05     0.24     0.44 1.00     2426     2741
g_noise38:g_interpsEM008                      0.35      0.10     0.16     0.54 1.00     1896     2461
g_noise19:g_interpsEM016                      0.27      0.05     0.17     0.36 1.00     2344     2741
g_noise38:g_interpsEM016                      0.27      0.10     0.08     0.46 1.00     1825     2282
g_noise19:g_interpsEM018                      0.34      0.05     0.25     0.44 1.00     2233     2968
g_noise38:g_interpsEM018                      0.35      0.10     0.16     0.54 1.00     1812     2291
g_noise19:g_interpsEM026                      0.23      0.05     0.12     0.33 1.00     2690     2935
g_noise38:g_interpsEM026                      0.68      0.07     0.54     0.83 1.00     1384     2042
g_noise19:g_interpsEM028                      0.72      0.05     0.62     0.81 1.00     2270     2954
g_noise38:g_interpsEM028                      1.08      0.07     0.94     1.23 1.00     1348     2182
g_shape1:g_interpsEM001                      -0.00      0.01    -0.01     0.01 1.00     1539     2280
g_shape1:g_interpsEM003                      -0.03      0.01    -0.05    -0.00 1.00     2865     3062
g_shape1:g_interpsEM006                      -0.00      0.01    -0.01     0.01 1.00     1589     2681
g_shape1:g_interpsEM008                      -0.01      0.01    -0.02     0.01 1.00     1522     2078
g_shape1:g_interpsEM016                      -0.00      0.01    -0.01     0.01 1.00     1430     1769
g_shape1:g_interpsEM018                      -0.00      0.01    -0.02     0.01 1.00     1698     2208
g_shape1:g_interpsEM026                      -0.00      0.00    -0.01     0.01 1.00     1234     1588
g_shape1:g_interpsEM028                      -0.03      0.01    -0.05    -0.01 1.00     2288     2822
g_size4:g_noise19:g_shape1                    0.05      0.08    -0.11     0.20 1.00     1827     2318
g_size8:g_noise19:g_shape1                    0.09      0.08    -0.07     0.23 1.00     1876     2705
g_size4:g_noise38:g_shape1                    0.20      0.16    -0.13     0.52 1.00     1363     2247
g_size8:g_noise38:g_shape1                    0.25      0.18    -0.11     0.60 1.00     1564     2280
g_size4:g_noise19:g_interpsEM001             -0.14      0.12    -0.38     0.09 1.00     3273     3292
g_size8:g_noise19:g_interpsEM001             -0.12      0.12    -0.35     0.12 1.00     3264     2876
g_size4:g_noise38:g_interpsEM001             -0.62      0.16    -0.92    -0.31 1.00     1928     2799
g_size8:g_noise38:g_interpsEM001             -0.62      0.16    -0.94    -0.31 1.00     2057     2845
g_size4:g_noise19:g_interpsEM003             -0.29      0.13    -0.54    -0.04 1.00     3742     3471
g_size8:g_noise19:g_interpsEM003             -0.27      0.13    -0.53    -0.02 1.00     3786     3511
g_size4:g_noise38:g_interpsEM003             -0.63      0.16    -0.93    -0.32 1.00     2118     2519
g_size8:g_noise38:g_interpsEM003             -0.64      0.16    -0.97    -0.31 1.00     2108     2999
g_size4:g_noise19:g_interpsEM006              0.00      0.08    -0.14     0.15 1.00     2345     3018
g_size8:g_noise19:g_interpsEM006              0.00      0.08    -0.14     0.15 1.00     2302     3230
g_size4:g_noise38:g_interpsEM006              0.01      0.17    -0.32     0.33 1.00     1908     3002
g_size8:g_noise38:g_interpsEM006              0.01      0.18    -0.34     0.36 1.00     2111     3320
g_size4:g_noise19:g_interpsEM008              0.01      0.08    -0.15     0.16 1.00     2695     3098
g_size8:g_noise19:g_interpsEM008              0.01      0.08    -0.14     0.16 1.00     2616     2985
g_size4:g_noise38:g_interpsEM008              0.02      0.17    -0.31     0.34 1.00     2068     3046
g_size8:g_noise38:g_interpsEM008              0.02      0.18    -0.34     0.36 1.00     1737     2940
g_size4:g_noise19:g_interpsEM016              0.00      0.08    -0.15     0.16 1.00     2492     3084
g_size8:g_noise19:g_interpsEM016              0.00      0.08    -0.15     0.16 1.00     2340     2803
g_size4:g_noise38:g_interpsEM016              0.00      0.17    -0.32     0.33 1.00     2020     2958
g_size8:g_noise38:g_interpsEM016              0.01      0.18    -0.34     0.37 1.00     1880     2847
g_size4:g_noise19:g_interpsEM018              0.01      0.08    -0.15     0.16 1.00     2434     3074
g_size8:g_noise19:g_interpsEM018              0.01      0.08    -0.15     0.16 1.00     2349     3113
g_size4:g_noise38:g_interpsEM018              0.01      0.17    -0.33     0.35 1.00     2016     2853
g_size8:g_noise38:g_interpsEM018              0.01      0.18    -0.33     0.37 1.00     2156     2506
g_size4:g_noise19:g_interpsEM026             -0.27      0.08    -0.43    -0.10 1.00     2808     2800
g_size8:g_noise19:g_interpsEM026             -0.26      0.08    -0.42    -0.10 1.00     2799     3323
g_size4:g_noise38:g_interpsEM026             -0.62      0.15    -0.92    -0.32 1.00     1976     3118
g_size8:g_noise38:g_interpsEM026             -0.63      0.16    -0.94    -0.30 1.00     2127     2985
g_size4:g_noise19:g_interpsEM028             -0.52      0.08    -0.68    -0.37 1.00     2534     3187
g_size8:g_noise19:g_interpsEM028             -0.52      0.08    -0.68    -0.36 1.00     2388     2779
g_size4:g_noise38:g_interpsEM028             -0.63      0.15    -0.93    -0.33 1.00     2122     2392
g_size8:g_noise38:g_interpsEM028             -0.64      0.17    -0.96    -0.31 1.00     2142     2613
g_size4:g_shape1:g_interpsEM001              -0.00      0.01    -0.02     0.02 1.00     1731     2260
g_size8:g_shape1:g_interpsEM001               0.00      0.01    -0.02     0.02 1.00     1685     2595
g_size4:g_shape1:g_interpsEM003               0.02      0.02    -0.01     0.05 1.00     2578     3251
g_size8:g_shape1:g_interpsEM003               0.02      0.01    -0.01     0.05 1.00     3315     3271
g_size4:g_shape1:g_interpsEM006               0.00      0.01    -0.01     0.02 1.00     1860     2766
g_size8:g_shape1:g_interpsEM006               0.00      0.01    -0.01     0.02 1.00     1884     2437
g_size4:g_shape1:g_interpsEM008               0.01      0.01    -0.01     0.02 1.00     1829     2303
g_size8:g_shape1:g_interpsEM008               0.01      0.01    -0.01     0.02 1.00     1815     2713
g_size4:g_shape1:g_interpsEM016               0.00      0.01    -0.01     0.02 1.00     1646     2424
g_size8:g_shape1:g_interpsEM016               0.00      0.01    -0.01     0.02 1.00     1854     2455
g_size4:g_shape1:g_interpsEM018               0.00      0.01    -0.01     0.02 1.00     1941     2809
g_size8:g_shape1:g_interpsEM018               0.01      0.01    -0.01     0.02 1.00     1776     2575
g_size4:g_shape1:g_interpsEM026               0.00      0.01    -0.01     0.01 1.00     1413     2100
g_size8:g_shape1:g_interpsEM026               0.00      0.01    -0.01     0.01 1.00     1411     2023
g_size4:g_shape1:g_interpsEM028               0.03      0.01     0.00     0.05 1.00     2711     2716
g_size8:g_shape1:g_interpsEM028               0.03      0.01     0.00     0.05 1.00     2465     3094
g_noise19:g_shape1:g_interpsEM001            -0.17      0.09    -0.36     0.00 1.00     3156     2669
g_noise38:g_shape1:g_interpsEM001            -0.02      0.11    -0.23     0.19 1.00     1330     2094
g_noise19:g_shape1:g_interpsEM003            -0.09      0.09    -0.27     0.10 1.00     2922     3141
g_noise38:g_shape1:g_interpsEM003             0.01      0.11    -0.21     0.21 1.00     1340     2032
g_noise19:g_shape1:g_interpsEM006             0.01      0.07    -0.13     0.14 1.00     2113     2399
g_noise38:g_shape1:g_interpsEM006             0.00      0.14    -0.27     0.28 1.00     1653     2345
g_noise19:g_shape1:g_interpsEM008             0.00      0.07    -0.14     0.14 1.00     2445     2776
g_noise38:g_shape1:g_interpsEM008             0.00      0.14    -0.27     0.27 1.00     1716     2553
g_noise19:g_shape1:g_interpsEM016             0.01      0.07    -0.13     0.13 1.00     2222     2877
g_noise38:g_shape1:g_interpsEM016             0.00      0.14    -0.27     0.26 1.00     1787     2431
g_noise19:g_shape1:g_interpsEM018             0.00      0.07    -0.13     0.14 1.00     2196     2813
g_noise38:g_shape1:g_interpsEM018             0.01      0.13    -0.26     0.27 1.00     1763     2292
g_noise19:g_shape1:g_interpsEM026            -0.23      0.07    -0.37    -0.09 1.00     2713     3183
g_noise38:g_shape1:g_interpsEM026            -0.02      0.11    -0.23     0.18 1.00     1293     2165
g_noise19:g_shape1:g_interpsEM028            -0.25      0.07    -0.39    -0.12 1.00     2108     2645
g_noise38:g_shape1:g_interpsEM028             0.01      0.11    -0.20     0.21 1.00     1309     1873
g_size4:g_noise19:g_shape1:g_interpsEM001     0.22      0.17    -0.10     0.55 1.00     3625     3321
g_size8:g_noise19:g_shape1:g_interpsEM001     0.17      0.17    -0.15     0.50 1.00     3720     3208
g_size4:g_noise38:g_shape1:g_interpsEM001     0.09      0.22    -0.34     0.52 1.00     1966     2543
g_size8:g_noise38:g_shape1:g_interpsEM001     0.07      0.23    -0.39     0.53 1.00     2229     3008
g_size4:g_noise19:g_shape1:g_interpsEM003     0.13      0.18    -0.22     0.48 1.00     3374     2694
g_size8:g_noise19:g_shape1:g_interpsEM003     0.08      0.18    -0.27     0.43 1.00     3682     3288
g_size4:g_noise38:g_shape1:g_interpsEM003     0.06      0.22    -0.37     0.49 1.00     1976     2786
g_size8:g_noise38:g_shape1:g_interpsEM003     0.04      0.23    -0.42     0.50 1.00     2167     2736
g_size4:g_noise19:g_shape1:g_interpsEM006    -0.01      0.11    -0.22     0.21 1.00     2307     3047
g_size8:g_noise19:g_shape1:g_interpsEM006    -0.01      0.11    -0.22     0.22 1.00     2430     2951
g_size4:g_noise38:g_shape1:g_interpsEM006    -0.00      0.23    -0.44     0.46 1.00     1847     2472
g_size8:g_noise38:g_shape1:g_interpsEM006    -0.01      0.25    -0.50     0.50 1.00     1990     2807
g_size4:g_noise19:g_shape1:g_interpsEM008    -0.00      0.11    -0.22     0.21 1.00     2682     2921
g_size8:g_noise19:g_shape1:g_interpsEM008    -0.01      0.11    -0.23     0.21 1.00     2622     3064
g_size4:g_noise38:g_shape1:g_interpsEM008    -0.01      0.23    -0.47     0.44 1.00     1952     2815
g_size8:g_noise38:g_shape1:g_interpsEM008    -0.01      0.25    -0.50     0.49 1.00     1991     2508
g_size4:g_noise19:g_shape1:g_interpsEM016    -0.00      0.11    -0.21     0.22 1.00     2377     2799
g_size8:g_noise19:g_shape1:g_interpsEM016    -0.01      0.11    -0.22     0.21 1.00     2384     2970
g_size4:g_noise38:g_shape1:g_interpsEM016     0.00      0.24    -0.45     0.47 1.00     1952     2630
g_size8:g_noise38:g_shape1:g_interpsEM016    -0.01      0.25    -0.50     0.49 1.00     2229     2970
g_size4:g_noise19:g_shape1:g_interpsEM018    -0.00      0.11    -0.22     0.22 1.00     2448     2789
g_size8:g_noise19:g_shape1:g_interpsEM018    -0.01      0.11    -0.22     0.21 1.00     2548     3101
g_size4:g_noise38:g_shape1:g_interpsEM018    -0.01      0.23    -0.46     0.46 1.00     1933     2782
g_size8:g_noise38:g_shape1:g_interpsEM018    -0.01      0.25    -0.51     0.49 1.00     2123     2822
g_size4:g_noise19:g_shape1:g_interpsEM026     0.26      0.12     0.02     0.50 1.00     2819     3091
g_size8:g_noise19:g_shape1:g_interpsEM026     0.22      0.12    -0.01     0.45 1.00     2832     3215
g_size4:g_noise38:g_shape1:g_interpsEM026     0.09      0.21    -0.32     0.50 1.00     1800     2881
g_size8:g_noise38:g_shape1:g_interpsEM026     0.07      0.23    -0.40     0.52 1.00     1929     2899
g_size4:g_noise19:g_shape1:g_interpsEM028     0.28      0.11     0.07     0.50 1.00     2338     3033
g_size8:g_noise19:g_shape1:g_interpsEM028     0.25      0.11     0.03     0.48 1.00     2640     3134
g_size4:g_noise38:g_shape1:g_interpsEM028     0.06      0.22    -0.38     0.49 1.00     1891     2663
g_size8:g_noise38:g_shape1:g_interpsEM028     0.04      0.24    -0.43     0.50 1.00     2248     3029
sigma_g_size4                                -0.05      0.16    -0.35     0.26 1.00     1116     1930
sigma_g_size8                                 0.04      0.16    -0.28     0.36 1.01      987     1802
sigma_g_noise19                               2.31      0.16     1.99     2.61 1.01      974     1603
sigma_g_noise38                               3.00      0.16     2.69     3.30 1.00     1012     1748
sigma_g_interpsEM001                         -0.43      0.16    -0.75    -0.11 1.00     1298     1796
sigma_g_interpsEM003                          0.68      0.16     0.37     1.00 1.01     1217     2093
sigma_g_interpsEM006                         -0.45      0.16    -0.77    -0.15 1.00     1278     2082
sigma_g_interpsEM008                         -0.29      0.16    -0.61     0.03 1.01     1339     1766
sigma_g_interpsEM016                         -0.45      0.16    -0.77    -0.13 1.00     1207     2388
sigma_g_interpsEM018                         -0.21      0.16    -0.53     0.11 1.00     1406     1965
sigma_g_interpsEM026                         -2.00      0.16    -2.32    -1.68 1.00     1255     2108
sigma_g_interpsEM028                          0.48      0.16     0.17     0.80 1.00     1322     2294
sigma_g_size4:g_noise19                       0.27      0.22    -0.15     0.71 1.00     1179     2029
sigma_g_size8:g_noise19                       0.18      0.22    -0.26     0.61 1.01     1047     1969
sigma_g_size4:g_noise38                       0.38      0.22    -0.05     0.79 1.00     1265     2061
sigma_g_size8:g_noise38                       0.38      0.22    -0.05     0.82 1.00     1132     2243
sigma_g_size4:g_interpsEM001                  0.01      0.23    -0.42     0.46 1.00     1693     2722
sigma_g_size8:g_interpsEM001                 -0.06      0.23    -0.53     0.38 1.00     1469     2319
sigma_g_size4:g_interpsEM003                  0.01      0.23    -0.43     0.46 1.00     1596     2333
sigma_g_size8:g_interpsEM003                 -0.08      0.23    -0.53     0.37 1.01     1306     2399
sigma_g_size4:g_interpsEM006                  0.22      0.22    -0.22     0.66 1.00     1546     2597
sigma_g_size8:g_interpsEM006                  0.19      0.23    -0.26     0.63 1.00     1518     2727
sigma_g_size4:g_interpsEM008                  0.20      0.23    -0.24     0.65 1.00     1592     2628
sigma_g_size8:g_interpsEM008                  0.13      0.23    -0.32     0.58 1.00     1415     2216
sigma_g_size4:g_interpsEM016                  0.22      0.23    -0.21     0.67 1.00     1678     2762
sigma_g_size8:g_interpsEM016                  0.17      0.23    -0.29     0.62 1.00     1436     2024
sigma_g_size4:g_interpsEM018                  0.17      0.22    -0.28     0.62 1.00     1751     2600
sigma_g_size8:g_interpsEM018                  0.09      0.23    -0.37     0.55 1.00     1440     2310
sigma_g_size4:g_interpsEM026                 -2.55      0.24    -3.01    -2.07 1.00     1577     2737
sigma_g_size8:g_interpsEM026                 -2.28      0.24    -2.75    -1.83 1.00     1490     1915
sigma_g_size4:g_interpsEM028                  0.03      0.22    -0.40     0.47 1.00     1760     2737
sigma_g_size8:g_interpsEM028                 -0.05      0.23    -0.50     0.39 1.00     1361     2713
sigma_g_noise19:g_interpsEM001                0.93      0.23     0.48     1.37 1.00     1484     2891
sigma_g_noise38:g_interpsEM001               -0.37      0.23    -0.83     0.06 1.00     1595     2607
sigma_g_noise19:g_interpsEM003               -0.20      0.23    -0.64     0.26 1.01     1373     2252
sigma_g_noise38:g_interpsEM003               -1.48      0.22    -1.91    -1.04 1.00     1499     2653
sigma_g_noise19:g_interpsEM006                0.44      0.22     0.02     0.87 1.00     1455     2618
sigma_g_noise38:g_interpsEM006                0.46      0.22     0.03     0.90 1.00     1475     2547
sigma_g_noise19:g_interpsEM008                0.31      0.23    -0.15     0.76 1.00     1569     2775
sigma_g_noise38:g_interpsEM008                0.30      0.23    -0.16     0.74 1.00     1455     2538
sigma_g_noise19:g_interpsEM016                0.45      0.22     0.01     0.90 1.00     1322     2012
sigma_g_noise38:g_interpsEM016                0.45      0.23     0.01     0.89 1.00     1493     2343
sigma_g_noise19:g_interpsEM018                0.22      0.23    -0.22     0.66 1.00     1526     2979
sigma_g_noise38:g_interpsEM018                0.22      0.23    -0.22     0.68 1.00     1693     2543
sigma_g_noise19:g_interpsEM026                2.13      0.23     1.67     2.57 1.00     1403     2437
sigma_g_noise38:g_interpsEM026                1.20      0.23     0.74     1.64 1.00     1413     2468
sigma_g_noise19:g_interpsEM028               -0.49      0.22    -0.93    -0.05 1.01     1420     2590
sigma_g_noise38:g_interpsEM028               -1.28      0.22    -1.72    -0.85 1.00     1598     2600
sigma_g_size4:g_noise19:g_interpsEM001        0.18      0.32    -0.43     0.80 1.00     1849     2788
sigma_g_size8:g_noise19:g_interpsEM001        0.26      0.32    -0.35     0.92 1.00     1592     2213
sigma_g_size4:g_noise38:g_interpsEM001        0.77      0.31     0.14     1.38 1.00     1984     2878
sigma_g_size8:g_noise38:g_interpsEM001        0.80      0.32     0.17     1.41 1.00     1755     3107
sigma_g_size4:g_noise19:g_interpsEM003        0.34      0.32    -0.27     0.96 1.00     1650     2363
sigma_g_size8:g_noise19:g_interpsEM003        0.45      0.32    -0.18     1.07 1.01     1412     2486
sigma_g_size4:g_noise38:g_interpsEM003        0.81      0.32     0.18     1.41 1.00     1968     2778
sigma_g_size8:g_noise38:g_interpsEM003        0.87      0.32     0.23     1.51 1.00     1590     2987
sigma_g_size4:g_noise19:g_interpsEM006       -0.22      0.32    -0.82     0.40 1.00     1685     2495
sigma_g_size8:g_noise19:g_interpsEM006       -0.18      0.32    -0.79     0.44 1.00     1638     2515
sigma_g_size4:g_noise38:g_interpsEM006       -0.23      0.32    -0.87     0.38 1.00     1894     2740
sigma_g_size8:g_noise38:g_interpsEM006       -0.20      0.32    -0.82     0.43 1.00     1653     2610
sigma_g_size4:g_noise19:g_interpsEM008       -0.20      0.32    -0.84     0.42 1.00     1771     2433
sigma_g_size8:g_noise19:g_interpsEM008       -0.14      0.33    -0.79     0.51 1.00     1632     2565
sigma_g_size4:g_noise38:g_interpsEM008       -0.20      0.32    -0.83     0.41 1.00     1681     2901
sigma_g_size8:g_noise38:g_interpsEM008       -0.14      0.33    -0.77     0.49 1.00     1643     2886
sigma_g_size4:g_noise19:g_interpsEM016       -0.21      0.32    -0.85     0.40 1.00     1675     2419
sigma_g_size8:g_noise19:g_interpsEM016       -0.17      0.32    -0.79     0.44 1.00     1540     2710
sigma_g_size4:g_noise38:g_interpsEM016       -0.22      0.32    -0.83     0.41 1.00     1915     3006
sigma_g_size8:g_noise38:g_interpsEM016       -0.18      0.33    -0.81     0.44 1.00     1638     2458
sigma_g_size4:g_noise19:g_interpsEM018       -0.17      0.32    -0.79     0.45 1.00     1743     2912
sigma_g_size8:g_noise19:g_interpsEM018       -0.10      0.33    -0.74     0.54 1.00     1589     2421
sigma_g_size4:g_noise38:g_interpsEM018       -0.18      0.32    -0.79     0.44 1.00     2055     3174
sigma_g_size8:g_noise38:g_interpsEM018       -0.10      0.32    -0.74     0.53 1.00     1713     2675
sigma_g_size4:g_noise19:g_interpsEM026        2.55      0.33     1.88     3.19 1.00     1574     2988
sigma_g_size8:g_noise19:g_interpsEM026        2.30      0.33     1.64     2.93 1.00     1662     2485
sigma_g_size4:g_noise38:g_interpsEM026        3.32      0.33     2.70     4.00 1.00     1838     2784
sigma_g_size8:g_noise38:g_interpsEM026        3.02      0.32     2.41     3.66 1.00     1540     2348
sigma_g_size4:g_noise19:g_interpsEM028        0.05      0.31    -0.54     0.67 1.00     1773     2936
sigma_g_size8:g_noise19:g_interpsEM028        0.15      0.32    -0.47     0.77 1.00     1522     2634
sigma_g_size4:g_noise38:g_interpsEM028        0.79      0.32     0.18     1.42 1.00     2101     2990
sigma_g_size8:g_noise38:g_interpsEM028        0.83      0.32     0.19     1.46 1.00     1666     2713

Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).



Wow, the speedups look quite nice, especially with 100 iterations! Tagging @wds15, I am sure he is interested.


Eek! Exciting!
A couple of questions:

  1. seed is set to 54647 at the beginning of the vignette, and the initial models are run without specifying a seed. Then the benchmark_threading function calls update without a seed for the scaling_model, followed by another update with seed = 1234 in the run_benchmark function. Are all of these right? This might explain the weird dips at the beginning of the graphs.
  2. The same question applies to iter in the update call for the scaling_model.
  3. In the first picture, were the chunks run serially on one core?

Good point about the seeds. @wds15 set the seeds so perhaps he knows more.

On second thought, the seed and iter look fine.