Huge error bars with brms_multiple

Hi all,

I am running brms_multiple in a strange way that may be the wrong intent. I am basically trying to subsample my data over and over again to check that my original model isn’t a product of pseudo-replication. Long story short: one of the factors has many levels where n=1 for some of them and I am comparing it to another factor where n=10 for example. Removing the n=1 levels is not an option. So what the reviewers want me to do is to subsample the data over and over again to equally pick out one sample so n=1 for both factors and bootstrap this to make sure the model works.

Question 1: is it ok to use brms_multiple for this? Basically running a model over and over again by comparing 2 random samples to each other in a dataset of 400 samples.

Here is my forloop to basically slice my data into subsampled subsets and then I run all these subsets into brms_multiple. The errors bars are HUGE every time which is understandeable but the actual values stay quite similar which tells me there should be a way to minimize the error bars…
Any info would be appreicated!

df1 ← lapply(1:999, function(x) sub %>% group_by(dietAandB) %>% sample_n(1, replace =FALSE) |> ungroup())

fit1 ← brm_multiple(similarity ~ dietAandB,
data = df1,
family = “beta”, backend=“cmdstanr”, core=4,threads = 2)


And heres the output for doing it just 9 times but the error bars are similarly big for 999 times. Honestly increasing the repititions doesn’t change the error size at all.

Family: beta
Links: mu = logit; phi = identity
Formula: similarity ~ dietAandB
Data: df1 (Number of observations: 8)
Draws: 36 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 36000

Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
Intercept -1.11 0.80 -2.89 0.16 1.35 82
dietAandBFishcarnivore:Mammalcarnivore -0.73 1.13 -3.03 1.54 1.27 97
dietAandBFishcarnivore:Mammalherbivore -1.39 1.26 -4.00 1.08 1.23 108
dietAandBFishherbivore:Mammalcarnivore -0.95 1.12 -3.22 1.32 1.19 126
dietAandBFishherbivore:Mammalherbivore -0.77 1.06 -2.86 1.41 1.19 128
dietAandBMammalcarnivore:Mammalcarnivore 0.68 0.96 -1.04 2.74 1.24 108
dietAandBMammalcarnivore:Mammalherbivore -1.29 1.40 -4.16 1.35 1.42 73
dietAandBMammalherbivore:Mammalherbivore -0.32 1.07 -2.35 1.90 1.28 94

Can you explain exactly what aspect of the output is unexpected, and why it is unexpected?

So when I run this same model but just on one model with thousands of samples I get very small error bars. When I subset the data into small chunks and combine them with brms_multiple I get very large error bars. I was expecting (although I acknowledge this is more hopeful than fully understanding the model) that the error bars would be smaller, especially if I increase the number of models I put into brms_multiple.

I’ve inspected the iterations myself and they generally give similar outputs in terms of the biological interpretation so there isn’t that much variation between models. Although the error bars remain huge. It just the averaged points that are very similar across models. Sorry if this is hard to understand. I can draw a diagram if needed.

TL:dr I basically am wondering if adding more models to brms_multiple can increase its confidence and thus lead to smaller error bars.

In general, the more data you feed to a model, the tighter the model can constrain the parameters. When you don’t feed it much data, the parameters end up less constrained (bigger error bars). What you are doing, as far as I’ve understood, is feeding a bunch of models with not much data per model, and then pushing all of the resulting uncertainty forward into the final posterior. It’s expected that this procedure would produce much wider posterior distributions than showing the model all of the data.

The reason you are showing your model only a subset of the data at a time, to my understanding, is because you or a reviewer are concerned that the full dataset violates independence assumptions and/or gives unnaturally high weight to some species, so that the tightening of the posterior uncertainty is undesirable. That is, the posterior might be getting too strongly “tightened” to data points that aren’t actually as informative as they seem to be in the data that you pass to the model. By subsetting the data, you avoid this tightening. But how heavily to cull the data in the subsetting step is not an exact science. An alternative to sparse subsetting is to build in the hierarchical structure that properly captures the non-independence of the points. Unfortunately, there is no general-purpose solution that captures the hierarchical dependency structure of pairwise dissimilarity data, and so when we consult the literature we often find the recommendation to subset.

Thank you for the detailed response. I will look into a hierarchical solution, but as you’ve said I haven’t seen much for microbiome stuff.
Your response will at least help argue with reviewers why we did the analyses the way we did. Thanks!

Think of brms_multiple as stacking posteriors on top of each other to create a combined posterior. So if you have 10 posteriors that are all equivalent to a standard normal, stacking them on top of each other will just get another standard normal. And if you have 1,000 posteriors that are all standard normals, you’re still left with a standard normal when you stack them. Simulating more datasets will produce more stable estimates but will not decrease uncertainty. This is useful when summarizing uncertainty across multiply-imputed datasets.

This is distinct from Bayesian updating, which it sounds like you have in mind.

Hi Simon,

Would that be update.brmsfit? And off of that would an example code look like this?

#make a list of subsetted dataframes that are randomly subsampled from same dataframe
df ← lapply(1:999, function(x) sub %>% group_by(dietAandB) %>% sample_n(1, replace =FALSE) |> ungroup())

#run brms on 1st df from the list of df's I made
fit1 ← brm(similarity ~ dietAandB,
data = df1[[1]],
family = “beta”, backend=“cmdstanr”, core=4,threads = 2)

#update 1st fit with the next df from the list
fit2  ← update(fit1, similarity ~ dietAandB,
newdata = df[[2]],
family = “beta”, backend=“cmdstanr”, core=4,threads = 2)

Is this correct even though the model formulas are staying exactly the same? Its just the samples within the dataframe that are changing.
Any info would be appreciated!

I don’t think update.brmsfit works that way, but I’m not well-versed in brms. I don’t want to take you too off-track with the “Bayesian updating” comment, though. I don’t quite understand the original motivation for the subsampling and bootstrapping. That was just my best guess.

Yeah I think you are right. basically, these reviewers are upset about the large differences in sample size between mammals and fishes. Because some species only have an n=1, it is problematic to simply filter them out since they are still important. They suggest rerunning my model by only using 1 sample per species and just bootstrapping over and over to make sure I get the same result as before.

Their goal definitely is not to make sure you get the same result as before. If it were, then you could just use your original result. The goal is to bootstrap while throwing out lots of data and pushing the bootstrap uncertainty forward. The problem is that the amount of data that needs to be thrown out to satisfy them seems to be so large that you lose your ability to constrain the parameters well, and you get huge error bars. Without more information, it’s hard to say whether they’ve recommended a silly procedure that throws out too much data, or whether they’ve recommended an appropriately conservative procedure and failing to throw out that data yields nominal credible intervals that are too tight because they artifically inflate the degrees of freedom in the analysis. That is, it’s possible that “super wide uncertainty” really is the correct answer given the data at hand. Or it’s possible that the data really should be sufficient to constrain things better, and better procedures are available for getting a well calibrated posterior.

I agree that the reviewers are being overly pesky. They do have a point that the “experiment” was poorly designed in terms of sample size. But the tests I am using were designed to handle unequal sample sizes so honestly I don’t get their concern. I appreciate the feedback and I might even include some of it in my response to the reviewers that bayesian linear modeling with brms wasn’t built to subsample ridiculously low sample sizes over and over. Thanks for the feedback!