I have put together a model in brms to deal with a kind of esoteric situation. I am modeling the relationship between a phenotype (continuous outcome) and some microbiome data, at the seed level. There are repeated measures of seeds from the same maternal plant. However, the interesting part is that phenotype and microbiome sampling are destructive so one seed cannot be measured for both. Thus, I set up a missing data model with random intercepts for maternal plant on both the trait and microbiome side, to estimate the missing phenotype data for the individuals that only have microbiome data measured, and estimate the missing microbiome data for individuals that only have phenotype data measured. Then model the phenotype, including both missing and nonmissing, as a function of the microbiome. It works well enough with a small simulated dataset. So far so good.
The model formula I used, for a situation with 5 x variables, is bf(mvbind(Xmiss1, Xmiss2, Xmiss3, Xmiss4, Xmiss5) | mi() ~ 0 + (1||maternal_id)) + bf(ymiss | mi() ~ mi(Xmiss1) + mi(Xmiss2) + mi(Xmiss3) + mi(Xmiss4) + mi(Xmiss5) + (1||maternal_id)) + set_rescor(FALSE)
Now for my question. The model I have constructed with brms estimates a separate sd(Intercept) parameter, and a separate sigma parameter, for each x variable (i.e. taxon in the microbiome dataset). I would like to fit a more parsimonious model by constraining those parameters to be the same for all x variables (taxa). I have had some success modifying the underlying Stan code from the brms model to do this, but I was curious if it is possible using the existing brms multivariate syntax.
Any help would be greatly appreciated – I am always very grateful for the helpful advice I get on this forum!
Simulate data
library(mvtnorm)
library(brms)
options(mc.cores = 4, brms.backend = 'cmdstanr', brms.file_refit = 'on_change')
# Start with manageable number of taxa.
n_mothers <- 10
n_taxa <- 5
offspring_per_mother <- 10 # 5 will be retained for traits, 5 for microbiome
set.seed(1)
X_maternal <- rmvnorm(n_mothers, mean = rep(0, n_taxa), sigma = diag(n_taxa))
sigma_maternal <- cov(X_maternal)
# Coefficients indicating which taxa predict the outcome.
# We will not include any interaction effect.
beta <- c(5, -5, 0, 0, 0)
# To get offspring microbiome, take multivariate normal draws from the mean vector for each mother (rows of X_maternal)
X_offspring <- apply(X_maternal, 1, function(Xi) rmvnorm(offspring_per_mother, mean = Xi, sigma = sigma_maternal), simplify = FALSE)
# Use regression coefficients (beta) to get value for offspring trait, plus noise
y_offspring <- lapply(X_offspring, function(Xoi) Xoi %*% beta + rnorm(offspring_per_mother, 0, 1))
# Combine together
dt <- data.frame(
maternal_id = factor(rep(1:n_mothers, each = offspring_per_mother)),
offspring_id = 1:offspring_per_mother,
do.call(rbind, X_offspring),
y = do.call(c, y_offspring)
)
# Within each mother, set half of the values to be missing for x, and the other half for y.
xmiss <- lapply(1:nrow(dt), function(i) {
if (dt[i, 'offspring_id'] %in% 1:(offspring_per_mother/2)) {
setNames(dt[i, paste0('X', 1:n_taxa)], paste0('Xmiss',1:n_taxa))
} else {
setNames(rep(NA, n_taxa), paste0('Xmiss', 1:n_taxa))
}
})
dt <- cbind(dt, do.call(rbind, xmiss))
dt$ymiss <- ifelse(dt$offspring_id %in% 1:(offspring_per_mother/2), NA, dt$y)
Fit model
modmv_miss <- brm(
bf(mvbind(Xmiss1, Xmiss2, Xmiss3, Xmiss4, Xmiss5) | mi() ~ 0 + (1||maternal_id)) + bf(ymiss | mi() ~ mi(Xmiss1) + mi(Xmiss2) + mi(Xmiss3) + mi(Xmiss4) + mi(Xmiss5) + (1||maternal_id)) + set_rescor(FALSE),
prior = c(
prior(gamma(1, 1), class = sd, resp = Xmiss1),
prior(gamma(1, 1), class = sd, resp = Xmiss2),
prior(gamma(1, 1), class = sd, resp = Xmiss3),
prior(gamma(1, 1), class = sd, resp = Xmiss4),
prior(gamma(1, 1), class = sd, resp = Xmiss5),
prior(gamma(1, 1), class = sd, resp = ymiss),
prior(gamma(1, 1), class = sigma, resp = Xmiss1),
prior(gamma(1, 1), class = sigma, resp = Xmiss2),
prior(gamma(1, 1), class = sigma, resp = Xmiss3),
prior(gamma(1, 1), class = sigma, resp = Xmiss4),
prior(gamma(1, 1), class = sigma, resp = Xmiss5),
prior(gamma(1, 1), class = sigma, resp = ymiss),
prior(normal(0, 5), class = b, resp = ymiss) # Only y carries any fixed effects.
),
data = dt,
chains = 4, iter = 4000, warmup = 2000
)
Model summary output
Some mixing is not perfect but the point here is to just show what parameters are estimated. As you can see below, we have five different sd(X_Intercept)
parameters and five different sigma_X
parameters. Using brms syntax, can either or both of those groups of five be constrained to be the same?
Multilevel Hyperparameters:
~maternal_id (Number of levels: 10)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Xmiss1_Intercept) 1.04 0.31 0.35 1.67 1.06 43 15
sd(Xmiss2_Intercept) 0.36 0.11 0.19 0.63 1.03 731 498
sd(Xmiss3_Intercept) 0.73 0.21 0.40 1.19 1.03 123 1789
sd(Xmiss4_Intercept) 1.13 0.31 0.62 1.91 1.03 1448 1510
sd(Xmiss5_Intercept) 0.88 0.30 0.47 1.74 1.09 32 31
sd(ymiss_Intercept) 0.87 0.76 0.03 2.88 1.01 384 2480
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
ymiss_Intercept 0.03 1.30 -2.61 2.50 1.01 325 1894
ymiss_miXmiss1 3.42 1.57 0.16 6.08 1.05 73 389
ymiss_miXmiss2 -4.78 3.51 -11.33 2.63 1.02 141 25
ymiss_miXmiss3 0.21 2.13 -4.09 4.09 1.03 191 592
ymiss_miXmiss4 -1.01 1.79 -4.23 2.44 1.07 62 60
ymiss_miXmiss5 -1.68 1.82 -5.22 1.99 1.02 176 592
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma_Xmiss1 0.95 0.11 0.77 1.17 1.03 101 180
sigma_Xmiss2 0.39 0.05 0.31 0.48 1.07 39 303
sigma_Xmiss3 0.68 0.08 0.55 0.87 1.04 66 89
sigma_Xmiss4 1.25 0.15 1.01 1.59 1.02 211 627
sigma_Xmiss5 0.75 0.08 0.62 0.95 1.03 664 1356
sigma_ymiss 1.43 0.93 0.20 3.74 1.21 15 11