Multi-membership random effects model for distance matrices

I am working on a model using brms to study ecosystem stability (response variable: cv_functioning_inverse ). The data is derived from pairwise distances between sites, and I aim to account for the shared contributions of sites in these pairwise relationships but also try to correct for non-independence of the observation.

Data Context:

  • Response Variable: cv_functioning_inverse (scaled inverse coefficient of variation for ecosystem functioning).
  • Predictors:
    • beta_turnover_abund : Beta diversity.
    • multivariate_dispersion : Environmental distance.
  • Distance Matrix Context:
    • Sites are represented in pairs (site_1 , site_2 ), and the response variable reflects ecosystem stability between these pairs.
    • Due to the symmetric nature of the distance matrix, sites can appear as site_1 or site_2 randomly.
    • To avoid introducing bias, I randomised the order of site_1 and site_2 during sampling.

Here is a simplified mock example of the dataset:

site_1 site_2 beta_turnover_abund multivariate_dispersion cv_functioning_inverse
A B 0.5 0.8 0.6
B E 0.7 0.9 0.4
B A 0.5 0.8 0.6
…

(Note: The order of site_1 and site_2 is randomised to avoid bias in directional relationships.)

Model:

model_formula_mm <- bf(
  scale(cv_functioning_inverse) ~ 
  scale(beta_turnover_abund) * 
  scale(multivariate_dispersion) +
    (scale(beta_turnover_abund) * 
    scale(multivariate_dispersion) | mm(site_1, site_2))
)

fitted_model_mm <- brm(
  formula = model_formula_mm,
  data = data,
  prior = priors,
  chains = 4,
  iter = 2000,
  cores = 4,
  seed = 123  # Set seed for reproducibility
)

Problem:

When running posterior predictive checks (pp_check ), the predictions look off compared to the observed data.

Questions:

  1. Model Appropriateness:
  • Does using mm(site_1, site_2) for the random effects structure appropriately account for the shared contributions of site_1 and site_2 ?
  • Is there a better way to model pairwise relationships like this?
  • I would like to keep a regression style model as most research is using multiple regressions or correlation analysis.
  1. Odd Results:
  • What diagnostics should I focus on to troubleshoot poor posterior predictive checks? Could the issue arise from:
    • Mis-specified priors?
    • Over-complex random effects structure?
    • Scaling of predictors and response?

Additional Details:

  • Number of sites: 43
  • Number of pairwise observations: ~4200 (randomly drawn)

I would greatly appreciate any insights on improving model fit and diagnosing issues with posterior predictive checks. Thank you in advance!

Additional Output:

> loo_results <- loo(fitted_model_mm, reloo = TRUE)
> loo_results

Computed from 4000 by 4200 log-likelihood matrix.

         Estimate    SE
elpd_loo  -4199.1  61.3
p_loo       163.2   7.0
looic      8398.2 122.7
------
MCSE of elpd_loo is 0.3.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.4, 1.6]).

All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.

> fitted_model_mm
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: scale(cv_functioning_inverse) ~ scale(beta_turnover_abund) * scale(multivariate_dispersion) + (scale(beta_turnover_abund) * scale(multivariate_dispersion) | mm(site_1, site_2)) 
   Data: data (Number of observations: 4200) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Multilevel Hyperparameters:
~mmsite_1site_2 (Number of levels: 43) 
                                                                                        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)                                                                               0.80      0.10     0.63     1.03 1.00     1083     1492
sd(scalebeta_turnover_abund)                                                                0.84      0.10     0.67     1.07 1.00     1216     1974
sd(scalemultivariate_dispersion)                                                            0.65      0.09     0.50     0.86 1.01     1314     2307
sd(scalebeta_turnover_abund:scalemultivariate_dispersion)                                   0.80      0.11     0.61     1.04 1.00     1047     1857
cor(Intercept,scalebeta_turnover_abund)                                                    -0.03      0.15    -0.34     0.27 1.00     1178     1512
cor(Intercept,scalemultivariate_dispersion)                                                 0.10      0.16    -0.23     0.41 1.00     1392     1961
cor(scalebeta_turnover_abund,scalemultivariate_dispersion)                                  0.01      0.16    -0.30     0.34 1.00     1184     1299
cor(Intercept,scalebeta_turnover_abund:scalemultivariate_dispersion)                       -0.58      0.11    -0.76    -0.34 1.00     1566     2027
cor(scalebeta_turnover_abund,scalebeta_turnover_abund:scalemultivariate_dispersion)         0.28      0.15    -0.05     0.56 1.00     1322     1944
cor(scalemultivariate_dispersion,scalebeta_turnover_abund:scalemultivariate_dispersion)     0.05      0.19    -0.31     0.41 1.00     1213     1846

Regression Coefficients:
                                                      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept                                                -0.13      0.13    -0.38     0.11 1.01      840     1264
scalebeta_turnover_abund                                  0.17      0.13    -0.10     0.42 1.01     1084     1128
scalemultivariate_dispersion                             -0.53      0.11    -0.73    -0.32 1.00     1012     1610
scalebeta_turnover_abund:scalemultivariate_dispersion     0.20      0.13    -0.05     0.47 1.00     1114     1597

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.64      0.01     0.63     0.66 1.00     4705     2961

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).


1 Like

From your plots, it seems more likely to be an underpowered likelihood. I don’t now brms so can’t really help at that level. You need to find a likelihood that has parameters that will let you reproduce the data for observed y.

1 Like

I don’t fully understand what your response variable is or how it’s computed, so be aware that some of what I say might not make sense in your specific context.

First, I agree with what @Bob_Carpenter says above: it looks like your model is failing to capture some features of the data, and the way forward is often to incorporate more covariates, different functional forms, and/or different response distributions, which is what Bob means by “finding a likelihood that has parameters that will let you reproduce the data”.

Second, as you allude to in your post, problems arise when dealing with certain forms of pairwise data due to partial non-independence in the response. The canonical example in ecology is when the response represents a dissimilarity measure between two communities, but maybe your response variable suffers from some of the same problems. Since I don’t know what your response variable means or how it’s computed, I’ll focus comments here on the case where the response is a dissimilarity measure. The issue is that the generative models that we might right down for a set of dissimilarities tend to admit configurations of the response data that are logically impossible. For example, if community A is highly similar to both community B and community C, then it is impossible that the dissimilarity between B and C would be large. Even if the proposed generative model doesn’t yield logically impossible configurations, it may concentrate nearly all of the prior mass into an unintentionally limited region of the space of all allowable configurations. When we treat dissimilarity values as responses in a regression, it is challenging to account for this dependency. I haven’t closely followed the ecological literature on the topic over the last 2-3 years, but as far as I know we still don’t have a good generative model that we can write down to directly model dissimilarity data in a way that respects the dependencies inherent in the data. Statistical alternatives to a generative model have included null models constructed via permuting rows and/or columns in the species/site abundance (or presence) matrix (e.g. Mantel tests and related procedures), models fit to sparse re-samples of the full set of pairwise comparisons (e.g. https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1461-0248.2012.01815.x) and models with uncertainty quantified via the bayesian bootstrap over sites (e.g. https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.12710). Above I say that we “don’t have a good generative model that we can write down to DIRECTLY model dissimilarity data”, because ideally it could be possible to fit a generative model to the underlying compositional data, and recover the pairwise dissimilarities and their expected relationships to differences in covariate values post hoc as generated quantities.

Third, I notice that in your example data you happen to have reproduced two rows that are identical except for the order of the sites. If these rows are just identical by happenstance then there might not be any problem, but if they are identical by construction (i.e. due to nonindependence) then this is almost certainly a problem and you’ll want to remove one or the other from the model.

I might be misunderstanding what you mean by 4200 randomly drawn observations. For 43 sites the possible pairwise combinations are 903 link. Aren’t you overdrawing?

Also have a look at Distance matrix regression. I believe your issue is discussed and solved there.

Thanks for all the rapid responses.

@Bob_Carpenter I tried using beta regression (by transforming the response to a 0-1 scale), and it seems to fit much better, while the results do not change substantially.

@jsocolar to clarify the response and predictors between site variables. I.e. the

I acknowledge that spatial autocorrelation can be an issue in such a dataset. However, the experimental setup was specifically designed to avoid that. However, I will try out the options you mentioned in your reply!

To your third point, no, this was not a mistake but intentional by construction. I wanted to ensure that there wouldn’t be any directionality issues when transforming a distance matrix into a data frame. Here’s an example:

> dist(mtcars$qsec[1:5])
     1    2    3    4
2 0.56               
3 2.15 1.59          
4 2.98 2.42 0.83     
5 0.56 0.00 1.59 2.42

If I were to convert this distance matrix into a data frame with identifiers for rows and columns (e.g., row_id and col_id), the structure would become asymmetric. For example:

  • The first element (corresponding to 1) would never appear as a row_id because it’s only in the first row of the matrix.
  • The last element (corresponding to 5) would never appear as a col_id because it’s only in the last row of the matrix.
  • Other elements, like 2 and 3, may appear as both row_id and col_id, but only in specific contexts.

This imbalance arises because the distance matrix inherently stores only the lower (or upper) triangle, which means that certain identifiers won’t be represented symmetrically in the data frame. To ensure symmetry and avoid directionality issues, I constructed the transformation in this way deliberately.

As @amynang pointed out, I am oversampling; I should just use both the upper and the lower triangle instead if my assumptions that the directionality affects the random effect are correct.

Do you think I worry too much about this?

Additionally, I tried it out with some dummy data, and it does not seem to make a difference
→ duplicate cases are handled
→ using one half does not seem to be an issue.

Results dummy data

> summary(fit_lower)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: response ~ predictor + (predictor | mm(row_id, col_id)) 
   Data: dist_lower (Number of observations: 4950) 
  Draws: 4 chains, each with iter = 1000; warmup = 500; thin = 1;
         total post-warmup draws = 2000

Multilevel Hyperparameters:
~mmrow_idcol_id (Number of levels: 100) 
                         Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)                0.03      0.02     0.00     0.07 1.00     1096     1199
sd(predictor)                0.03      0.02     0.00     0.06 1.00      494      985
cor(Intercept,predictor)    -0.11      0.57    -0.96     0.94 1.00      619      822

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -0.00      0.01    -0.03     0.02 1.00     2742     1257
predictor     2.00      0.01     1.98     2.02 1.00     2568     1560

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.50      0.00     0.49     0.51 1.01     3909     1267

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

> summary(fit_upper)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: response ~ predictor + (predictor | mm(row_id, col_id)) 
   Data: dist_upper (Number of observations: 4950) 
  Draws: 4 chains, each with iter = 1000; warmup = 500; thin = 1;
         total post-warmup draws = 2000

Multilevel Hyperparameters:
~mmrow_idcol_id (Number of levels: 100) 
                         Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)                0.04      0.02     0.00     0.09 1.00      458      607
sd(predictor)                0.02      0.01     0.00     0.05 1.00      565      864
cor(Intercept,predictor)    -0.28      0.57    -0.98     0.89 1.00      990     1088

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -0.01      0.01    -0.04     0.01 1.00     3322     1480
predictor     2.00      0.01     1.98     2.02 1.00     3316     1346

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.50      0.01     0.49     0.51 1.00     3108     1344

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Warning message:
There were 3 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup 

> summary(fit_both)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: response ~ predictor + (predictor | mm(row_id, col_id)) 
   Data: dist_both (Number of observations: 9900) 
  Draws: 4 chains, each with iter = 1000; warmup = 500; thin = 1;
         total post-warmup draws = 2000

Multilevel Hyperparameters:
~mmrow_idcol_id (Number of levels: 100) 
                         Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)                0.01      0.01     0.00     0.04 1.01      740      821
sd(predictor)                0.01      0.01     0.00     0.03 1.01      557      805
cor(Intercept,predictor)    -0.19      0.58    -0.98     0.91 1.01      694      826

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.00      0.01    -0.01     0.02 1.00     3595     1175
predictor     1.99      0.01     1.98     2.01 1.01     2766     1286

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.50      0.00     0.49     0.50 1.00     3083     1411

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Warning message:
There were 3 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup 

Code


# Load libraries
library(brms)
library(dplyr)

# Set seed for reproducibility
set.seed(123)

# Simulate data: Distance matrix for 10 sites
n_sites <- 100
dist_matrix <- as.matrix(dist(rnorm(n_sites)))  # Random distances
rownames(dist_matrix) <- colnames(dist_matrix) <- 1:n_sites

### Extract lower triangle
lower <- which(lower.tri(dist_matrix), arr.ind = TRUE)
dist_lower <- data.frame(
  row_id = lower[, 1],
  col_id = lower[, 2],
  distance = dist_matrix[lower]
)

# Simulate response and predictor for lower triangle
dist_lower <- dist_lower %>%
  mutate(
    predictor = distance + rnorm(n(), mean = 0, sd = 0.2),  # Predictor correlated with distance
    response = 2 * predictor + rnorm(n(), mean = 0, sd = 0.5)  # Response correlated with predictor
  )

### Extract upper triangle
upper <- which(upper.tri(dist_matrix), arr.ind = TRUE)
dist_upper <- data.frame(
  row_id = upper[, 1],
  col_id = upper[, 2],
  distance = dist_matrix[upper]
)

# Simulate response and predictor for upper triangle
dist_upper <- dist_upper %>%
  mutate(
    predictor = distance + rnorm(n(), mean = 0, sd = 0.2),
    response = 2 * predictor + rnorm(n(), mean = 0, sd = 0.5)
  )

### Combine both triangles
dist_both <- rbind(
  dist_lower,
  dist_upper
) %>%
  mutate(
    predictor = distance + rnorm(n(), mean = 0, sd = 0.2),
    response = 2 * predictor + rnorm(n(), mean = 0, sd = 0.5)
  )

### Inspect the data
print(dist_lower)
print(dist_upper)
print(dist_both)

table(dist_lower$row_id)
table(dist_upper$row_id)
table(dist_both$row_id)

### Fit models for each case

# 1. Lower triangle
fit_lower <- brm(
  response ~ predictor + (predictor | mm(row_id, col_id)),
  data = dist_lower,
  family = gaussian(),
  cores = 4,
  iter = 1000
)

# 2. Upper triangle
fit_upper <- brm(
  response ~ predictor + (predictor | mm(row_id, col_id)),
  data = dist_upper,
  family = gaussian(),
  cores = 4,
  iter = 1000
)

# 3. Both triangles
fit_both <- brm(
  response ~ predictor + (predictor | mm(row_id, col_id)),
  data = dist_both,
  family = gaussian(),
  cores = 4,
  iter = 1000
)

### Summarize and plot results for each case
summary(fit_lower)
pp_check(fit_lower)
loo(fit_lower)

summary(fit_upper)
pp_check(fit_upper)
loo(fit_upper)

summary(fit_both)
pp_check(fit_both)
loo(fit_both)

The way you generate your dummy data, the information on the upper and lower triangle is not identical. If it was, fitting the same model with the same seed should produce identical output. Compare ranef(fit_lower) with renef(fit_upper) with your real data. The model does not take into account if a site comes first or second in a multi-membership term. There is no directionality to worry about.

1 Like

All of the issues on this thread have to do with with the non-independence of the data input rows.

  1. You cannot simply duplicate rows, by resampling or otherwise, any more than you can duplicate or upsample the data in an ordinary linear regression and expect to still get well calibrated inference. This is a big no-no. Fortunately, as mentioned above, there is no directionality in your model to worry about. Just use one row per entry in the lower triangle of the matrix.
  2. Even without duplicating rows dissimilarity matrices have a complicated dependency structure. Unfortunately, most solutions mentioned on this thread and in other linked threads, including multi-membership models, do not adequately account for the dependency structure. In the event that the matrix is a distance matrix, the key structure is that the response is guaranteed to satisfy the triangle inequality. In the even that the matrix is a dissimilarity matrix, the structure is similar but much more complicated, depending on the dissimilarity metric. If it’s a point of contention, I am happy to expound on why multi-membership models are seriously inadequate to capture the dependency structures inherent when the response is a non-sparse distance/dissimilarity matrix.
  3. When the response data in a regression have a complicated dependency structure, it can be important to capture this, and failing to do so can lead to bad inference. @bschrobru , I think you (rightly) care about this issue since you mention accounting for the non-independence of the pairwise observations in the second sentence of your original post.
  4. However, it is not clear (to me at least) what the response data represent, and whether they have a difficult dependency structure or not, and if so what the form of this dependency takes and whether or not it would be adequately modeled by a multi-membership term. Note that it does not matter if the covariate data have a complicated dependency structure–what matters is the response. To say anything further, I would want to understand how the response data is computed for a given pair. You mention that it has to do with a coefficient of variation, but AFAIK the coefficient of variation meaningful for univariate data with more than 2 entries, and so I don’t understand what this means in a pairwise context.
2 Likes

I’d certainly be grateful if you could say more about this.

I am also a bit out of touch with the literature but to me the multimembership approach seems similar to the “BetaBayes” approach described here.