Multi-membership random effects model for distance matrices

kholsinger · June 9, 2025, 5:49pm

Thanks for the comments. I should have been more explicit when I said that I’m putting a CAR prior on the random effects. For each element of the genetic distance matrix I have 8 covariates plus geographic distance. The scale of the CAR prior on the random effects depends on the geographic distances between points. The way I’m thinking about it may be wrong, but here it is: the geographic distance term in the mean regression picks up the expected genetic distance as a function of geographic distance, and the random effects (with the CAR prior) pick up association among residuals due to geographical proximity. Does that make sense?

By the way, if I ever get around to publishing analyses based on this conversation, I will include you in the acknowledgments while making it clear that you don’t necessarily endorse the approach.

jsocolar · June 9, 2025, 6:00pm

How are you defining the adjacency matrix for the CAR piece?

kholsinger · June 10, 2025, 3:25pm

This is what the model looks like:

y_i \sim \mbox{Beta}(\mu_i, \phi) \\ \mu_i = \beta_0 + \sum_{n=1}^N \beta_n x_{ni} + \epsilon_i \\ (\epsilon_1,\dots,\epsilon_I) \propto \mbox{exp}(-\frac{\tau}{2}\sum_j\sum_{k \le j}w_{jk}(\epsilon_j - \epsilon_k)^2) \\ w_{jk} = c_1 + c_2\mbox{exp}(-c_3 d_{jk}) \quad ,

where N is the number of covariates, I is the number of pairwise distances, and d_{jk} is the geographic distance between sites j and k. The constants c_1, c_2, and c_3 are chosen for computational convenience and numerical stability (see JASA 104:142-154; 2009, specifically p. 147). The y_i are genetic distances between pairs of loci and the x_{ni} are covariate distances, with the pair (j, k) mapped to i = \frac{(j-1)j}{2}+k. x_{1i} is the geographic distance between the pair of sites indexed by i, so x_{1i} = d_{jk}. These are point geographical data, not areal.
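For anyone following along, here is a minimal NumPy sketch of the pair index and the weights as written above. The function names, the site coordinates, the values of c_1, c_2, c_3, and the use of plain Euclidean distance are all assumptions for illustration; they are not taken from the JASA paper or from the actual data.

```python
import numpy as np

def pair_index(j: int, k: int) -> int:
    """Map a 1-based site pair (j, k) to i = (j-1)j/2 + k, as in the post.

    With k running from 1 to j this enumerates 1, 2, 3, ... without gaps.
    """
    return (j - 1) * j // 2 + k

def car_weights(coords, c1, c2, c3):
    """Return (w, d) with w_jk = c1 + c2 * exp(-c3 * d_jk).

    Assumption: d_jk is Euclidean distance between rows of `coords`;
    real geographic data might need great-circle distance instead.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    return c1 + c2 * np.exp(-c3 * d), d

# Hypothetical sites and constants, chosen only to make the arithmetic easy.
coords = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
w, d = car_weights(coords, c1=0.01, c2=1.0, c3=0.1)
```

The weight decays toward c_1 as distance grows, so every pair of sites keeps a small positive weight rather than dropping out of the prior entirely.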

I haven’t tried combining the CAR prior with the squared difference formulation of the random effect, \epsilon_i, but that should work too (at least in the sense that it will compute).

Thanks again for taking the time to respond. I really appreciate the feedback.

jsocolar · June 11, 2025, 9:59am

Got it; thanks.

You mention wanting to capture spatial pattern in the residuals, but the residuals don’t have unique locations in space; they are properties of pairs of sites. The spatial structure of the residuals exists in a four-dimensional space, defined by the positions of both sites, subject to a symmetry constraint due to the fact that the ordering of the pair does not matter. That is, spatial structure in the intercepts \alpha is not necessarily a good model for spatial structure in the residuals. Allowing the residuals to be spatially structured in 4D space could potentially be a good way to control for some of the non-independence, assuming that the structure of the non-independence is fundamentally spatial. I haven’t thought this through carefully though.

I still like the idea of differencing the intercepts. Two-dimensional spatial structure in the additive intercepts implies that there must be pairs of sites that, by virtue of being close together, are unusually dissimilar (these are site-pairs in spatial neighborhoods that display positive values for the intercept, which then gets added twice to the linear predictor for the pairwise dissimilarity between site-pairs in the neighborhood).
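A toy numeric illustration of that point, with made-up values (the array name alpha and the numbers are assumptions for illustration only). Sites 0 and 1 are imagined to be close together and share a positive intercept; site 2 is far away with a negative one.

```python
import numpy as np

# Hypothetical spatially smooth intercepts: sites 0 and 1 are neighbors.
alpha = np.array([1.0, 0.9, -1.0])

# Additive form: alpha_i + alpha_j enters the linear predictor.
additive = alpha[:, None] + alpha[None, :]

# Differenced form: (alpha_i - alpha_j)^2 enters the linear predictor.
squared_diff = (alpha[:, None] - alpha[None, :]) ** 2

print("neighbors 0,1  additive:", additive[0, 1], " differenced:", squared_diff[0, 1])
print("distant   0,2  additive:", additive[0, 2], " differenced:", squared_diff[0, 2])
```

Under the additive form the neighboring pair gets the largest contribution (about 1.9), so it is pushed toward being unusually dissimilar. Under the differenced form the same pair gets a contribution near zero, and the distant pair with very different intercepts gets the large one.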

kholsinger · June 12, 2025, 6:04pm

Thanks again for the comments. I need to think about this a bit more. I just realized that the notation in the model as I wrote it isn’t correct. I don’t think the correct version addresses your point, but I think it does get part of the way there. Here’s another attempt to get the notation right. i and j index different locations. Notice that there are two random effects associated with each pair of i and j.

y_{ij} \sim \mbox{Beta}(\mbox{logit}^{-1}(\mu_{ij}),\phi) \\ \mu_{ij} = \beta_0 + \sum_{n=1}^N \beta_n x_{n,ij} + \epsilon_i + \epsilon_j \\ (\epsilon_1,\dots,\epsilon_I) \propto \mbox{exp}(-\frac{\tau}{2}\sum_j\sum_{k \le j}w_{jk}(\epsilon_j - \epsilon_k)^2) \\ w_{jk} = c_1 + c_2\mbox{exp}(-c_3 x_{1,jk})

This is (I think) the multimembership model of brms or BetaBayes with a CAR prior on geographic distances. I haven’t tried it yet, but it should be possible to use the squared differences in random effects simply by using this definition of \mu_{ij}:

\mu_{ij} = \beta_0 + \sum_{n=1}^N\beta_nx_{n,ij}+ (\epsilon_i - \epsilon_j)^2
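To make the two choices of \mu_{ij} concrete, here is a rough NumPy sketch that evaluates both the additive and the squared-difference linear predictors, plus the unnormalized log of the CAR kernel on the per-site effects. It is not a fitted model and not how brms or Stan would implement it; the function names, array layout, and all numeric values are assumptions for illustration.

```python
import numpy as np

def linear_predictor(beta0, beta, x, eps, squared_difference=False):
    """mu_ij for all site pairs.

    x   : (N, S, S) array of covariate distances x_{n,ij}
    eps : (S,) array of per-site random effects
    """
    fixed = beta0 + np.tensordot(beta, x, axes=1)  # sum_n beta_n * x_{n,ij}
    if squared_difference:
        random = (eps[:, None] - eps[None, :]) ** 2  # (eps_i - eps_j)^2
    else:
        random = eps[:, None] + eps[None, :]         # eps_i + eps_j
    return fixed + random

def car_log_kernel(eps, w, tau):
    """-(tau/2) * sum_j sum_{k<=j} w_jk (eps_j - eps_k)^2, up to a constant."""
    diff2 = (eps[:, None] - eps[None, :]) ** 2
    return -0.5 * tau * np.tril(w * diff2).sum()   # k == j terms are zero

# Hypothetical inputs: 3 sites, one covariate (geographic distance).
geo = np.array([[0.0, 5.0, 10.0], [5.0, 0.0, 5.0], [10.0, 5.0, 0.0]])
x = geo[None, :, :]
eps = np.array([0.5, -0.2, 0.1])

mu_add = linear_predictor(0.0, np.array([0.1]), x, eps)
mu_sq = linear_predictor(0.0, np.array([0.1]), x, eps, squared_difference=True)
log_kernel = car_log_kernel(eps, w=np.ones((3, 3)), tau=2.0)
```

Both versions of \mu_{ij} are symmetric in i and j, which is what a distance matrix needs. The squared-difference term can only increase \mu_{ij}, whereas the additive term can move it in either direction.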

I have simulations planned to assess how well the various approaches work, and I’ll report back when I have some results.
