Pairwise data in brms

Hi all,

I’m trying to model pairwise co-occurrence data, and am quite interested in the modelling framework outlined here: https://ecoevorxiv.org/sn5jr/

The simple version of the model, as outlined in the paper, looks as follows, for pairs i and j:

S_{ij} \sim \textrm{Normal}(\mu, \sigma) \\ \mu = \alpha + \alpha_{s[i]} + \alpha_{s[j]} + \ldots \\ \alpha_s \sim \textrm{Normal}(0, \sigma_s) \\ \sigma, \sigma_s \sim \textrm{Exponential}(1)

Currently the data I have has two columns, with the combination of the two specifying the pair of species involved in the interaction. Is it possible to specify the above model in brms, where parameters are shared across columns with the same levels?

2 Likes

I think you could do such a model quite easily with a multi-membership grouping term (i.e. you specify each term as being related to two species), see also the discussion of pupils being in multiple schools here.

E.g. something like bf(s ~ 1 + (1|mm(speciesi, speciesj))) + gaussian() as your formula (assuming outcome is in the variable s and the two groups of interest in speciesi and speciesj and the usual way of setting priors in brms. With that, the order of the species should not matter (i.e. it should result in the same thing, whether you swap speciesi and speciesj around).

1 Like

Hi, thanks for the suggestion! I’ve not heard of multi-membership models before, but from a quick googling I think this defines the following model:

\mu = \alpha + \frac{1}{2}\alpha_{s[1]} + \frac{1}{2}\alpha_{s[2]}

I’m no longer sure the model in my OP, or this one, is quite right. I’m not sure the additive approach can capture adequately the pairwise nature of these data points. Instead, given that I have multiple measurements for most species pairs, I think fitting a (hierarchical) \alpha_{ij} for each species combination is a better approach here, though it means that the contributions of individual species aren’t identified (but I’m not so sure they’re identifiable).

Yes, I believe you have it right what the model specification would do. the factor of 1/2 is of course just a matter of scaling the parameter of the exponential prior for \sigma_s to make the two models exactly identical.

Whether in the context of your specific modeling problem one model formulation or another is more meaningful, is hard to say without being familiar with the area. However, I’d note that with only having a random effect on the combinations, you treat the situations of having data on all the 120 combinations of 5 different species as giving you as much independent information as having 120 combinations of completely separate species (i.e. 240 in total, 120 as i, 120 as j). That does seem not quite right to me. You could of course include both a multi-membership terms and the interaction terms in the model