I’m reaching the limits of my statistical understanding when it comes to model specification in brms.

I’m an ecologist researching how different drivers of species distribution (generalised into three categories: Biotic, Abiotic (Environment) and Space) change across scale (e.g. from 1 km up to 10 km).

To do this I have run some Joint Species Distribution Models (JSDMs) and partitioned the variance in my models to understand the contribution of each category at each resolution. According to ecological theory, I should expect the variance attributed to Environment to increase as scale increases, and the variance attributed to Biotic (potential species interactions) to decrease. A visual snapshot of my data for three of my 15 species is below.

Ideally I would like to show how scale is associated with a % increase or decrease in each variance explained category across all species.

After some discussion, my supervisor and I think this should be modelled multinomially as below. Is this appropriate?

C[Biotic VE, Environment VE, Spatial VE] ~ Scale + (1|Species)

My values are bounded between 0 and 1, so I could run a beta regression on each variance category individually, but I’m also aware that the variance categories aren’t independent of each other.

I initially thought a Dirichlet regression might work well as explained by Andrew Heiss (Guide to understanding the intuition behind the Dirichlet distribution | Andrew Heiss), but am now aware that would require all my categories to sum to 1, which would mean re-scaling everything to fit. I’m somewhat hesitant to do this as total variance explained for each species is also interpreted as an indicator of model fit.
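To make the concern about re-scaling concrete, here is a minimal Python sketch with made-up numbers. The species values and the helper `close_composition` are hypothetical and not taken from the models above. Two species whose total variance explained differs by a factor of two end up with the same composition once each is rescaled to sum to 1, which is the form a Dirichlet response needs.

```python
def close_composition(parts):
    """Rescale non-negative parts so that they sum to 1."""
    total = sum(parts.values())
    return {name: value / total for name, value in parts.items()}


# Hypothetical variance-explained values for two species (not real results).
species_a = {"biotic": 0.10, "environment": 0.20, "spatial": 0.10}  # total 0.40
species_b = {"biotic": 0.20, "environment": 0.40, "spatial": 0.20}  # total 0.80

closed_a = close_composition(species_a)
closed_b = close_composition(species_b)

for name in species_a:
    # Both species print the same share for every category.
    print(name, round(closed_a[name], 3), round(closed_b[name], 3))
```

After rescaling, both species have the same shares in every category, so the difference in total variance explained (0.40 versus 0.80) is no longer visible in the response.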

Is there an alternate distribution that would be more suitable for this?

Are you asking about an appropriate model for these proportions post-hoc where the input data is the results from your JSDM? Or are you asking about how to model the influence of scale on these variance components directly within the JSDM framework in one step?

I think you, as the subject-matter expert, need to decide which of the following is the quantity of interest:

the total variance attributed to a given component

the proportion of the total variance attributed to a given component

the proportion of the total explained variance attributed to a given component (where the denominator excludes the residual variance)
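As a rough illustration of how these three candidate quantities differ, the sketch below computes each of them from one set of variance components. The numbers, the residual term and the function name `summarise_components` are assumptions for illustration only; how your JSDM reports residual variance may differ.

```python
def summarise_components(components, residual):
    """Return the three candidate quantities for each variance component.

    components: dict mapping component name to its variance
    residual:   unexplained (residual) variance
    """
    explained = sum(components.values())
    total = explained + residual
    return {
        name: {
            "variance": value,                        # total variance attributed
            "share_of_total": value / total,          # denominator includes residual
            "share_of_explained": value / explained,  # denominator excludes residual
        }
        for name, value in components.items()
    }


# Hypothetical variance components for one species at one scale.
summary = summarise_components(
    {"biotic": 0.5, "environment": 1.0, "spatial": 0.5},
    residual=2.0,
)

for name, quantities in summary.items():
    print(name, quantities)
```

Only the last quantity sums to 1 across the three components. The second sums to 1 only if the residual is kept as its own category, and the first is on the raw variance scale.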

After choosing which of these quantities is of primary interest, we can help you to form an appropriate model.

With that said, this procedure relies on the independence of the estimated variance components across scales and on the strong assumption that your point estimates for the variance components adequately characterize those variance components. Proceed with caution.

Finally, not sure if this is what you’re doing, but note that in JSDMs the reduced-rank correlation matrix is interpretable as the fingerprint of biotic interactions only if the fixed effect structure effectively models all of the variation due to the environment. Since this is never actually the case, caution is warranted here as well, lest unmodeled abiotic variation be interpreted as biotic interaction.

I think here I am interested in the total variance attributed to a given component and how that changes across scale. Total variance explained is somewhat interpretable as a measure of model fit, which varies across species, and referring only to proportional variance would lose this information.

Though I will have to think carefully about your statement below. The data input for each scale is a gradually coarser square over the environment, used to form a binary community matrix and the resulting environmental parameters, so I imagine the scales aren’t truly independent of one another, though I don’t see a way around this. Each model has been individually fine-tuned for highest log-likelihood with elastic net regression, so the results should reflect the most representative predictions for each dataset at that scale. I’m unsure whether that would have any impact on the independence between model outputs.

“With that said, this procedure relies on the independence of the estimated variance components across scales and on the strong assumption that your point estimates for the variance components adequately characterize those variance components. Proceed with caution.”

I’m very much aware that these “biotic” interactions are almost certainly not biotic (they most likely reflect some unmodeled environmental variable), or at least will not be interactions at the scale at which mosquitoes are likely to interact. Yet I think this is still useful for understanding which drivers we may be missing and whether these vary across scale. The research is a precursor to investigating species traits, which might reveal more realistic biological patterns and drivers at a much finer resolution, where “real” interactions might be explainable, but I first thought it an interesting question to ask how this might change at larger scales.

Since my previous post I figured the Dirichlet and multinomial logit aren’t the best distributions for my data, so instead I ran three separate beta regressions, one for each component. Though I’m not confident this is the correct approach.
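For reference, one of these beta regressions could be written out roughly as follows. This is a sketch, assuming the mean-precision parameterisation that brms uses for its Beta family and the same predictors as the formula earlier in the thread (Scale plus a species-level intercept). The notation is mine: y_{s,j} is the variance explained by a single component for species s at scale j.

```latex
\begin{aligned}
y_{s,j} &\sim \mathrm{Beta}\bigl(\mu_{s,j}\,\phi,\; (1-\mu_{s,j})\,\phi\bigr) \\
\operatorname{logit}(\mu_{s,j}) &= \alpha + \beta\,\mathrm{Scale}_j + u_s \\
u_s &\sim \mathrm{Normal}(0, \sigma_u)
\end{aligned}
```

Fitting this three times, once per component, gives each component its own α, β, φ and σ_u but treats the three responses as independent of one another, which is the concern raised earlier. It also requires every value to lie strictly between 0 and 1.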

It’s actually worse than that: if interactions are confounded with the explanatory environmental variables, then you can end up “explaining away” true biotic interactions by trying to include all the environmental variables. A classic ecological example is Connell’s barnacle study, where interspecific competition between two species of barnacles results in segregation along a depth profile. Including depth in such a model would remove the residual negative correlation between the two species in a JSDM.

This is a great point that cuts both ways and ultimately speaks to the conceptual difficulty of cleanly categorizing the mechanisms limiting species distributions as “abiotic” vs “biotic” in black-and-white terms. When the outcome of competition is abiotically determined, then there’s no single clean answer to whether the pattern is due to biotic or abiotic factors. At the ends of a spectrum we have abiotically determined limits (governed by e.g. the absolute limits of physiological tolerance), and biotically determined limits (governed by e.g. priority effects), but clear examples of either one are precious few, particularly among endotherms.

In Connell’s example, including depth would lead to interpretations that overlook the key role of competition in setting distributional limits with respect to depth. Not including depth would lead to interpretations that overlook the key role of depth in shaping the outcome of competition.
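The depth example can be sketched as a toy simulation. This is not Connell’s data and not a JSDM. It assumes two hypothetical species whose occupancy probabilities are mirror images along a depth gradient (standing in for the outcome of competition), and it uses the true depth-dependent probabilities in place of a fitted depth effect. All names and parameter values are illustrative.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def simulate(n_sites=20000, steepness=8.0, seed=1):
    """Return (raw, residual) correlation between two segregated species."""
    rng = random.Random(seed)
    upper, lower, resid_upper, resid_lower = [], [], [], []
    for _ in range(n_sites):
        depth = rng.random()  # 0 = top of the shore, 1 = bottom
        # Assumption: competition confines one species to shallow sites,
        # so its occupancy falls with depth while the other's rises.
        p_lower = 1.0 / (1.0 + math.exp(-steepness * (depth - 0.5)))
        p_upper = 1.0 - p_lower
        y_upper = 1 if rng.random() < p_upper else 0
        y_lower = 1 if rng.random() < p_lower else 0
        upper.append(y_upper)
        lower.append(y_lower)
        # Residuals after "including depth": observed minus expected given depth.
        resid_upper.append(y_upper - p_upper)
        resid_lower.append(y_lower - p_lower)
    return pearson(upper, lower), pearson(resid_upper, resid_lower)


raw_corr, residual_corr = simulate()
print("co-occurrence correlation ignoring depth:", round(raw_corr, 3))
print("residual correlation after depth:", round(residual_corr, 3))
```

Ignoring depth leaves a clearly negative co-occurrence correlation, while conditioning on depth leaves a residual correlation close to zero, even though in this toy setup the depth pattern is itself meant to be the product of competition.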