I plan to run multilevel regressions on a global dataset, with random intercepts and slopes grouped by country for my main predictor x, and a Gaussian Process (also grouped by country) to account for both geographic and cultural distance between countries. In brms, this model would look like this:
model <- brm(y ~ 1 + x + gp(geoDist, culturalDist, gr = TRUE, iso = FALSE) + (1 + x | country),
data = d, family = gaussian)
It seems that when brms takes the argument gp(x), x is required to be a vector column in the dataset. However, instead of having columns in my data called geoDist and culturalDist, I have two NxN distance matrices, much like the spatial autocorrelation example in Richard McElreath’s section on Gaussian Processes in Statistical Rethinking.
I don’t really understand raw Stan, so I was wondering if there would be a way to incorporate these distance matrices into the Gaussian Process using brms without having to modify the Stan code itself. Do I need to somehow reduce my distance matrices to single vectors that brms can understand? Or is there another trick I’m missing?
This approach certainly works for spatial autocorrelation, since we can use the nice coordinate system provided by longitude and latitude values. But for other measures of distance (such as “cultural” distance, etc.), there isn’t an obvious analogous coordinate system that I can see being used in the same way. Instead, I have an NxN distance matrix that has been calculated a priori. Can these distances be translated from the distance matrix into brms?
hmm, so the “distance” is more of a conceptual latent nature… I would guess that you would be able to handle it in the same way, but I’ve never done it myself… Perhaps @Solomon could pitch in here before we ask Paul.
This thread might be of interest. @anon75146577 and I never really finished that conversation. I have a nagging feeling that one can indeed get a positive semidefinite matrix under very mild regularity conditions.
Thanks Solomon. I will have a play around with fcor(). But doesn’t this function take a covariance matrix, not a distance matrix? What if I wanted to feed in raw distances?
In the past, I’ve just converted the distance matrix to a covariance matrix manually beforehand. But I have to make some assumptions about covariance parameters before doing this. I like Gaussian Processes because they estimate those covariance parameters directly.
There’s a pretty long history on this starting with Besag in 74. The best version is essentially Lindgren, Rue and Lindström in 2011. But there is a lot in between. As a rule arbitrary weightings either don’t work (Besag suggested the SAR construction to get around this) or they don’t end up representing the type of spatial decay people want. Just use a thin plate spline or a GP if distance is important.
Thanks Matias, I thought that this was the case, but just wanted to check. I’m not well-versed in raw Stan, so I’ve resorted to feeding pre-computed covariance matrices to brms via the gr() argument.
And thanks everyone in this thread for a really useful discussion!