# Incorporating distance matrices into Gaussian Process in brms

Hello all!

I plan to run multilevel regressions on a global dataset, with random intercepts and slopes grouped by country for my main predictor `x`, and a Gaussian Process (also grouped by country) to account for both geographic and cultural distance between countries. In brms, this model would look like this:

``````model <- brm(y ~ 1 + x + gp(geoDist, culturalDist, gr = TRUE, iso = FALSE) + (1 + x | country),
data = d, family = gaussian)
``````

It seems that when brms takes the argument `gp(x)`, `x` is required to be a vector column in the dataset. However, instead of having columns in my data called `geoDist` and `culturalDist`, I have two NxN distance matrices, much like the spatial autocorrelation example in Richard McElreathās section on Gaussian Processes in Statistical Rethinking.

I donāt really understand raw Stan, so I was wondering if there would be a way to incorporate these distance matrices into the Gaussian Process using brms without having to modify the Stan code itself. Do I need to somehow reduce my distance matrices to single vectors that brms can understand? Or is there another trick Iām missing?

• Operating System: Windows 10
• brms Version: 2.13.5
1 Like

Scott, are you saying that doing it like this,

and in particular, from the paragraph starting with,

Before we practice fitting a Gaussian process with ā¦

doesnāt work?

This approach certainly works for spatial autocorrelation, since we can use the nice coordinate system provided by longitude and latitude values. But for other measures of distance (such as āculturalā distance, etc.), there isnāt an obvious analogous coordinate system that I can see being used in the same way. Instead, I have an NxN distance matrix that has been calculated a priori. Can these distances be translated from the distance matrix into brms?

hmm, so the ādistanceā is more of a conceptual latent natureā¦ I would guess that you would be able to handle it in the same way, but Iāve never done it myselfā¦ Perhaps @Solomon could pitch in here before we ask Paul.

This thread might be of interest. @anon75146577 and I never really finished that conversation. I have a nagging feeling that one can indeed get a positive semidefinite matrix under very mild regularity conditions.

1 Like

The `fcor()` function might be appropriate. See here for an example.

You mean `fcor()`?

1 Like

Gah! Yes. Iāve edited my original comment to reflect that

Thanks Solomon. I will have a play around with `fcor()`. But doesnāt this function take a covariance matrix, not a distance matrix? What if I wanted to feed in raw distances?

In the past, Iāve just converted the distance matrix to a covariance matrix manually beforehand. But I have to make some assumptions about covariance parameters before doing this. I like Gaussian Processes because they estimate those covariance parameters directly.

Thereās a pretty long history on this starting with Besag in 74. The best version is essentially Lindgren, Rue and LindstrĆ¶m in 2011. But there is a lot in between. As a rule arbitrary weightings either donāt work (Besag suggested the SAR construction to get around this) or they donāt end up representing the type of spatial decay people want. Just use a thin plate spline or a GP if distance is important.

2 Likes

I had asked a similar question here: Custom distance metric with gaussian process

Apparently it should work, but you have to write the model in Stan.

1 Like

Thanks Matias, I thought that this was the case, but just wanted to check. Iām not well-versed in raw Stan, so Iāve resorted to feeding pre-computed covariance matrices to brms via the `gr()` argument.

And thanks everyone in this thread for a really useful discussion!