Incorporating distance matrices into Gaussian Process in brms

Hello all!

I plan to run multilevel regressions on a global dataset, with random intercepts and slopes grouped by country for my main predictor x, and a Gaussian Process (also grouped by country) to account for both geographic and cultural distance between countries. In brms, this model would look like this:

model <- brm(y ~ 1 + x + gp(geoDist, culturalDist, gr = TRUE, iso = FALSE) + (1 + x | country),
             data = d, family = gaussian)

It seems that when brms takes the argument gp(x), x is required to be a vector column in the dataset. However, instead of having columns in my data called geoDist and culturalDist, I have two NxN distance matrices, much like the spatial autocorrelation example in Richard McElreath’s section on Gaussian Processes in Statistical Rethinking.

I don’t really understand raw Stan, so I was wondering if there would be a way to incorporate these distance matrices into the Gaussian Process using brms without having to modify the Stan code itself. Do I need to somehow reduce my distance matrices to single vectors that brms can understand? Or is there another trick I’m missing?

Thanks so much in advance!

  • Operating System: Windows 10
  • brms Version: 2.13.5
1 Like

Scott, are you saying that doing it like this,

and in particular, from the paragraph starting with,

Before we practice fitting a Gaussian process with …

doesn’t work?

Thanks for the quick reply!

This approach certainly works for spatial autocorrelation, since we can use the nice coordinate system provided by longitude and latitude values. But for other measures of distance (such as “cultural” distance, etc.), there isn’t an obvious analogous coordinate system that I can see being used in the same way. Instead, I have an NxN distance matrix that has been calculated a priori. Can these distances be translated from the distance matrix into brms?

hmm, so the “distance” is more of a conceptual latent nature… I would guess that you would be able to handle it in the same way, but I’ve never done it myself… Perhaps @Solomon could pitch in here before we ask Paul.

This thread might be of interest. @anon75146577 and I never really finished that conversation. I have a nagging feeling that one can indeed get a positive semidefinite matrix under very mild regularity conditions.

1 Like

The fcor() function might be appropriate. See here for an example.

You mean fcor()?

1 Like

Gah! Yes. I’ve edited my original comment to reflect that

Thanks Solomon. I will have a play around with fcor(). But doesn’t this function take a covariance matrix, not a distance matrix? What if I wanted to feed in raw distances?

In the past, I’ve just converted the distance matrix to a covariance matrix manually beforehand. But I have to make some assumptions about covariance parameters before doing this. I like Gaussian Processes because they estimate those covariance parameters directly.

There’s a pretty long history on this starting with Besag in 74. The best version is essentially Lindgren, Rue and Lindström in 2011. But there is a lot in between. As a rule arbitrary weightings either don’t work (Besag suggested the SAR construction to get around this) or they don’t end up representing the type of spatial decay people want. Just use a thin plate spline or a GP if distance is important.

2 Likes

I had asked a similar question here: Custom distance metric with gaussian process

Apparently it should work, but you have to write the model in Stan.

1 Like

Thanks Matias, I thought that this was the case, but just wanted to check. I’m not well-versed in raw Stan, so I’ve resorted to feeding pre-computed covariance matrices to brms via the gr() argument.

And thanks everyone in this thread for a really useful discussion!