Question: Gaussian Processes and Varying Intercepts/Slopes

Hi all,

Imagine that I have data on what share of the vote the governing party got in elections in 100 countries. I want to model the outcome as a function of each country’s economy. Ordinarily, I would fit something like this:

vote ~ 1 + econ + (1 + econ | country)

But I also have the longitude and latitude of each country. Given that countries close to each other are likely to share common exposures, it makes sense to include geography in the model as well. One way of doing this is to use a Gaussian process:

vote ~ 1 + econ + (1 + econ | country) + gp(lon, lat)

However, my understanding is that GPs are akin to varying intercepts over continuous clusters (in this case, geographic degrees). I want to let the effect of the economy vary by country. But I’m also conscious that the model specification above accounts for country-level differences in the intercept twice: once in the varying intercepts and once in the GP. As such, my thinking is that I should fit a model like this instead:

vote ~ 1 + econ + (0 + econ | country) + gp(lon, lat)

This, I believe, pools information across countries using the Gaussian process, but also lets the effect of the economy vary by country without accounting for intercept differences twice.

Does this make sense? Or have I misunderstood what GPs are doing relative to varying intercepts/slopes?

In spatial statistics, a GP is used when you have observations at particular sites and you’re interested in predicting values at sites that are as yet unobserved; all this comes after an artful analysis of the semivariogram, etc.

I would suggest that the human geography of countries is not well represented by points in space. It is certainly possible to assign to them the lat-lon of their geographic center, but there is no inhabited space between countries, and there’s no prediction from the GP that would make sense apart from the observed countries. All of your space is carved up by states (a mutually exclusive and exhaustive partition). That’s why countries are more comfortably represented with a lattice structure. (It is still possible to use the distance between centroids to create an inverse-distance weighted connectivity matrix.) In any case, it’s a lot easier to work with lattice data.
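
For what it's worth, here is a minimal sketch of the inverse-distance weighted connectivity matrix mentioned in the parenthesis above, in plain Python. The centroid coordinates, the country labels, and the function names are all made up for illustration. The great-circle (haversine) distance is my choice here, not something the thread prescribes. Many CAR implementations expect a symmetric matrix, so row-standardization is off by default; check what your software wants before using weights like these.

```python
import math

# Hypothetical centroids as (latitude, longitude) in degrees; illustrative values only.
centroids = {
    "A": (48.0, 2.0),
    "B": (51.0, 10.0),
    "C": (40.0, -4.0),
}

def haversine_km(p, q, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

def inverse_distance_weights(points, row_standardize=False):
    """Return (names, W) where W[i][j] = 1 / distance(i, j) and W[i][i] = 0."""
    names = list(points)
    n = len(names)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                W[i][j] = 1.0 / haversine_km(points[names[i]], points[names[j]])
    if row_standardize:
        # Each row sums to 1; note this makes W asymmetric in general.
        W = [[w / sum(row) for w in row] for row in W]
    return names, W

names, W = inverse_distance_weights(centroids)
```

With real data you would probably also want to truncate the weights beyond some distance, since otherwise every country is a neighbour of every other country.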

That’s not really what you were asking about…but it sounds to me that your thinking is correct on the redundancy of the varying intercept term.

So if you have multiple observations per country, I think you’re saying you want to have an intercept per country, as in
y_i \sim N(\alpha + \alpha_{country[i]}, \sigma^2)
\alpha_{country} \sim N(0, \tau^2 I)
plus covariates.

To include some abstract information about human geography you could have
\alpha_{country} \sim N(0, \Sigma)
where \Sigma is some spatial model specification like the CAR model. In that case, the original varying intercept term would be redundant, like you’re saying.
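
To make that a bit more concrete, one common way to write a (proper) CAR specification for \Sigma is sketched below. The symbols W, D, and \rho are my own notation, not anything from the posts above: W is a symmetric binary adjacency matrix, D is the diagonal matrix of neighbour counts d_j, and \rho controls the strength of spatial dependence. Other CAR variants (for example the intrinsic version with \rho = 1) differ in the details.

```latex
% Joint form: zero-mean multivariate normal with a spatially structured covariance
\alpha_{country} \sim N(0, \Sigma), \qquad \Sigma = \tau^2 (D - \rho W)^{-1}

% Equivalent conditional form for country j given all the others
\alpha_j \mid \alpha_{-j} \sim N\!\left( \frac{\rho}{d_j} \sum_{k} w_{jk}\, \alpha_k ,\; \frac{\tau^2}{d_j} \right)
```

The conditional form shows the pooling: each country's intercept is pulled toward the average of its neighbours' intercepts, rather than toward a single global mean as in the exchangeable N(0, \tau^2 I) case.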


This is really useful.

This is basically right, but I have both intercepts and slopes:

y_{i} \sim N(\alpha_{country[i]} + \beta_{country[i]}x_i, \sigma)

So are you saying that I could fit the model you describe, but with a CAR structure on the varying intercepts, like so? I.e., in brms code:

vote ~ 1 + econ + (0 + econ | country) + car(M)

Where car(M) reflects the CAR structure over the adjacency matrix M? Or should I use (1 + econ | country) and not (0 + econ | country) in this case?

It depends how exactly brms is implementing the CAR model. I think this:

vote ~ 1 + econ + (0 + econ | country) + car(M)

is right if the CAR model has zero mean. I think you should be able to verify that by printing the Stan code.
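
Written out, this is the model I have in mind for that formula, assuming the CAR term really is zero-mean. The symbols \beta, b, and \tau_b are my notation for the population-level slope, the country-level slope deviations, and their standard deviation; whether brms parameterizes things exactly this way is what the Stan code would tell you.

```latex
y_i \sim N\!\left( \alpha + \alpha_{country[i]} + (\beta + b_{country[i]})\, x_i ,\; \sigma^2 \right)

\alpha_{country} \sim N(0, \Sigma) \quad \text{(zero-mean CAR, plays the role of the varying intercept)}

b_{country} \sim N(0, \tau_b^2 I) \quad \text{(from the } (0 + econ \mid country) \text{ term)}
```

Here the global intercept \alpha comes from the `1` in the formula, and the CAR term supplies the country-specific offsets around it, so a separate (1 | country) term would be a second set of country intercepts on top of that.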

I’ve found that the CAR model doesn’t sample well when centered on zero (although I use a different implementation than brms).
