Suppose I have data on the share of the vote won by the governing party in elections in 100 countries. I want to model that vote share as a function of each country’s economy. Ordinarily, I would fit something like this:
vote ~ 1 + econ + (1 + econ | country)
But I also have the longitude and latitude of each country. Given that countries close to each other are likely to share common exposures, it makes sense to include geography in the model as well. One way of doing this is to use a Gaussian process:
vote ~ 1 + econ + (1 + econ | country) + gp(lon, lat)
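To be explicit about what I take the gp(lon, lat) term to mean, here is a sketch in my own notation. I am assuming brms-style syntax with its default exponentiated-quadratic kernel. The symbols s_c, sigma_f and ell are just labels I am introducing, and the distance is plain Euclidean distance on the coordinates rather than great-circle distance.

```latex
f \sim \mathcal{GP}\bigl(0,\, k\bigr), \qquad
k(s_c, s_{c'}) = \sigma_f^2 \exp\!\left(-\frac{\lVert s_c - s_{c'} \rVert^2}{2\ell^2}\right), \qquad
s_c = (\mathrm{lon}_c, \mathrm{lat}_c)
```

So each country gets an offset f(s_c), and the offsets of nearby countries are more strongly correlated than those of distant ones.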
However, my understanding is that GPs are akin to varying intercepts over continuous clusters (in this case, geographic degrees). I want to let the effect of the economy vary by country. But I’m also conscious that the model specification above accounts for country-level differences in the intercept twice: once in the varying intercepts and once in the GP. As such, my thinking is that I should fit a model like this instead:
vote ~ 1 + econ + (0 + econ | country) + gp(lon, lat)
This, I believe, pools information across countries using the Gaussian process, but also lets the effect of the economy vary by country without accounting for intercept differences twice.
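Written out in my own notation (c[i] is the country of observation i, s_c its coordinates, and f the GP over those coordinates), I think the two candidate linear predictors are roughly as follows. This is a sketch that assumes the varying effects are independent of f a priori; (A) is the specification with both varying intercepts and the GP, and (B) is the one I am proposing.

```latex
\begin{aligned}
\text{(A)}\quad \mu_i &= \alpha + \beta\,\mathrm{econ}_i + u_{0,c[i]} + u_{1,c[i]}\,\mathrm{econ}_i + f(s_{c[i]}),
& (u_{0,c},\, u_{1,c}) &\sim \mathcal{N}(0, \Sigma) \\
\text{(B)}\quad \mu_i &= \alpha + \beta\,\mathrm{econ}_i + u_{1,c[i]}\,\mathrm{econ}_i + f(s_{c[i]}),
& u_{1,c} &\sim \mathcal{N}(0, \sigma_1^2)
\end{aligned}
```

In (A) the country-level intercept offset is u_{0,c} + f(s_c), where u_{0,c} is independent across countries and f(s_c) is correlated by distance. In (B) the offset is f(s_c) alone.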
Does this make sense? Or have I misunderstood what GPs are doing relative to varying intercepts/slopes?