Bayesian beginner here, with very basic frequentist training from political science.
I worked through McElreath’s wonderful book (and Solomon Kurz’s amazing translation into the terrific brms). I found McElreath’s example using Gaussian Processes (GP) for spatial regression illuminating. But I struggle on a conceptual level and would very much appreciate some clarifications:
how different/better are GPs compared to the spatial autoregressive models (SAR, or SLM: spatial lag models) used in polsci?
most importantly, are GPs autoregressive in the sense that the value of Y in one unit is directly influenced by the values of Y in its neighbors? This matters because in polsci we want to study diffusion processes (e.g. how does democracy spread? Does democracy in one country influence democracy in its neighbors? etc.).
I showed the GP model to people using spatial econometrics and they were skeptical: according to them, spatial econometric models test more thoroughly for the different spatial processes (not only lag dependence but also error dependence and spatial heterogeneity). Can we control for spatial heterogeneity in GP models? Can GPs model not only spatial influence through the outcome but also the spatial influence of predictors in neighboring units?
Given that I could not find clear answers to these questions in “Modelling Spatial and Spatial-Temporal Data: A Bayesian Approach” by Haining and Li (apart from pages 368-9, but it is not clear to me how GPs fit into the CAR vs. SAR distinction there; it seems GPs are closer to the CAR approach because they are random-effects models?), in BDA3, or on the Stan Forums, perhaps some clarifications could be useful to the whole community?
I cannot answer your questions as I am no expert in this type of work. But Prof. Tony Smith at UPenn and Jacob Dearmon have written two papers that may be of use to you:
Spatial econometrics is one approach of several for modeling spatial data. I’d recommend looking at resources beyond econ/political science for answers to these questions.
When GPs are used in spatial statistics they’re referred to as geostatistical models (kriging), and they’re used to make predictions at new locations (i.e., sites at which you do not have observations). GPs have also been used to infer coefficients that are non-constant: varying over space, or varying by some continuous attribute. There are other models in spatial statistics that do the same thing (for better or worse), such as geographically weighted regression and various spatially varying coefficient models.
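For instance, here is a minimal kriging sketch in brms (hypothetical data frame and column names, and I’m assuming a plain gaussian outcome):

```r
library(brms)

# Minimal geostatistical (kriging) sketch, assuming a data frame `d`
# with outcome `y`, covariate `x`, and site coordinates `lat`, `lon`.
fit <- brm(
  y ~ x + gp(lat, lon),  # GP over the coordinates (exponentiated-quadratic kernel by default)
  data = d,
  family = gaussian()
)

# Kriging: predict at new, unobserved sites by supplying their coordinates
# (hypothetical values; the covariate x must be supplied too).
new_sites <- data.frame(x = 0, lat = c(48.1, 48.3), lon = c(11.5, 11.7))
posterior_predict(fit, newdata = new_sites)
```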
I have not yet read this chapter, but Banerjee, Carlin, and Gelfand, “Spatial Misalignment” in Hierarchical Modeling and Analysis for Spatial Data (CRC 2015 edition) has a section on using GPs. If you have observations at one place but need to infer some process occurring nearby or at a different scale, you have spatial misalignment, and apparently they use GPs for that.
That work by Tony Smith and Jacob Dearmon is the only other time I’ve come across GPs within spatial econometrics, and I thought what they were doing was pretty innovative. So I think your last question is more of a research topic than one with a settled answer (but you may find some ideas in Smith and Dearmon’s papers).
edit: @ArthurGazda on GP and space-time (geostatistical) modeling, see:
Christopher K. Wikle, Andrew Zammit-Mangion, and Noel Cressie, Spatio-Temporal Statistics with R. https://spacetimewithr.org/
The pdf for the book is free and available on the publisher’s website.
If I understand correctly (apologies for mistakes; these references are very technical, especially for an undertrained non-native English speaker), from the point of view of McElreath’s Oceanic tools example, the difference between spatial econometrics and Gaussian processes is the following:
Spatial econometrics says that spatial correlation can come from different sources, and thus you should use a different model for each source (from Congdon 2019; see the equations after this list):
if spatial correlation happens through the outcome (e.g. the number of tools on a neighboring island influences the number of tools on my island), use a spatial autoregressive model (or spatial lag model, SLM). This model is useful if you also want to look at spatial feedbacks between observations.
if spatial correlation happens through predictors in other units, use a spatial Durbin model (SDM) or a spatially lagged X model (SLX). In Oceanic tools: diffusion can happen through population exchanges, so the larger the population next door, the stronger the spatial diffusion.
if spatial correlation happens through the errors (we assume there is no diffusion from other units, but rather spatial correlation in the error term), use a spatial error model (SEM).
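In equations (standard spatial econometrics notation, with W the spatial weights matrix, typically row-standardized; correct me if I garble it):

$$
\begin{aligned}
\text{SLM/SAR:}\quad & y = \rho W y + X\beta + \varepsilon \\
\text{SDM:}\quad & y = \rho W y + X\beta + W X \theta + \varepsilon
  \qquad (\text{SLX drops the } \rho W y \text{ term: } y = X\beta + W X \theta + \varepsilon) \\
\text{SEM:}\quad & y = X\beta + u, \qquad u = \lambda W u + \varepsilon
\end{aligned}
$$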
These models can be a bit tricky because:
we impose strong assumptions on the spatial correlation: it occurs either through Y or through X. This can generate bias: we may conclude there is little spatial correlation through Y simply because most of the spatial correlation happens through X.
In general this can be problematic because we don’t know where spatial correlation is coming from.
they are a problem for non-normal outcomes (binary, counts, etc.), so practitioners recommend switching to Bayesian models (Ward and Gleditsch 2018; 2002; Congdon 2019).
GP is a different approach: it uses spatially correlated varying effects (in McElreath’s example, varying intercepts) that make no assumptions about where the spatial correlation is coming from (the outcome, the predictors, or the errors), but rather model all sources of spatial correlation together. This approach has several advantages (a brms sketch follows this list):
We avoid any assumption about where the spatial correlation is coming from.
We can use any kind of outcome, whether continuous, binary, counts, etc.
We use all the information in the data, which gives full inference and uncertainty assessment (Gelfand and Schliep 2016).
As cmcd says, we can make predictions for new places because it is a hierarchical model, and because of that we can share information between units.
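To make this concrete, here is a minimal brms sketch of the Oceanic tools GP (a simplified linear version of McElreath’s model; I’m assuming the Kline2 data from the rethinking package and its lat/lon columns, so check the column names):

```r
library(rethinking)  # for the Kline2 island data (with coordinates)
library(brms)

data(Kline2)
d <- Kline2  # column names assumed: total_tools, population, lat, lon

# Poisson outcome; the gp() term plays the role of McElreath's
# spatially correlated varying intercepts per society.
fit_gp <- brm(
  total_tools ~ log(population) + gp(lat, lon),
  data = d,
  family = poisson()
)
summary(fit_gp)
```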
Does this make sense? If so, the difference boils down to: why use hierarchical modeling in spatial regression? My intuition is that GPs are simply more flexible (no assumptions on the source of the spatial correlation, and any outcome variable we like) and use all the information in the data.
The natural next question is then: can we have the best of both worlds, i.e. the full inference of GPs while decomposing the spatial autocorrelation to see whether it comes from Y, X, or the errors? Dearmon and Smith (2016) propose selecting which predictors should enter the model, so that is a different question. What we would need is something like the cover of BDA3, where a GP decomposes a temporal trend into components (day of the week, month, ...). For the spatial GP in McElreath, how could we know where the spatial correlation is coming from (the outcome, the predictors, or the errors)?
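One naive partial answer (my own sketch, not something I found in the literature): model the X channel explicitly with an SLX-style spatially lagged predictor and let the GP absorb the remaining spatial correlation. Here W is an assumed row-standardized spatial weights matrix over the islands, continuing the sketch above:

```r
# SLX-style lagged predictor plus a GP term.
# W is a row-standardized spatial weights matrix (assumed given):
# W %*% log(population) is each island's neighborhood average log-population.
d$W_logpop <- as.numeric(W %*% log(d$population))

fit_slx_gp <- brm(
  total_tools ~ log(population) + W_logpop + gp(lat, lon),
  data = d,
  family = poisson()
)
```

For the Y and error channels, brms also has sar(W, type = "lag") and sar(W, type = "error") terms, but as far as I can tell those are only supported for gaussian/student families, so they would not work with the Poisson outcome here.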
What do people who use Gaussian Processes (Profs. McElreath, Vehtari, Gelman) think?
Many thanks in advance!