Hi, I am working on building a spatial model using an ICAR component ala this case study. However, all the tutorials on this topic I could find use data with only one observation per areal unit. I’m looking to build a hierarchical model where I have multiple observations per areal unit, while incorporating the ICAR priors (or maybe BYM2).
I am mostly confused on how to build the adjacency matrix, and how to properly index the data for such a model.
Here is my first attempt:
data {
int<lower=0> N;
int<lower=1> K; // K counties
array[N] int<lower=1, upper=K> county;
int<lower=0> N_edges;
array[N_edges] int<lower=1, upper=N> node1;
array[N_edges] int<lower=1, upper=N> node2;
vector[N] y;
}
parameters {
vector[N] phi;
array[K] real alpha_county;
real<lower=0> sigma;
}
model {
target += -0.5 * dot_self(phi[node1] - phi[node2]);
sum(phi) ~ normal(0, 0.01 * N);
to_vector(alpha_county) ~ normal(0, 5);
sigma ~ exponential(1);
for (n in 1:N)
y[n] ~ normal(alpha_county[county[n]], sigma);
}
Using simulated data from the geostan package:
library(geostan)
library(dplyr)
library(rstan)
data("georgia")
georgia_long = georgia %>%
select(GEOID, income, college) %>%
# expand the data to include multiple obs per county
tidyr::crossing(x = 1:10) %>%
mutate(x = rnorm(n(), mean = college),
.by = GEOID) %>%
# not all counties will have the same no. of observations
slice_sample(prop = .75) %>%
sf::st_as_sf()
C <- shape2mat(georgia_long, "B")
nodes <- prep_icar_data(C)
icar_stan = stan_model("icar.stan")
icar_fit = sampling(
icar_stan,
data = list(
N = nrow(georgia_long),
K = length(unique(georgia_long$GEOID)),
county = as.numeric(as.factor(georgia_long$GEOID)),
N_edges = nodes$n_edges,
node1 = nodes$node1,
node2 = nodes$node2,
y = georgia_long$income
),
chains = 4,
cores = 4,
iter = 1000
)
Should the adjacency matrix be constructed using the “long” data like I have done it here? Or should I use a county-level adjacency matrix and index the nodes by county?
Thanks!