ICAR Spatial Model with Hierarchical Data

Hi, I am working on building a spatial model using an ICAR component ala this case study. However, all the tutorials on this topic I could find use data with only one observation per areal unit. I’m looking to build a hierarchical model where I have multiple observations per areal unit, while incorporating the ICAR priors (or maybe BYM2).

I am mostly confused on how to build the adjacency matrix, and how to properly index the data for such a model.

Here is my first attempt:

data {
  int<lower=0> N;
  int<lower=1> K; // K counties
  array[N] int<lower=1, upper=K> county;
  int<lower=0> N_edges;
  array[N_edges] int<lower=1, upper=N> node1; 
  array[N_edges] int<lower=1, upper=N> node2; 
  vector[N] y;
parameters {
  vector[N] phi;
  array[K] real alpha_county;
  real<lower=0> sigma;
model {
  target += -0.5 * dot_self(phi[node1] - phi[node2]);
  sum(phi) ~ normal(0, 0.01 * N);

  to_vector(alpha_county) ~ normal(0, 5);
  sigma ~ exponential(1);
  for (n in 1:N)
    y[n] ~ normal(alpha_county[county[n]], sigma);

Using simulated data from the geostan package:



georgia_long = georgia %>%
  select(GEOID, income, college) %>% 
  # expand the data to include multiple obs per county
  tidyr::crossing(x = 1:10) %>% 
  mutate(x = rnorm(n(), mean = college),
         .by = GEOID) %>% 
  # not all counties will have the same no. of observations
  slice_sample(prop = .75) %>% 

C <- shape2mat(georgia_long, "B")
nodes <- prep_icar_data(C)

icar_stan = stan_model("icar.stan")
icar_fit = sampling(
  data = list(
    N = nrow(georgia_long),
    K = length(unique(georgia_long$GEOID)),
    county = as.numeric(as.factor(georgia_long$GEOID)),
    N_edges = nodes$n_edges,
    node1 = nodes$node1,
    node2 = nodes$node2,
    y = georgia_long$income
  chains = 4,
  cores = 4,
  iter = 1000

Should the adjacency matrix be constructed using the “long” data like I have done it here? Or should I use a county-level adjacency matrix and index the nodes by county?


I don’t have time to carefully engage with your Stan code, but in case it’s clarifying, you can think about your model as follows.

Step 1: Build an ICAR model for the county averages as if you have only one observation per county. This will look exactly like the case studies with one observation per county, except that instead of declaring the outcome as data, you will declare the outcome as a parameter.

Step 2: Build a data model that gives the likelihood of your observed data conditional on the county average from step 1.


Thanks, that definitely helps clarify things!

I am still not comfortable enough with Stan syntax to know how to translate the ICAR prior target += -0.5 * dot_self(phi[node1] - phi[node2]); into something that is operating at the county-level as opposed to observation-level.

1 Like

Forget about the observation level entirely for a moment. Make up some data, if you like, that have a single observation per county, and follow the case studies to get a model for these. Suppose you call those data mu. Now, remove your declaration of mu as data, and instead declare mu as a parameter. Now you’re done with the ICAR part, and you can put target += -0.5 * dot_self(phi[node1] - phi[node2]); out of mind. All you need to do now is write down a likelihood for the observed data in each county conditional on mu.

Here’s some example code you can use. Its not using the ICAR but I think you might get the idea. I haven’t tried to run this code, its just adapted from something else I was working on. Other Stan code for ‘varying intercepts’/‘random effects’ might help too, you’re just adding autocorrelation to them.

Or should I use a county-level adjacency matrix and index the nodes by county?

Yeah, the trick is to create an index that connects each individual to a geographic area.

So my code is kind of like this:

# create cross-walk from geoid to index position
geo_index = data.frame(GEOID =  geo_df$GEOID,  geo_index = 1:nrow(geo_df))

# join to your individual-level observations 
dat <- merge(dat, geo_index, by = 'GEOID')

# create your data list for stan, respecting order
A <- shape2mat(geo_df)
dl <- prep_car_data(A)
dl$y <- dat$y
dl$geo_index = dat$geo_index
dl$N = nrow(dat)
dl$N_geo = nrow(geo_df)

You use the index like this in Stan:

  transformed parameters {
  vector[N] mu;
  for (i in 1:N) mu[i] = phi[geo_index[i]] + alpha;
  mu += X * beta;

Here’s the example:

// hiererchical model: individuals nested in geographies

functions {
  // spatial model
   //  https://connordonegan.github.io/geostan/articles/custom-spatial-models.html
  #include functions.stan

data {
  int<lower=1> N;
  int<lower=0> K;
  vector[N] y;
  matrix[N, K] X;
  // geo identifier
  array[N] int geo_index;    

 // size of geog
  int N_geo;

  // CAR parts
  int nAx_w;
  int nC;
  vector[nAx_w] Ax_w;
  array[nAx_w] int Ax_v;
  array[N_geo + 1] int Ax_u;
  array[nC] int Cidx;
  vector[N_geo] Delta_inv;
  real log_det_Delta_inv;
  vector[N_geo] lambda;


parameters {
  // observation-level regression
  vector[K] beta;
  real alpha;
  real<lower=0> sigma;  

  // CAR 
  vector[N_geo] phi;  
  real<lower=0> sigma_phi;
  real<lower=1/min(lambda), upper=1/max(lambda)> rho_phi;  

transformed parameters {
  vector[N] mu;
  for (i in 1:N) mu[i] = phi[geo_index[i]] + alpha;
  mu += X * beta;

  // alternative is to drop alpha from the equation for mu and put it in phi:
  // vector[N_geo] mu_phi = rep_vector(alpha, N_geo);
  // you can have covariates at this level also, like: matrix[N_geo, K_geo] X_geo;

model {
  // phi centered on zero:
  vector[N_geo] mu_phi = rep_vector(0, N_geo);

  // likelihood
  target += normal_lpdf(y | mu, sigma);
  // regression priors
  target += normal_lpdf(beta0 | 0, 25);
  target += normal_lpdf(sigma | 0, 5);
  target += normal_lpdf(beta | 0, 5);
   // CAR: phi ~ Normal(mu_phi, Sigma)
  target += wcar_normal_lpdf(phi |
                 mu_phi, sigma_geo, rho_geo, 
                 Ax_w, Ax_v, Ax_u, 
   // CAR model priors [implicit uniform on rho in its support]
  target += student_t_lpdf(sigma_geo | 10, 0, 5);

generated quantities {
  vector[N] log_lik;
    for (i in 1:N_obs) log_lik[i] = normal_lpdf(y[i] | mu[i], sigma);    

Thanks for your help @jsocolar and @cmcd! I think I now have something that works. Following @jsocolar’s advice I attempted to model the BYM2 part (comprised of the spatial component and the unstructured component) on its own, and then condition on those parameters in the observation-level model. Rather than calling those parameters mu, I call them convolved_re to match the case study. But I hope they reflect the intuition you were trying to provide!

data {
  int<lower=1> N; // no. observations
  int<lower=1> K; // no. counties
  array[N] int<lower=1, upper=K> county;
  int<lower=1> N_edges; // no. edges created from county-level adjacency matrix
  array[N_edges] int<lower=1, upper=K> node1;
  array[N_edges] int<lower=1, upper=K> node2; 
  vector[N] x; // observation-level predictor
  vector[N] y; // outcome
  real<lower=0> scaling_factor;
parameters {
  real<lower=0, upper=1> rho;
  vector[K] phi; // spatial component
  vector[K] theta; // unstructured component
  real beta; // fixed effect
  real alpha; // global intercept
  vector[K] county_alpha;
  real<lower=0> county_sigma;
  real<lower=0> sigma; // residual sd
transformed parameters {
  vector[K] convolved_re;
  convolved_re = sqrt(1 - rho) * theta + sqrt(rho / scaling_factor) * phi;
model {
  rho ~ beta(0.5, 0.5);
  target += -0.5 * dot_self(phi[node1] - phi[node2]);
  sum(phi) ~ normal(0, 0.01 * K);
  theta ~ normal(0, 2);
  beta ~ normal(0, 2);
  alpha ~ normal(0, 2);
  sigma ~ normal(0, 1); 

  county_sigma ~ normal(0, 1); // hyperprior for hierarchical term
  for (k in 1:K) {
    county_alpha[k] ~ normal(convolved_re[k], county_sigma);
  y ~ normal(alpha + county_alpha[county] + x * beta, sigma);