Sparse CAR predictions for hold-out data


#1

Hi,
I am trying to fit the number of crimes in an area with a Sparse Conditional Auto-Regressive model (CAR) (https://mc-stan.org/users/documentation/case-studies/mbjoseph-CARStan.html). The problem is fitting the crime y as:

Y_i \sim Poisson(exp( X_i^T \beta + \epsilon_i ))

\epsilon_i \sim N(0, [\tau(D -\alpha W)]^1)

with:

  • \alpha parameter that controls spatial dependence
  • W is a binary spatial contiguity matrix (e.g. if a and b are near, W_{a,b} = 1 and W_{b,a} = 1)
  • \tau is a spatially varying precision parameter
  • X are the covariates

This model accounts for the spatial auto-correlation (correcting the bias of \beta), and for overdispersion (through the variability).
Although the fit goes very for in-sample data (ofc for the random effects!), I don’t really understand how to use stan for new data. How can I modify the stan model to generate new random effects? E.g. random effect for a new area that was not used in the training phase.

data {
  int<lower = 1> n;
  int<lower = 1> p;
  matrix[n, p] X;
  int<lower = 0> y[n];
  matrix<lower = 0, upper = 1>[n, n] W;
}
transformed data{
  vector[n] zeros;
  matrix<lower = 0>[n, n] D;
  {
    vector[n] W_rowsums;
    for (i in 1:n) {
      W_rowsums[i] = sum(W[i, ]);
    }
    D = diag_matrix(W_rowsums);
  }
  zeros = rep_vector(0, n);
}
parameters {
  vector[p] beta;
  vector[n] phi;
  real<lower = 0> tau;
  real<lower = 0, upper = 1> alpha;
}
model {
  phi ~ multi_normal_prec(zeros, tau * (D - alpha * W));
  beta ~ normal(0, 1);
  tau ~ gamma(2, 2);
  y ~ poisson_log(X * beta + phi);
}

generated quantities {
    vector[n] log_lik;
    vector[n] y_pred;
    real mu;
    for (i in 1:n) {
      y_pred[i] = poisson_log_rng(X[i] * beta + phi[i]);
    }
}

In this stan model (the simplest one) y_pred is for the in-sample data.

Thanks


#2

Hi, I only have shallow understanding of CAR, but I think procedure for drawing a single sample for new areas would be:

  1. Let’s assume W_prime is W extended to contain the new locations, similarly phi_prime
  2. Take a single posterior sample
  3. Using tau, alpha from the sample to compute W_prime and draw phi_prime ~ N(0, tau * (D - alpha * W_prime) given phi_prime[1:n] = phi I.e. this is a draw from a multivariete normal with a given covariance matrix, conditional on known values of a subset of the parameters. This should be multivariete normal, but getting the covariance/precision conditional on phi will require some linear algebra (or Googling :-)
  4. Plug in beta and X to compute predictions

Note that you can do that in both generated quantities in Stan or you can do that in R (or whatever is your host language is).


#3

Thanks for your answer. Since I’m a noob with stan, I’m stucked here :D

This should be multivariete normal, but getting the covariance/precision conditional on phi will require some linear algebra (or Googling :-)

Could you help me a bit more? Thaanks


#4

Not much, I am not strong in linear algebra :-) But I believe Wikipedia has exactly what you want:


Though this is for covariance matrices not precision matrices. The simplest approach would be to invert the matrix first, then compute the new covariance matrix then invert back - there probably is some simplification possible, but it would take me hours to get to it :-)