BYM2 spatial model - Making predictions to new area/geography from fit model



Hello all,
I am implementing the Besag York Mollié (BYM) spatial model as specified by Mitzi Morris in the iCAR case study.
There is plenty of background at that link, but in short it is a Poisson GLM with both an ordinary random effect and a spatial random effect via an iCAR prior. The code for the BYM2 model is included below.

My question is: once a model with covariates is fit to one area (e.g. voting districts of NYC), how can it be used to predict for a new area (e.g. voting districts of Boston, given the parameter distributions of the NYC model)? For non-spatial models I have done this either in generated quantities or with posterior samples, but I am unsure how to do it when a spatial weights matrix is involved.
If I try to do this in the generated quantities block, I can easily supply N_test, y_test, E_test, and x_test data to represent the new geography. However, the convolution prior is a vector of length N, the size of the training data. Since I cannot sample a model with parameter vectors of both length N and length N_test, I cannot predict for a new area based on the phi, rho, or theta parameters of the initial model. Perhaps the answer is that I need to re-fit the entire ordinary and spatial random effects components with the spatial weights matrix of the new area (e.g. Boston) and use the betas from the initial model fit (e.g. NYC)?
Any thoughts or help would be appreciated. Apologies if this is incomplete, unclear, or trivial.
Thank you,

data {
  int<lower=0> N;
  int<lower=0> N_edges;
  int<lower=1, upper=N> node1[N_edges];  // node1[i] adjacent to node2[i]
  int<lower=1, upper=N> node2[N_edges];  // and node1[i] < node2[i]

  int<lower=0> y[N];              // count outcomes
  vector<lower=0>[N] E;           // exposure
  int<lower=1> K;                 // num covariates
  matrix[N, K] x;                 // design matrix

  real<lower=0> scaling_factor; // scales the variance of the spatial effects
}
transformed data {
  vector[N] log_E = log(E);
}
parameters {
  real beta0;            // intercept
  vector[K] betas;       // covariates

  real<lower=0> sigma;        // overall standard deviation
  real<lower=0, upper=1> rho; // proportion unstructured vs. spatially structured variance

  vector[N] theta;       // heterogeneous effects
  vector[N] phi;         // spatial effects
}
transformed parameters {
  vector[N] convolved_re;
  // variance of each component should be approximately equal to 1
  convolved_re = sqrt(1 - rho) * theta + sqrt(rho / scaling_factor) * phi;
}
model {
  y ~ poisson_log(log_E + beta0 + x * betas + convolved_re * sigma);  // covariates

  // This is the prior for phi! (up to proportionality)
  target += -0.5 * dot_self(phi[node1] - phi[node2]);

  beta0 ~ normal(0.0, 2.5);
  betas ~ normal(0.0, 2.5);
  theta ~ normal(0.0, 1.0);
  sigma ~ normal(0,5);
  rho ~ beta(0.5, 0.5);
  // soft sum-to-zero constraint on phi
  sum(phi) ~ normal(0, 0.001 * N);  // equivalent to mean(phi) ~ normal(0, 0.001)
}
generated quantities {
  real logit_rho = log(rho / (1.0 - rho));
  vector[N] eta = log_E + beta0 + x * betas + convolved_re * sigma; // covariates
  vector[N] mu = exp(eta);
}


I think I semi-answered a similar question recently: Sparse CAR predictions for hold-out data. Does that apply to your situation?


OK, on second thought: your case is likely different, as you are transferring to completely new territory, not just new districts in the same territory.

In your case, I would first stop to think about what question you are answering. Since you cannot ground your predictions for Boston in any actual data from Boston, why are you interested in spatial correlations? If you are interested only in the marginal distributions for individual districts, then I would guess the iCAR prior resolves to just a single random effect.
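To illustrate the "single random effect" idea: under the BYM2 scaling, each component of convolved_re is marginally approximately standard normal, so a marginal prediction for one new district can draw a single N(0, 1) effect in place of theta, phi, and rho jointly. A minimal sketch, where sigma, the exposure, and the fixed-effect term are all hypothetical stand-ins for a real posterior draw:

```python
import math
import random

random.seed(0)

# Hypothetical values standing in for one posterior draw and one district
# (in practice, loop over real draws of beta0, betas, and sigma).
sigma = 0.7
log_E_i = math.log(120.0)   # exposure for the new district (made up)
fixed_i = -2.1              # beta0 + dot(x_i, betas) (made up)

# BYM2 scales convolved_re[i] to be marginally ~ N(0, 1), so a single
# standard-normal random effect stands in for the whole convolution.
re_i = random.gauss(0.0, 1.0)
mu_i = math.exp(log_E_i + fixed_i + re_i * sigma)
```

Repeating this per posterior draw gives a marginal predictive distribution for mu_i that ignores the spatial correlation between the new districts.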

If you are interested in the global patterns in Boston, you IMHO need to sample the spatial effect from the corresponding multivariate normal.

In both cases the iCAR term is just additional noise to your predictions.

Does that make any sense? (I am not 100% sure this is correct, so please check for yourself)


That is the correct answer to most of my questions! Thank you for your time and thoughts. My interest can be summed up as: "If we can model the count of incidence (y) conditioned on the environmental factors (betas) and spatial structure (rho) in Queens, NYC, can we project that understanding to Brooklyn, NYC, where we do not have a count of incidence y but have measured environmental factors X, and will assume the same spatial process?" (I decreased the geographic distance between the train and test areas to better mimic the real research.)

Thinking that through based on your questions:

  • "If you are interested only in marginal distributions for individual districts…" - Yes, this is what I am interested in, but I am not sure how to resolve it to a single random effect. rho is the important parameter because it gives information on balancing endogenous variance (towards theta) and spatial/exogenous variance (towards phi); this is what I am considering the "spatial process". However, rho is expressed via the convolved_re vector, and that depends on phi, which needs known incidence counts y to go with node1 and node2, etc. Trying to set any part of either the spatial or non-spatial RE requires knowing the y of the new area.

  • "the iCAR term is just additional noise to your predictions" - yes, very true. Via the convolved_re vector, the direction of the noise is determined. However, we do not have a convolved_re if we do not have y values for the new area. I suppose I could just fit the new data with the betas, which are learned jointly with the spatial structure, but the convolved_re * sigma term is significant since we are dealing with low counts (max = 6) and sparse data (88% have a count of zero).

It may very well be that what I am asking for, predicting with this model in an area without known counts, is not appropriate with this specification, and I appreciate your response above for helping me think through that. If you have any more thoughts, I would be grateful to hear them. I will continue to think it through.
Very Best,


Maybe I am missing something (I have only a very superficial knowledge of iCAR), but does it really? Imagine this process:

  • Take a sample of rho and sigma from your posterior
  • Sample new phi and theta for all new locations from their priors (because we have no other info for a new location). Sampling phi might be tricky, but it should actually be a multivariate normal, shouldn't it?
  • Now you can compute convolved_re
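A rough Python sketch of those three steps, assuming a made-up 4-district adjacency for the new area and a single hypothetical posterior draw of rho and sigma (in practice you would loop over real draws). Note that the scaling factor must be recomputed for the new graph, here as the geometric mean of the marginal variances diag(pinv(Q)):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical adjacency for 4 new districts (edge i joins node1[i], node2[i]),
# mirroring the node1/node2 data in the Stan model.
node1 = np.array([0, 0, 1, 2])
node2 = np.array([1, 2, 2, 3])
N_new = 4

# Graph Laplacian Q = D - A is the (singular) iCAR precision matrix.
Q = np.zeros((N_new, N_new))
for i, j in zip(node1, node2):
    Q[i, i] += 1.0
    Q[j, j] += 1.0
    Q[i, j] -= 1.0
    Q[j, i] -= 1.0

# BYM2 scaling factor for the NEW graph: geometric mean of diag(pinv(Q)),
# so that phi has roughly unit marginal variance.
marg_var = np.diag(np.linalg.pinv(Q))
scaling_factor = np.exp(np.mean(np.log(marg_var)))

# Step 2a: sample phi from the iCAR prior, a degenerate multivariate normal
# with covariance pinv(Q), restricted to the sum-to-zero subspace of Q.
evals, evecs = np.linalg.eigh(Q)
keep = evals > 1e-9                       # drop the constant null eigenvector
z = rng.standard_normal(int(keep.sum()))
phi = evecs[:, keep] @ (z / np.sqrt(evals[keep]))

# Step 2b: heterogeneous effects from their N(0, 1) prior.
theta = rng.standard_normal(N_new)

# Step 1: hypothetical values standing in for one posterior draw.
rho, sigma = 0.4, 0.7

# Step 3: combine exactly as in the transformed parameters block.
convolved_re = np.sqrt(1.0 - rho) * theta + np.sqrt(rho / scaling_factor) * phi
```

A Poisson draw with exp(log_E + beta0 + x * betas + convolved_re * sigma), using the new area's E and x, would then give predicted counts.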

You could probably improve on this by expressing the priors on phi and theta via new hyperparameters and using the posterior of those hyperparameters when sampling phi and theta for new locations…

However, the above makes sense at best in the small world of the model. Its applicability to the real world is questionable: whether it is reasonable to assume that the real-world effects don't change between places depends on your application and would be very hard to check.

Does that make sense? (Once again, please check my reasoning.)