iCAR model with missing y data

Hi all,

I am new to Stan and came across this example to implement the iCAR prior for spatially structured residuals in a spatial model for areal data.

However, I was wondering how to extend this model when there are missing y data (e.g. because of censoring).

Thanks!

With missingness/censoring, you gotta start with a description of how things are missing/censored, cause that’ll determine what you do next.

Like if things are missing totally at random, you might not need to change the model any, or if they are missing with some sort of structure you’ll need to do a bunch.

There’s chapters in the manual on missing data and on censoring.

If there’s any special missingness/censoring stuff for ICAR models, I assume it came about for computational efficiency reasons, not necessarily anything mathematical. What I’m trying to say there is it’s fine to read those user guide chapters on missing data/censoring – the techniques should be generic.

That’s the extent of what I know here! Hope it’s useful

Also tagging @mitzimorris and @anon75146577 to see if they know something about this.

I’d need some more information. If some of the ys are just not oberserved, then nothing changes. If there is an explicit censoring mechanism this needs to be modelled, but if the points are independently censored then you just use the censored likelihood in place of the Poisson.

2 Likes

Thank you all and sorry for the late reply!

For the case that I am looking at the points are just not observed (although not missing at random).
The y data represent the number of credit card transactions for a given area unit, but for some units the data is missing because there are no shops. The idea is to estimate the expected number of transactions in these units where currently there are no shops based on a set of socio-economic covariates and the taking advantage of the residual spatial dependence.

Assuming that some of the ys are just not observed, how would you include the sampling for the structurally spatial residual? Is something like this correct for a CAR model structure?

  data {
  int<lower = 1> n;
  int<lower = 1> n_nmis;
  
  int<lower = 1> p;
  
  int<lower = 0> y_nmis[n_nmis];

  matrix[n, p] X;
  matrix[n_nmis, p] X_nmis;
  
  vector[n] log_offset;
  vector[n_nmis] log_offset_nmis;
  
  matrix<lower = 0, upper = 1>[n, n] W;
  matrix<lower = 0, upper = 1>[n_nmis, n_nmis] W_nmis;

}
transformed data{
  vector[n] zeros;
  vector[n_nmis] zeros_nmis;
  matrix<lower = 0>[n, n] D;
  matrix<lower = 0>[n_nmis, n_nmis] D_nmis;
  
  {
    vector[n] W_rowsums;
    for (i in 1:n) {
      W_rowsums[i] = sum(W[i, ]);
    }
    D = diag_matrix(W_rowsums);
  }
  {
    vector[n_nmis] W_nmis_rowsums;
    for (i in 1:n_nmis) {
      W_nmis_rowsums[i] = sum(W_nmis[i, ]);
    }
    D_nmis = diag_matrix(W_nmis_rowsums);
  }

  zeros = rep_vector(0, n);
  zeros_nmis = rep_vector(0, n_nmis);

}
parameters {
  vector[p] beta;
  vector[n] phi;
  vector[n_nmis] phi_nmis;
  real<lower = 0> tau;
  real<lower = 0, upper = 1> alpha;
}
model {
  phi_nmis ~ multi_normal_prec(zeros_nmis, tau * (D_nmis - alpha * W_nmis));
  beta ~ normal(0, 1);
  tau ~ gamma(2, 2);
  y_nmis ~ poisson_log(X_nmis * beta + phi_nmis + log_offset_nmis);
  phi ~ multi_normal_prec(zeros, tau * (D - alpha * W));
}
generated quantities {
    int<lower=0> y_pred[n];
    y_pred = poisson_log_rng(X * beta + phi + log_offset);
}

where

  • n: number of obs.
  • n_nmis : number of non-missing obs.
  • p : number of fixed effects
  • _nmis suffix: data points corresponding to non-missing y data only

Thank you again!