Improving mixing for (sparse) CAR models

I am working on a CAR model, using Exact sparse CAR models in Stan as a starting point. The model used here is the same as in Max Joseph’s, i.e.

{y_i} \sim {\rm{Poisson}}({\rm{exp}}({X_i}\beta + {\phi _i} + \log ({\rm{offse}}{{\rm{t}}_i})))
\phi \sim N(0,{[\tau (D - \alpha W)]^{ - 1}})
\beta \sim Normal(0,1)
\tau \sim Gamma(2,2),

where X is the design matrix, \beta the coefficients (the first being the main intercept), \phi the (autocorrelated) random effects, offset scales the observation according to expectations (in my case area), \tau is the precision of random effects, D is a diagonal matrix with the number of neighbors for each location, and W is the adjacency matrix.

The example presented here uses adjacency and design matrices and offset from a real application I am working on, but the data (y) is simulated to check that I can obtain posteriors that capture the parameters used for simulations. There are 292 random effects and 763 adjacencies. The posteriors appear correct, but the mixing is bad. It varies with parameterization, but, in the example used here, \beta_1 (the intercept) and individual random effects get an effective sample size of ~100 even when using 5000 post-warmup iterations and 4 chains. In the presented toy example, it would be feasible to just crank up iterations further—the computation takes just a few minutes. But in the real application I am working towards, computation time is a limiting factor.

Based on the trace and pairs plots, it appears that the problem is due to a strong negative correlation between \beta_1 and individual \phi_i. Stan usually handles such correlations well, but not in this case.
pairs

Are there any ways to reparameterize for improved efficiency? So far, the only thing I’ve tried is to use non-centered parameterization of \phi, but that seems to increase computation time and reduce effective sample size (at least if \tau is low or moderate).

The stan code below is essentially the same as in Max Joseph’s example in the link above. The data used is located here:
car_test.rdata (15.2 KB)


library(rstan)
load(file="car_test.rdata")
fake_fit <- stan('car_prec_sparse.stan', data = fake_d, 
                 iter = niter, chains = nchains, verbose = FALSE)


functions {
  /**
  * Return the log probability of a proper conditional autoregressive (CAR) prior 
  * with a sparse representation for the adjacency matrix
  *
  * @param phi Vector containing the parameters with a CAR prior
  * @param tau Precision parameter for the CAR prior (real)
  * @param alpha Dependence (usually spatial) parameter for the CAR prior (real)
  * @param W_sparse Sparse representation of adjacency matrix (int array)
  * @param n Length of phi (int)
  * @param W_n Number of adjacent pairs (int)
  * @param D_sparse Number of neighbors for each location (vector)
  * @param lambda Eigenvalues of D^{-1/2}*W*D^{-1/2} (vector)
  *
  * @return Log probability density of CAR prior up to additive constant
  */
  real sparse_car_lpdf(vector phi, real tau, real alpha, 
    int[,] W_sparse, vector D_sparse, vector lambda, int n, int W_n) {
      row_vector[n] phit_D; // phi' * D
      row_vector[n] phit_W; // phi' * W
      vector[n] ldet_terms;
    
      phit_D = (phi .* D_sparse)';
      phit_W = rep_row_vector(0, n);
      for (i in 1:W_n) {
        phit_W[W_sparse[i, 1]] = phit_W[W_sparse[i, 1]] + phi[W_sparse[i, 2]];
        phit_W[W_sparse[i, 2]] = phit_W[W_sparse[i, 2]] + phi[W_sparse[i, 1]];
      }
    
      for (i in 1:n) ldet_terms[i] = log1m(alpha * lambda[i]);
      return 0.5 * (n * log(tau)
                    + sum(ldet_terms)
                    - tau * (phit_D * phi - alpha * (phit_W * phi)));
  }
}
data {
  int<lower = 1> n;
  int<lower = 1> p;
  matrix[n, p] X;
  int<lower = 0> y[n];
  vector[n] log_offset;
  matrix<lower = 0, upper = 1>[n, n] W; // adjacency matrix
  int W_n;                // number of adjacent region pairs
}
transformed data {
  int W_sparse[W_n, 2];   // adjacency pairs
  vector[n] D_sparse;     // diagonal of D (number of neigbors for each site)
  vector[n] lambda;       // eigenvalues of invsqrtD * W * invsqrtD
  
  { // generate sparse representation for W
  int counter;
  counter = 1;
  // loop over upper triangular part of W to identify neighbor pairs
    for (i in 1:(n - 1)) {
      for (j in (i + 1):n) {
        if (W[i, j] == 1) {
          W_sparse[counter, 1] = i;
          W_sparse[counter, 2] = j;
          counter = counter + 1;
        }
      }
    }
  }
  for (i in 1:n) D_sparse[i] = sum(W[i]);
  {
    vector[n] invsqrtD;  
    for (i in 1:n) {
      invsqrtD[i] = 1 / sqrt(D_sparse[i]);
    }
    lambda = eigenvalues_sym(quad_form(W, diag_matrix(invsqrtD)));
  }
}
parameters {
  vector[p] beta;
  vector[n] phi;
  real<lower = 0> tau;
  real<lower = 0, upper = 1> alpha;
}
model {
  phi ~ sparse_car(tau, alpha, W_sparse, D_sparse, lambda, n, W_n);
  beta ~ normal(0, 1);
  tau ~ gamma(2, 2);
  y ~ poisson_log(X * beta + phi + log_offset);
}

See here for a more recent effort at building CAR models in Stan:

Connor Donegan. “Building spatial conditional autoregressive models in the Stan pro-
gramming language”. OSF Preprint: https://osf.io/3ey65/.

Convenience functions (prep_car_data; stan_car) here: Bayesian Spatial Analysis • geostan

1 Like