# Rstan - Spatial Error Model with Multiple Predictors

I am trying to fit a spatial error model using Rstan. Currently, I am following the tutorial from this tutorial (please see 3.2.4 A Spatial Model) but since the example from the tutorial only includes 1 predictor, I am trying to extend the code for multiple predictors here by myself, using my own dataset. I am sorry I cannot provide the dataset since it is confidential for my thesis. It has 3515 observations and 13 predictors (including intercept). The dataframe class is a SpatialPointsDataFrame in R.
This is the full statistical model I want to obtain:
Y=\beta_{0} + \beta_{n} X_{n} + u with n = 2 to 13 predictors.
u=\lambda Wu+\epsilon
Here’s what I’ve done so far:

Stan code

data {
// Define variables in data
// Number of observations
int<lower=0> N;
// Number of fixed effect parameters
int<lower=0> p;

// Dependent var
vector[N] lnPrice;

// Independent var
vector[N] BCR ;
vector[N] FAR ;
vector[N] lnStDist ;
vector[N] lnCBDdist ;
vector[N] lnSubdist ;
vector[N] lnParkdist ;
vector[N] lnBusdist ;
vector[N] Logsum ;
vector[N] Parkratio ;
vector[N] Comratio ;
vector[N] Comdiv ;
vector[N] Busdiv ;

// Spatial matrix of neighbors
matrix<lower=0>[N,N] W;

// Error term?
vector<upper=1>[N] e;

// Matrix of the structure of error?
matrix<lower=0,upper=1>[N,N] I;
}

parameters {
// Define parameters to estimate
// Fixed effects
real beta[p];

// Error term
real<lower = 0> sigma;

// Spatial error coefficient
real<lower=-1,upper=1> lambda;
}

transformed parameters  {
// Individual mean
vector[N] mu;

// Individual mean definition
for (i in 1:N) {
mu[i] <- beta +
BCR[i]*beta + FAR[i]*beta + lnStDist[i]*beta +
lnCBDdist[i]*beta + lnSubdist[i]*beta + lnParkdist[i]*beta +
lnBusdist[i]*beta + Logsum[i]*beta + Parkratio[i]*beta +
Comratio[i]*beta + Comdiv[i]*beta + Busdiv[i]*beta;
}
}

model {
lnPrice ~ multi_normal_prec(mu,  crossprod(I - lambda * W)/(sigma*sigma));
}


R code

dat.err <- list(N           = length(SHP_land$lnPrice), p = 13, BCR = SHP_land$BCR,
FAR         = SHP_land$FAR, lnStDist = SHP_land$lnStDist,
lnCBDdist   = SHP_land$lnCBDdist, lnSubdist = SHP_land$lnSubdist,
lnParkdist  = SHP_land$lnParkdist, lnBusdist = SHP_land$lnBusdist,
Logsum      = SHP_land$Logsum, Parkratio = SHP_land$Parkratio,
Comratio    = SHP_land$Comratio, Comdiv = SHP_land$Comdiv,
Busdiv      = SHP_land$Busdiv, I = diag(N) , W = nb2mat(nb3km) , #spatial weight matrix e = rep(1,N) , lnPrice = SHP_land$lnPrice)

# Run the model
fit.err = stan(model_code = str.err, data = dat.err,
chains = 2, iter = 5000, warmup = 1000, thin = 10,
control = list(max_treedepth = 15))


The code actually works, and it is sampling for the chain 1. The only problem is it takes too long to sample (2.68478e+006 seconds is unusually long!). The log message I receive in my console after 20 hours is this:

SAMPLING FOR MODEL 'f37b23ddd61543a3245bf1aedba771ca' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 268.478 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 2.68478e+006 seconds.
Chain 1:
Chain 1:
Chain 1: Iteration:    1 / 5000 [  0%]  (Warmup)


I wonder if I could do something to make the iteration runs faster by changing the code or something (is my code correct or there is something wrong?), because I think this is quite a simple model. I wish to receive some directions about this matter. Thank you for your attention!

Sorry, can’t really dive deep into your problem, just a few suggestions, hope they help:

• According to folk theorem, a slow computation usually means a problematic model. Your dataset is not so big so I would guess there is some problem.
• You are missing priors for all your parameters - this might be a huge issue.
• Maybe the data are fit poorly by your model. Checking whether the model is a good choice might be done on a subset of the data for quicker fits and then use posterior predictive checks
• It might also make sense to simulate a small dataset that exactly conforms to your model - can you recover the true parameters used in the simulation?
• Is the model the same as in the CAR case study - a quick glance looks like it is, but please check.
• If the model is CAR/ICAR, you might use brms (https://rdrr.io/cran/brms/man/cor_car.html) which generally has nice and optimized implementations
• Even if the models in brms do not suit your case exactly, you might use make_stancode and see how the model is implemented and steal some tricks

Best of luck!

4 Likes

Yes, I think I should start setting the prior for my model to make a better fit for my data (I think this is the first thing I need to study now). I’m also looking for the brms package currently. The model has some similarities from CAR (since they are considering autocorrelation), yet different in some sense. But a quick look on brms suggests they also provide a function for spatial error model in cor_errorsar .
Thank you for the insightful comments!