I am trying to fit a spatial error model using Rstan. Currently, I am following the tutorial from this tutorial (please see 3.2.4 A Spatial Model) but since the example from the tutorial only includes 1 predictor, I am trying to extend the code for multiple predictors here by myself, using my own dataset. I am sorry I cannot provide the dataset since it is confidential for my thesis. It has 3515 observations and 13 predictors (including intercept). The dataframe class is a SpatialPointsDataFrame in R.
This is the full statistical model I want to obtain:
Y=\beta_{0} + \beta_{n} X_{n} + u with n = 2 to 13 predictors.
u=\lambda Wu+\epsilon
Here’s what I’ve done so far:
Stan code
data {
// Define variables in data
// Number of observations
int<lower=0> N;
// Number of fixed effect parameters
int<lower=0> p;
// Dependent var
vector[N] lnPrice;
// Independent var
vector[N] BCR ;
vector[N] FAR ;
vector[N] lnStDist ;
vector[N] lnCBDdist ;
vector[N] lnSubdist ;
vector[N] lnParkdist ;
vector[N] lnBusdist ;
vector[N] Logsum ;
vector[N] Parkratio ;
vector[N] Comratio ;
vector[N] Comdiv ;
vector[N] Busdiv ;
// Spatial matrix of neighbors
matrix<lower=0>[N,N] W;
// Error term?
vector<upper=1>[N] e;
// Matrix of the structure of error?
matrix<lower=0,upper=1>[N,N] I;
}
parameters {
// Define parameters to estimate
// Fixed effects
real beta[p];
// Error term
real<lower = 0> sigma;
// Spatial error coefficient
real<lower=-1,upper=1> lambda;
}
transformed parameters {
// Individual mean
vector[N] mu;
// Individual mean definition
for (i in 1:N) {
mu[i] <- beta[1] +
BCR[i]*beta[2] + FAR[i]*beta[3] + lnStDist[i]*beta[4] +
lnCBDdist[i]*beta[5] + lnSubdist[i]*beta[6] + lnParkdist[i]*beta[7] +
lnBusdist[i]*beta[8] + Logsum[i]*beta[9] + Parkratio[i]*beta[10] +
Comratio[i]*beta[11] + Comdiv[i]*beta[12] + Busdiv[i]*beta[13];
}
}
model {
lnPrice ~ multi_normal_prec(mu, crossprod(I - lambda * W)/(sigma*sigma));
}
R code
dat.err <- list(N = length(SHP_land$lnPrice),
p = 13,
BCR = SHP_land$BCR,
FAR = SHP_land$FAR,
lnStDist = SHP_land$lnStDist,
lnCBDdist = SHP_land$lnCBDdist,
lnSubdist = SHP_land$lnSubdist,
lnParkdist = SHP_land$lnParkdist,
lnBusdist = SHP_land$lnBusdist,
Logsum = SHP_land$Logsum,
Parkratio = SHP_land$Parkratio,
Comratio = SHP_land$Comratio,
Comdiv = SHP_land$Comdiv,
Busdiv = SHP_land$Busdiv,
I = diag(N) ,
W = nb2mat(nb3km) , #spatial weight matrix
e = rep(1,N) ,
lnPrice = SHP_land$lnPrice)
# Run the model
fit.err = stan(model_code = str.err, data = dat.err,
chains = 2, iter = 5000, warmup = 1000, thin = 10,
control = list(max_treedepth = 15))
The code actually works, and it is sampling for the chain 1. The only problem is it takes too long to sample (2.68478e+006 seconds is unusually long!). The log message I receive in my console after 20 hours is this:
SAMPLING FOR MODEL 'f37b23ddd61543a3245bf1aedba771ca' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 268.478 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 2.68478e+006 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Iteration: 1 / 5000 [ 0%] (Warmup)
I wonder if I could do something to make the iteration runs faster by changing the code or something (is my code correct or there is something wrong?), because I think this is quite a simple model. I wish to receive some directions about this matter. Thank you for your attention!