# Specification for spatial regression when working with panel data in Stan

Dear All,

I am trying to fit a random-effects stochastic frontier model (essentially a regression with an additional inefficiency component u) with a spatial element, based on the specification in this link: https://rstudio-pubs-static.s3.amazonaws.com/243277_01730c1f0a984132bce5d5d25bec62aa.html

I have 20 units, each observed 32 times, which gives me a grand total of 640 observations.

I am trying to model the spatial relationship for the standard deviation of the inefficiency component. My spatial weights matrix is 20 x 20, but when I use the multi_normal distribution it demands that my spatial weights matrix be 640 x 640, which should not be needed, since that would just be the same 20 x 20 structure repeated for each of the 32 time points.

I am sure that there is a way to code the 20 x 20 matrix into the multi_normal distribution so that it knows that unit id ‘1’ in the weights matrix relates to the 32 observations with id ‘1’ in the data.

Here is my code; the error I am getting says: "multi_normal_lpdf: Size of random variable (640) and rows of covariance parameter (20) must match in size"

Any help will be greatly appreciated:

```stan
data {
  int<lower=0> N;                   // number of observations
  int<lower=0> J;                   // number of groups
  int<lower=0> P;                   // number of predictors in the regression
  real Y[N];                        // dependent variable
  matrix[N, P] X;
  int<lower=1, upper=J> dhb_id[N];  // group index (20 groups) for random effects
  matrix<lower=0>[J, J] W;          // 20 x 20 spatial weights matrix
  matrix<lower=0, upper=1>[J, J] I; // 20 x 20 identity matrix
}
parameters {
  real alpha;
  vector[P] beta;
  real<lower=0> sigma_e;
  real<lower=0> sigma_u;            // SD of the inefficiency component
  real<lower=-1, upper=1> lambda;   // measures the spatial correlation
  vector<lower=0>[N] mean_u;
  vector<lower=0>[N] u;             // inefficiency component
  vector[J] RE;                     // vector capturing random effects
}
transformed parameters {
  vector[N] yhat;
  yhat = alpha + X * beta + u + RE[dhb_id];
}
model {
  sigma_e ~ exponential(1);
  alpha ~ normal(0, 1);
  beta ~ normal(0, 2);
  lambda ~ normal(0, 2);

  mean_u ~ normal(0, 2);
  u ~ multi_normal(mean_u, tcrossprod(I - lambda * W) * (sigma_u * sigma_u));

  RE ~ normal(0, 2);
  Y ~ normal(yhat, sigma_e);
}
```

With much gratitude
Antony

Hey Antony,

I think you have two options. One is using the Kronecker product to create a block-diagonal matrix. First, in R, make an identity matrix `It = diag(32)`, then `Wt = kronecker(It, W)`. That should give you a 640 x 640 matrix. Then for your Stan data you'd provide `W = Wt` and `I = diag(640)`. I think that should give you independent groups of parameters `u` that share the same `lambda` and `sigma_u` parameters. Double-check my thinking; would that work for you? The downside is that this might make the sampling issues even worse for the multivariate normal; this is a slow model to fit in Stan.
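For concreteness, here is a small Python sketch of that block-diagonal construction (toy numbers standing in for the real `W`; the R calls above do the same thing):

```python
import numpy as np

J, T = 20, 32          # 20 units, 32 time points
rng = np.random.default_rng(0)

# A toy 20 x 20 spatial weights matrix standing in for the real W
W = rng.random((J, J))
np.fill_diagonal(W, 0.0)

# Kronecker product with a T x T identity stacks T copies of W
# along the diagonal, giving a 640 x 640 block-diagonal matrix
Wt = np.kron(np.eye(T), W)

print(Wt.shape)                       # (640, 640)
# Each diagonal block is an exact copy of W
print(np.allclose(Wt[:J, :J], W))     # True
# Off-diagonal blocks are zero, so the T groups stay independent
print(np.allclose(Wt[:J, J:2*J], 0))  # True
```

Note the ordering assumption: `kron(I, W)` matches data sorted by time point first, then unit (all 20 units at t = 1, then all 20 at t = 2, and so on), so make sure your observations are stacked that way.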

The second option would be to loop over each time point's group of 20 observations inside the transformed parameters and model blocks in Stan. I'm less helpful on this point right now, and I'm thinking it might get messy trying to index the observations correctly, but maybe that's enough to get you started, or maybe someone else can jump in to help. This option might sample more quickly but requires more work.
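The reason the loop is equivalent: with a block-diagonal covariance, the 640-dimensional multivariate normal density factors into 32 independent 20-dimensional densities, one per time point. A small Python check of that identity (numpy/scipy, toy numbers):

```python
import numpy as np
from scipy.stats import multivariate_normal

J, T = 20, 32
rng = np.random.default_rng(1)

# A toy symmetric positive-definite 20 x 20 covariance block
A = rng.random((J, J))
Sigma = A @ A.T + J * np.eye(J)

# Full 640 x 640 block-diagonal covariance and a random observation vector
Sigma_full = np.kron(np.eye(T), Sigma)
y = rng.standard_normal(J * T)

# One big MVN log density ...
lp_full = multivariate_normal(mean=np.zeros(J * T), cov=Sigma_full).logpdf(y)

# ... equals the sum of T small MVN log densities, one per 20-obs block
lp_loop = sum(
    multivariate_normal(mean=np.zeros(J), cov=Sigma).logpdf(y[t*J:(t+1)*J])
    for t in range(T)
)

print(np.isclose(lp_full, lp_loop))  # True
```

In Stan terms, each slice of 20 observations can get its own `multi_normal` statement with the 20 x 20 covariance, which replaces one 640 x 640 factorization with 32 much cheaper 20 x 20 ones.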


Thank you for the reply. I am still working on it and will post here how it went.
Thanks once again.

Hi Connor,

Hope you are doing well.
I managed to create the 640 x 640 matrix, but unfortunately the model is taking a very long time to run. This is the code:

```stan
data {
  int<lower=0> N;                   // number of observations
  int<lower=0> J;                   // number of groups
  int<lower=0> P;                   // number of predictors in main regression
  vector[N] Y;                      // dependent variable
  matrix[N, P] X;
  int<lower=1, upper=J> dhb_id[N];  // group index
  matrix<lower=0>[N, N] W;          // 640 x 640 spatial matrix
  matrix<lower=0, upper=1>[N, N] I; // 640 x 640 identity matrix
}
parameters {
  real alpha;
  vector[P] beta;
  real<lower=0> sigma;
  real<lower=-1, upper=1> lambda;
  real<lower=0> sigma_r;
  vector[J] re;
}
transformed parameters {
  vector[J] r;
  vector[N] mu;
  r = re * sigma_r;
  mu = alpha + X * beta + r[dhb_id];
}
model {
  re ~ normal(0, 2);
  sigma_r ~ cauchy(0, 1);
  sigma ~ cauchy(0, 1);
  alpha ~ normal(0, 2);
  beta ~ normal(0, 2);
  lambda ~ normal(0, 2);

  Y ~ multi_normal_prec(mu, tcrossprod(I - lambda * W) / (sigma * sigma));
}
```

This is coming up with:

```
SAMPLING FOR MODEL 'test' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 2.232 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 22320 seconds.
Chain 1:
Chain 1:
Chain 1: Iteration: 1 / 1000 [ 0%] (Warmup)
```

Antony

Hey @ants007,

Unfortunately, I don’t think anyone’s made an efficient implementation of those models; the multivariate normal is the super slow part.

Are you open to other model specifications? A spatial filtering model would probably be a lot faster, and this paper provides some direct justification for substituting the SAR (spatial autoregressive) specification you have with ESF (eigenvector spatial filtering). If you’re interested, I can provide some code to make it a little easier.

open access version:

https://osf.io/fah3z/
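As a rough sketch of the eigenvector spatial filtering idea (this is the standard Moran-eigenvector construction with toy data, not code from the paper): the SAR term is replaced by a small set of eigenvectors of the doubly-centered weights matrix, which then enter the regression as ordinary covariates.

```python
import numpy as np

J, T = 20, 32
rng = np.random.default_rng(2)

# Toy symmetric 20 x 20 spatial weights matrix standing in for the real W
W = rng.random((J, J))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)

# Doubly center W with M = I - 11'/J, then eigendecompose M W M
M = np.eye(J) - np.ones((J, J)) / J
eigvals, eigvecs = np.linalg.eigh(M @ W @ M)

# Keep the eigenvectors with the largest positive eigenvalues; these
# capture the strongest positive spatial autocorrelation patterns
order = np.argsort(eigvals)[::-1]
k = 5                        # number of filter eigenvectors (a tuning choice)
E = eigvecs[:, order[:k]]    # 20 x k matrix of spatial filter covariates

print(E.shape)        # (20, 5)
# Expanded to the panel: each unit's eigenvector values repeat across its 32 obs
E_panel = E[np.repeat(np.arange(J), T), :]
print(E_panel.shape)  # (640, 5)
```

In Stan, these columns just get appended to `X` with their own coefficients, so the model stays an ordinary regression and no 640 x 640 multivariate normal is needed.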

Hi Connor,

Thanks once again for following up.
Yes, sure, I would be interested in learning more about ‘spatial filtering’. I clicked the link, but it is asking me for an ID and password.

Sure, if you don’t mind sharing the code.

Thanks once again.
Antony

Sorry about that, I updated the post with an open access version.

Thanks Connor. I will take a look :)
Antony