Hi all I have a panel data set about county-level corn yield. The dependent variable y is log(cornyield) and independent variables X are climate variables including precipitation and growing degree days, and time trend.
In the test dataset, I have 102 counties and 3 years so n=102 and t=3.
The model would be y=Xβ+u. Assuming u is spatially autocorrelated, I have u=λWu+ϵ where λ is a scalar parameter indicating for spatial dependence and W is the row-standardized neighbor matrix, and assume ϵ is i.i.d. normal.
Considering that u=[(I_n-λW)]^(-1) ϵ (where I_n
is the diagonal matrix) so we have ϵ=(I_n-λW)(y-Xβ)`. So my Stan code for the model is as below:
data {
int<lower = 1> n;
int<lower = 1> t;
int<lower = 1> p;
matrix[n, n] W;
matrix[n, n] In;
matrix[n,p] X[t];
vector[n] y[t];
}
parameters {
vector[p] beta;
real<lower=-1, upper=1> lamda;
real<lower=0> sigma;
}
model {
for (i in 1:t){
vector[n] X2;
X2 = (In-lamda*W)*(y[i]-X[i]*beta);
X2~normal(0, sigma);
}
}
spatialBayes2.stan (399 Bytes)
where n stands for the 102 regions, p stands for the 6 control variables, and t stands for the 3 years.
The data is as below:
data: 1st column the 102 regions with fips ID; 2nd column year, 3rd column is y and the rest are X.
testdata.csv (43.6 KB)
Row-standardized neighbor matrix:
w.csv (26.8 KB)
.
R code to read the data and Stan code:
testdata <- read.csv("testdata.csv", header = TRUE)
n=length(unique(testdata$fips))
t=length(unique(testdata$year))
X=testdata[,c(4:9)]
y=testdata[,3]
X2=array(unlist(X),c(102,t,6)) ## this is in (n,t,p)
X2=aperm(X2,c(2,1,3)) ## change into (t,n,p)
y2=array(unlist(y),c(t,102))
p=dim(X)[2]
W <- read.csv("w.csv",header = TRUE)
W <- W[,2:103]
In <- diag(1,dim(W)[1])
p=dim(X)[2]
FE_data <- list(n=n,
t=t,
p=p,
W=W,
In=In,
X=X2,
y=y2)
fit <- stan(file = "spatialBayes2.stan",
data = FE_data,
iter = 2000,
chains = 4)
The estimation results, however, is significantly different from some existing R packages using maximum likelihood. The did not agree in terms of sign. And I am getting a close-to-1 lambda (the spatial parameter).
The maximum likelihood estimates gives the following
fit2 <- spml(logcornyield~dday0C+dday29C+prec+precsq+t+t2,
data = allPanel_DM, listw = W2,
model="within",effect="individual",spatial.error="b",Hess=FALSE)
> summary(fit2)
Spatial panel fixed effects error model
Call:
spml(formula = logcornyield ~ dday0C + dday29C + prec + precsq +
t + t2, data = allPanel_DM, listw = W2, model = "within",
effect = "individual", spatial.error = "b", Hess = FALSE)
Residuals:
Min. 1st Qu. Median 3rd Qu. Max.
-0.2498821 -0.0332716 0.0013296 0.0366884 0.2737303
Spatial error parameter:
Estimate Std. Error t-value Pr(>|t|)
rho 0.646942 0.051952 12.453 < 0.00000000000000022 ***
Coefficients:
Estimate Std. Error t-value Pr(>|t|)
dday0C 0.907453 0.260309 3.4861 0.0004902 ***
dday29C -13.585637 2.198045 -6.1808 0.0000000006378 ***
prec -10.021127 5.937466 -1.6878 0.0914538 .
precsq 0.096969 0.045563 2.1282 0.0333182 *
t -1756.566054 884.931103 -1.9850 0.0471473 *
t2 66.003375 32.456722 2.0336 0.0419939 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
where the Stan gives
> print(fit)
Inference for Stan model: spatialBayes2.
8 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=8000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
beta[1] -0.17 0.00 0.12 -0.41 -0.25 -0.17 -0.09 0.07 4395 1
beta[2] -0.96 0.02 1.32 -3.55 -1.85 -0.96 -0.07 1.65 3066 1
beta[3] 2.46 0.05 3.10 -3.52 0.38 2.38 4.58 8.60 3662 1
beta[4] -0.02 0.00 0.02 -0.07 -0.04 -0.02 0.00 0.03 3631 1
beta[5] -554.59 8.69 481.66 -1499.97 -876.73 -562.52 -228.79 376.36 3070 1
beta[6] 19.46 0.32 17.62 -14.62 7.50 19.80 31.28 54.09 3063 1
lamda 0.98 0.00 0.02 0.94 0.97 0.98 0.99 1.00 6734 1
sigma 0.05 0.00 0.00 0.05 0.05 0.05 0.05 0.05 4999 1
lp__ 764.63 0.04 2.00 759.89 763.50 764.92 766.11 767.57 3013 1
Could anyone help me with it. The key parameter of interest if dday0C or beta[1], which should be positive as warm temperature benefit the crop growth. From this perspective, I think the ML estimate might be more sound. But I still want to see if how can I improve the Stan estimates. I am relatively new with Stan. Thank you in advance for the help!!