Hi all, I am trying to specify a linear multivariate model and I’m having some difficulties: 1) I’m not recovering parameters correctly from a simulated data set and 2) I need to speed it up for my real data.
First here is a simulated data script: gen_test_data_y2_x3.R (530 Bytes)
And here is the script to call stan from r: run_test_model.R (311 Bytes)
My model is based on section 9.15 of the Stan manual (save to linear_multivariate2.stan):
int<lower=1> NvarsY;
int<lower=1> NvarsX;
int<lower=0> N;
vector[NvarsX] x[N];
vector[NvarsY] y[N];
}
parameters {
matrix[NvarsY, NvarsX] beta;
cholesky_factor_corr[NvarsY] L_Omega;
vector<lower=0>[NvarsY] L_sigma;
}
model {
vector[NvarsY] mu;
matrix[NvarsY, NvarsY] L_Sigma;
//Priors
L_Sigma = diag_pre_multiply(L_sigma, L_Omega);
to_vector(beta) ~ normal(0, 5);
L_Omega ~ lkj_corr_cholesky(4);
L_sigma ~ cauchy(0, 2.5);
//Model
for (n in 1:N){
mu = beta * x[n];
}
y ~ multi_normal_cholesky(mu, L_Sigma);
}
My recovered parameters from running this were:
summary(params$beta[,1,])
V1 V2 V3
Min. :-3.953 Min. :-18.506 Min. :-13.2165
1st Qu.: 3.760 1st Qu.: -4.397 1st Qu.: -0.1258
Median : 5.483 Median : -1.051 Median : 2.8229
Mean : 5.465 Mean : -1.053 Mean : 2.8344
3rd Qu.: 7.095 3rd Qu.: 2.287 3rd Qu.: 5.7141
Max. :14.481 Max. : 18.401 Max. : 20.0149
summary(params$beta[,2,])
V1 V2 V3
Min. : 0.7665 Min. :-19.278 Min. :-11.595
1st Qu.: 9.0719 1st Qu.: -5.228 1st Qu.: 2.816
Median :10.7551 Median : -2.049 Median : 5.885
Mean :10.7435 Mean : -2.046 Mean : 5.835
3rd Qu.:12.4121 3rd Qu.: 1.115 3rd Qu.: 8.754
Max. :20.2864 Max. : 15.609 Max. : 24.415
… which don’t match the correlations I specified in the generated data.
My second problem is that for my real dataset (~2000 rows, 23 Y variables, 40 X variables) it takes about 24 hours to run. Perhaps I’m simply expecting too much giving the dimensionality of the data and not huge number of rows. I can reduce the number of X variables conceivably based on prior knowledge, but there would still be perhaps 20 X variables, and the number of Y variables are fixed.