Hi all, I am trying to specify a linear multivariate model and I’m having some difficulties: 1) I’m not recovering parameters correctly from a simulated data set and 2) I need to speed it up for my real data.

First here is a simulated data script: gen_test_data_y2_x3.R (530 Bytes)

And here is the script to call stan from r: run_test_model.R (311 Bytes)

My model is based on section 9.15 of the Stan manual (save to linear_multivariate2.stan):

```
int<lower=1> NvarsY;
int<lower=1> NvarsX;
int<lower=0> N;
vector[NvarsX] x[N];
vector[NvarsY] y[N];
}
parameters {
matrix[NvarsY, NvarsX] beta;
cholesky_factor_corr[NvarsY] L_Omega;
vector<lower=0>[NvarsY] L_sigma;
}
model {
vector[NvarsY] mu;
matrix[NvarsY, NvarsY] L_Sigma;
//Priors
L_Sigma = diag_pre_multiply(L_sigma, L_Omega);
to_vector(beta) ~ normal(0, 5);
L_Omega ~ lkj_corr_cholesky(4);
L_sigma ~ cauchy(0, 2.5);
//Model
for (n in 1:N){
mu = beta * x[n];
}
y ~ multi_normal_cholesky(mu, L_Sigma);
}
```

My recovered parameters from running this were:

summary(params$beta[,1,])

V1 V2 V3

Min. :-3.953 Min. :-18.506 Min. :-13.2165

1st Qu.: 3.760 1st Qu.: -4.397 1st Qu.: -0.1258

Median : 5.483 Median : -1.051 Median : 2.8229

Mean : 5.465 Mean : -1.053 Mean : 2.8344

3rd Qu.: 7.095 3rd Qu.: 2.287 3rd Qu.: 5.7141

Max. :14.481 Max. : 18.401 Max. : 20.0149

summary(params$beta[,2,])

V1 V2 V3

Min. : 0.7665 Min. :-19.278 Min. :-11.595

1st Qu.: 9.0719 1st Qu.: -5.228 1st Qu.: 2.816

Median :10.7551 Median : -2.049 Median : 5.885

Mean :10.7435 Mean : -2.046 Mean : 5.835

3rd Qu.:12.4121 3rd Qu.: 1.115 3rd Qu.: 8.754

Max. :20.2864 Max. : 15.609 Max. : 24.415

… which don’t match the correlations I specified in the generated data.

My second problem is that for my real dataset (~2000 rows, 23 Y variables, 40 X variables) it takes about 24 hours to run. Perhaps I’m simply expecting too much giving the dimensionality of the data and not huge number of rows. I can reduce the number of X variables conceivably based on prior knowledge, but there would still be perhaps 20 X variables, and the number of Y variables are fixed.