I’m working with a model that has multiple pairs of correlated observations, and so am estimating multiple multi_normal_cholesky
distributions. I’m also estimating the variances for these observations, and so am using the diag_pre_multiply(variance, cholesky)
approach.
When using just the cholesky factor (and not scaling with the variance), the chains mix well enough and runs in a reasonable time (~30secs).
However, when using the scaled cholesky factors, the model runs 10x slower (~350secs) and the chains are almost completely stationary.
I’ll paste the full model below, as well as attach some data that reproduces the issue. If there was some insight as to where this was coming from, it would be greatly appreciated.
data.R (13.1 KB) NonCentRes.stan (2.8 KB)
The relevant part is this section:
for(c in 1:(T*(J-1))) {
matrix[2,2] y_cov[J*(T-1)];
y_cov[c] = diag_pre_multiply(resvar[{c,J+c}],y_Lcorr[c]);
y[,{c,J+c}] ~ multi_normal_cholesky(eta_vec[,{c,J+c}], y_cov[c]);
}
The model works without any issues if I use y_Lcorr[c]
instead of y_cov[c]
in the multi_normal_cholesky
call
The full model is:
data{
int N; //Number of individuals
int J; //Number of items
int T; //Number of timepoints
vector[J*T] y[N];
}
parameters{
row_vector[J*T] intercepts;
/* 'Raw' Variables for Non-Centered Parameterisation */
matrix<lower=0.0>[J,T] load_r;
matrix[N,T] eta_r;
/* Hyperparameters for Loadings */
vector<lower=0>[T] load_m;
vector<lower=0>[T] load_sd;
/* Correlations between factors over time,
item residual variances, and residual correlations */
cholesky_factor_corr[T] e_Lcorr;
vector<lower=0>[J*T] resvar;
cholesky_factor_corr[2] y_Lcorr[J*(T-1)];
}
transformed parameters{
matrix[T,J] load;
/* Transform 'raw' loadings to required distribution */
for(j in 1:J)
load[,j] = (load_m[j] + load_r[j] * load_sd[j])';
}
model{
/* These are temporary variables, used for holding
functions of other parameters. More efficient on
memory to include here as their posteriors aren't saved */
matrix[J*T, T] lam;
matrix[N,T] eta_t;
matrix[N,J*T] lam_eta;
row_vector[J*T] eta_vec[N];
/* Priors for factor and residual correlations */
e_Lcorr ~ lkj_corr_cholesky(1);
for(c in 1:(T*(J-1)))
y_Lcorr[c] ~ lkj_corr_cholesky(1);
/* Priors for intercepts and loading hyperparameters */
intercepts ~ normal(0,5);
load_m ~ normal(0,5);
load_sd ~ normal(0,0.1);
/* Priors for 'raw' loadings for non-centered parameterisation */
to_vector(load_r) ~ std_normal();
/* Pack loadings into a factor loading matrix with zeros
for cross-loadings */
lam[,1] = append_row(load[1]', rep_vector(0, J*(T-1)));
lam[,T] = append_row(rep_vector(0, J*(T-1)), load[T]');
for(t in 2:(T-1))
lam[,t] = append_row(append_row(rep_vector(0, J*(t-1)), load[t]'),
rep_vector(0, J*(T-t)));
/* Priors for residual variances */
resvar ~ std_normal();
/* Priors for 'raw' factor scores and transform with multivariate
non-centered parameterisation */
to_vector(eta_r) ~ std_normal();
for(n in 1:N)
eta_t[n] = (e_Lcorr*eta_r[n]')';
/* Create the E[Y] for each individual */
lam_eta = eta_t*lam';
for(n in 1:N)
eta_vec[n] = lam_eta[n] + intercepts;
/* Link to observed data, with a multivariate normal distribution
for each pair of correlated observations */
for(c in 1:(T*(J-1))) {
matrix[2,2] y_cov[J*(T-1)];
y_cov[c] = diag_pre_multiply(resvar[{c,J+c}],y_Lcorr[c]);
y[,{c,J+c}] ~ multi_normal_cholesky(eta_vec[,{c,J+c}], y_cov[c]);
}
}
generated quantities {
/* Transform factor correlations from Cholesky to Correlation matrix */
corr_matrix[T] corr = multiply_lower_tri_self_transpose(e_Lcorr);
/* Put variance parameters into variance scale (from SD) */
vector[J*T] vars = square(resvar);
vector[T] loadvars = square(load_sd);
}