Thank you for the replies and for thinking along with me! You have certainly given me something to think about; I am learning a lot about REML, Bayesian inference, and Stan.

The way I understand it now, the FA model attempts to summarize the covariance matrix of the data, \Sigma_d, through \Lambda and \Psi. The ASReml estimates seem to do a better job at that: \Sigma_{as} = \Lambda \Lambda' + \Psi \approx \Sigma_d, where the error between \Sigma_d and \Sigma_{as} is smaller than that of the Bayesian counterpart (\Sigma_{ba}). I am not sure whether over-fitting is a problem here. Thanks for the tip on adapt_delta; Stan gave me warnings about it in some model variations.
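To be concrete, the "error" I mean is a single number comparing the reconstructed matrix with the data covariance; the Frobenius norm below is my own choice for this comparison, not something either fitting method prescribes:

```
\mathrm{err} = \lVert \Sigma_d - \hat{\Sigma} \rVert_F
             = \sqrt{\sum_{i,j} \left( \Sigma_{d,ij} - \hat{\Sigma}_{ij} \right)^2},
\qquad \hat{\Sigma} = \hat{\Lambda}\hat{\Lambda}' + \hat{\Psi},
```

evaluated once with \hat{\Sigma} = \Sigma_{as} (the ASReml fit) and once with \hat{\Sigma} = \Sigma_{ba} (the Bayesian fit).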

Your point about the priors is interesting, and your simple simulation made me think about them. The lambda values shown below are typical as far as I know, so these priors do seem to put too much weight on excessively high values. Currently I work with small data sets of roughly 200 observations, although in practice the number of observations can run into the thousands. Even so, I have never seen lambda estimates that large (> 10).

Continuing on the priors: imagine the matrix \Lambda with k = 2. Under the current prior, the L_t estimates in column 1 are tied to the L_t estimates in column 2, since all loadings share mu_lt and sigma_lt. Why that should be so is not evident to me, even though we expect the two column vectors to jointly 'reconstruct' \Sigma_d. An example based on the agridat yan.winterwheat data in R:

```
model {
  mu_lt ~ normal(0, 1);
  sigma_lt ~ normal(0, 1);        // half-normal if sigma_lt is declared with <lower=0>
  L_t ~ normal(mu_lt, sigma_lt);  // every loading, in both columns, shares mu_lt and sigma_lt
}
```

```
> Lambda
          [,1]       [,2]
[1,] 0.4935566 0.00000000
[2,] 0.3704199 0.19782069
[3,] 0.3504931 0.23606015
[4,] 0.3378872 0.20372654
[5,] 0.5061964 0.03497362
[6,] 0.2353057 0.26467081
[7,] 0.3704607 0.06143004
[8,] 0.1245585 0.25115701
[9,] 0.2765927 0.28534212

Note that L[5,2] and L[7,2] are very close to 0, whereas ASReml estimates them as clearly negative. So I tried an ad-hoc alternative:

```
model {
  // M = num_elements(L_t): each loading gets its own independent N(0, 1) prior
  for (c in 1:M) {
    L_t[c] ~ normal(0, 1);
  }
}
```

```
> Lambda
          [,1]       [,2]
[1,] 0.6525018  0.0000000
[2,] 0.5152577  0.1321970
[3,] 0.4909657  0.1899327
[4,] 0.4532226  0.1757425
[5,] 0.7614079 -0.3423222
[6,] 0.2913932  0.3467770
[7,] 0.5523000 -0.3473909
[8,] 0.1154314  0.3891267
[9,] 0.3600153  0.3501128

Now L[5,2] and L[7,2] are allowed to be negative. I don't want to claim that either prior is better, but I find the sign-switching interesting: the first result would imply that entries 5 and 7 load only weakly on the second factor, while in "reality" they load strongly in the opposite direction.
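A standard way to pin down this sign/reflection ambiguity in Bayesian factor analysis is to constrain the diagonal of the lower-triangular \Lambda to be positive. A minimal sketch of how the free loadings could be declared (the names L_diag and L_offdiag are mine, not from my model above):

```
parameters {
  vector<lower=0>[K] L_diag;  // positive diagonal loading per factor: fixes each factor's sign
  vector[M - K] L_offdiag;    // remaining free loadings below the diagonal
                              // (M = total number of free loadings)
}
```

The factors can then still rotate the covariance apart however the data demand, but each column's overall sign is identified, which should make the two runs above comparable.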

So I will take some time to wrestle with the priors! I am starting to wonder whether any prior can be considered 'weak' in this type of model.
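One variant I may try as a middle ground between the shared hierarchical prior and the fully independent N(0, 1) priors: give each column of \Lambda its own scale, so loadings within a factor borrow strength from each other while the two factors stay decoupled. A rough sketch, with sigma_col a name I am inventing here and the structural zero in the upper triangle ignored for brevity:

```
parameters {
  matrix[P, K] Lambda;           // P environments, K factors
  vector<lower=0>[K] sigma_col;  // one scale per factor
}
model {
  sigma_col ~ normal(0, 1);      // half-normal via the <lower=0> constraint
  for (j in 1:K) {
    col(Lambda, j) ~ normal(0, sigma_col[j]);  // factor j's loadings share one scale
  }
}
```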