# Regression common factor model in stan

I am interested in fitting a single common factor model in stan using multilevel regression rather than a multivariate approach (see Factor analysis - Wikipedia for background).

Data Generating Model

Suppose each participant i has some unobserved “true” factor score (\theta_i). The goal is to estimate these using a set of J observed scores (y_{1i}, y_{2i}, ... , y_{Ji}) measured for each participant.

For each observed score for participant i’s on item j (y_{ij}), I generate data from the following factor model, where \lambda_1, \lambda_2, ..., \lambda_J are parameters we want to estimate. I’ve scaled the responses on each item (y_{j}) so that they have a mean of 0 and SD of 1.

For identification i assume that the true scores follow a standard normal distribution

\theta_i \sim \text{Normal}(0,1)

y_{ij} = \lambda_j \times \theta_i + \epsilon_{ij}

Also assume that \sigma^2_{y_{ij}} = 1, \sigma^2_{\theta_{j}} = 1, cov(\theta_{j}\epsilon_{ij})= 0

\sigma^2_{y_{ij}} = 1 =\lambda_j^2 + \sigma^2_{\epsilon_{ij}}
\sigma^2_{\epsilon_{ij}} = 1 - \lambda_j^2

Stan Model

I am using the following hierarchical model:

\theta_i \sim \text{Normal}(0,1)
y_{ij} \sim \text{Normal}(\lambda_j \theta_i , 1 - \lambda_j^2 )

data {
int<lower=0> n;
int<lower=0> pps_n;
int<lower=0> item_n;
array[n] int<lower=1, upper=item_n> item;
array[n] int<lower=1, upper=pps_n> pps;
vector[n] y;
}

parameters {
vector[pps_n] theta;                      // true scores
}

model {
vector[n] mu;
vector[n] e_sd;
theta ~ normal(0, 1);
//lambda ~ beta(.65,1);
for (i in 1:n){
mu[i] = lambda[item[i]] * theta[pps[i]];
e_sd[i] = sqrt(1 - lambda[item[i]]^2);
}
y ~ normal(mu, e_sd);
}
}


The sample estimates seem to converge on my simulated parameters when using a large simulated sample size, but I keep getting errors (Warning: 7 of 7 chains had an E-BFMI less than 0.2.) - so i am wondering if my model could be reparameterized in some way? I’m worried there is something i have perhaps overlooked.

I can find guidance online on fitting item response theory models or factor models using the multivariate normal distribution, but not quite this model.

Example Data

> dat
$n [1] 200$pps_n
[1] 50

$item_n [1] 4$item
[1] 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4
[85] 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4
[169] 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4

$pps [1] 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5 6 6 6 6 7 7 7 7 8 8 8 8 9 9 9 9 10 10 10 10 11 11 11 11 12 12 12 12 13 13 13 13 14 14 14 14 [57] 15 15 15 15 16 16 16 16 17 17 17 17 18 18 18 18 19 19 19 19 20 20 20 20 21 21 21 21 22 22 22 22 23 23 23 23 24 24 24 24 25 25 25 25 26 26 26 26 27 27 27 27 28 28 28 28 [113] 29 29 29 29 30 30 30 30 31 31 31 31 32 32 32 32 33 33 33 33 34 34 34 34 35 35 35 35 36 36 36 36 37 37 37 37 38 38 38 38 39 39 39 39 40 40 40 40 41 41 41 41 42 42 42 42 [169] 43 43 43 43 44 44 44 44 45 45 45 45 46 46 46 46 47 47 47 47 48 48 48 48 49 49 49 49 50 50 50 50$y
[1]  0.026993060  0.294678438  0.504879733  0.838063411 -0.064088946  2.135580858 -0.102658228  0.431988262 -0.637015147  0.829512463 -0.996151098 -1.172257558
[13] -0.547533963 -0.296305725 -0.931871638  0.132416422  1.642639759  0.194497232  1.118328557  0.956495918 -0.310032813  0.515213705 -1.037700683  1.393111976
[25] -0.066293502  0.797577852 -0.935548272 -0.719678258 -0.142148393 -0.707048459  1.064033885 -1.085824814 -0.811128998 -0.518701709 -1.245711539 -1.242911556
[37] -0.037287432  0.338006972  0.065670432 -0.481413775  0.739121455 -1.549421883  1.592945823  0.156952157  0.675959428 -0.575613220  0.100065338  1.144788985
[49]  0.203182220  0.016312288  1.094622448 -1.648530696  0.230619598 -2.295092814  0.380279399 -0.315248921 -0.331125608 -0.687076351  0.404710970 -0.038534007
[61]  0.281532518  0.715078091 -1.425030335 -1.151016981 -0.187804492  0.776883927  1.076076376  1.690008957  0.697057058  1.165423111 -1.997176896 -0.831881308
[73] -0.455523243 -0.584201492  0.393350755 -0.255847690 -1.040633917 -1.101931637  0.661327615 -0.883798199  0.716656937  0.211037794 -1.563880433  0.966980798
[85] -0.582617281 -0.399425564 -0.060571701  1.473233891  0.227961969  2.343480726 -1.686202457  0.015734558  1.114514997 -0.712103496  0.031194180  1.446668581
[97] -1.673851859  1.227383369  0.796922764 -0.973365207  0.605591599 -0.101965655 -0.727886304 -1.206806676 -0.556067045  0.858942458  0.921756082 -1.163805082
[109]  1.859318210  1.038252385  1.089760225  1.609051927 -2.118524593 -2.309381226 -1.139945318  0.768823310  0.179539302 -0.062310755 -0.067278419  0.088842584
[121]  0.874852839  1.512254586  0.757874720  0.947273258 -0.813710985 -0.403531497 -0.136977139 -0.997688339  0.483562032 -0.867146646 -0.422424119  0.186733548
[133] -0.824017861 -1.179733762 -0.005953189  0.126401855  1.163641753  0.801800304 -0.043334431  1.568972827 -1.371857977  0.154702529  1.117423071 -0.521789561
[145]  2.946917504  0.252095390  0.365654201 -0.441102137 -1.433798849 -0.351040775 -1.086362052 -1.785369041 -0.124113300  0.498320681  1.791852973  0.506375434
[157] -0.628787515  0.800924879 -0.665474402  0.889657871 -0.238493159  1.376453226  0.571182247  0.034757974  1.737456561  0.606837749  0.678950573  1.137989338
[169] -0.859358493 -0.288164957 -0.515627109 -1.512983638 -1.763944833  0.103986638 -0.326272991  0.057044237 -1.310907039  0.294874018  0.156111141 -1.648735228
[181]  0.301116343 -1.261611563 -0.172779808 -0.064322524 -0.093500569 -1.024119434  0.626138072  1.386070237  1.404419811 -0.387153066  2.363686265 -0.718435361
[193]  0.810834107 -1.543641639 -2.269518803  0.682420125  0.100678757 -0.653388345 -0.162460478  0.224488114


I couldn’t quite line the math up with x[i, j] and the data in the actual model.

There are a few things that could cause a problem in the implementation:

• The lambda * theta term introduces a multiplicative non-identifiability. You can move around terms in theta and move around terms in lambda to match. Only the prior on theta is identifying here. This can work, but it may take a lot of iterations to converge. Probably no getting around this in a factor model. But you might want to see what the ESS looks like for these parameters.

• The second issue is that y is constrained to have an error scale in (0, 1). Is mu being estimated close enough to y that this makes sense? I’d suggest storing the residuals y - mu as a generated quantity and then inspecting the residuals to make sure this is well specified. I didn’t see the distribution of epsilon in the math, so I wasn’t quite sure what was intended here. Given that theta isn’t constrained, I don’t see anything constraining errors to unit scale.

• The edited out prior on lambda could be problematic in that it will try to push estimates to the corners of probability space with a beta parameter < 1, and the mean is probably not what you want at .65 / 1.65 if item_n is greater than 3.

You can also replace the first two and last five lines of the model block with a one-liner.

y ~ normal(lambda[item] .* theta[pps], sqrt(1 - square(lambda[item])));


Or if you want to try to make this clearer, define the intermediates as one-liners.

vector[n] mu = lambda[item] .* theta[pps];
vector[n] sigma = sqrt(1 - square(lambda[item]));
y ~ normal(mu, sigma);


(Sorry, but I couldn’t help converting e_sd to sigma. I like the pairing of mu/sigma and location/scale or mean/sd in the case of a normal, but I find it confusing when they’re mixed.

Thanks for the advice! I will take a look at the things you mentioned when I’m back in the office. For now, I’ve clarified the question formulas to use y rather than x to match the stan model I linked.