I am attempting to simulate a Gage R&R study using stan. In this study there are parts and operators, as well as part-operator interactions. For this experiment, I am generating the data with known parameters and attempting to extract those parameters using stan. The random effects model I am fitting is shown below:
y_{ijk} ~ \mu + \alpha_i + \beta_j + \alpha\beta_{ij} + \epsilon_{ijk}
where
\alpha_i ~ \mathcal{N}(0,\sigma_\alpha) … (i= 1, 2, …, nO)
\beta_j ~ \mathcal{N}(0,\sigma_\beta) … (j= 1, 2, …, nP)
\alpha\beta_{ij} ~ \mathcal{N}(0,\sigma_{\alpha\beta}) … (ij= 1, 2, …, nOP)
\epsilon_{ijk} ~ \mathcal{N}(0,\sigma_R) … (ijk= 1, 2, …, N)
My goal is to obtain parameter estimates for \sigma_\alpha,\sigma_\beta,\sigma_{\alpha\beta}, and \sigma_R.
I generate data with [\sigma_\alpha, \sigma_\beta, \sigma_{\alpha\beta}, \sigma_R] = [0.03, 0.8, 0.002, 0.04] where [nO, nP, nOP, N] = [3, 10, 30, 90]. The data is attached here:
stan_data.txt (1.6 KB)
My stan code for this model is below:
data {
int<lower=1> N;
int<lower=1> nP;
int<lower=1> nO;
int<lower=1> nOP;
int oper[N];
int part[N];
int inter[N];
real y[N];
}
parameters {
vector[nO] O;
vector[nP] P;
vector[nOP] OP;
real<lower=0> sigma_O;
real<lower=0> sigma_P;
real<lower=0> sigma_OP;
real<lower=0> sigma_R;
}
transformed parameters {
vector[N] theta = sigma_O*O[oper] + sigma_P*P[part] + sigma_OP*OP[inter];
}
model {
sigma_O ~ gamma(0.01, 0.01);
sigma_P ~ gamma(0.01, 0.01);
sigma_OP ~ gamma(0.01, 0.01);
sigma_R ~ gamma(0.01, 0.01);
O ~ normal(0, 1);
P ~ normal(0, 1);
OP ~ normal(0, 1);
y ~ normal( theta, sigma_R );
}
When I pass the data to stan, I set y = y - y.mean().
The results I achieve are below:
sO: 0.00035
sP: 1.0639
sOP: 0.00020
sR: 0.04932
I also generated a similar dataset with the same values of sigmas, but with [nO, nP, nOP, N] = [16, 16, 256, 1280].
The results of that are shown below:
sO: 0.61582
sP: 0.20466
sOP: 8.8469\bullet10^{-5}
sR: 0.04155
The current model takes between 25 and 40 minutes to sample (4 chains, 2000 iter, 800 warmup).
How can I get the predictions to be more accurate? And how can I get the model to run faster?
Is there a better way for me to formulate my model?