Random Effects Model

the_stan_man · April 24, 2020, 3:46am

I am attempting to simulate a Gage R&R study using stan. In this study there are parts and operators, as well as part-operator interactions. For this experiment, I am generating the data with known parameters and attempting to extract those parameters using stan. The random effects model I am fitting is shown below:

y_{ijk} ~ \mu + \alpha_i + \beta_j + \alpha\beta_{ij} + \epsilon_{ijk}
where
\alpha_i ~ \mathcal{N}(0,\sigma_\alpha) … (i= 1, 2, …, nO)
\beta_j ~ \mathcal{N}(0,\sigma_\beta) … (j= 1, 2, …, nP)
\alpha\beta_{ij} ~ \mathcal{N}(0,\sigma_{\alpha\beta}) … (ij= 1, 2, …, nOP)
\epsilon_{ijk} ~ \mathcal{N}(0,\sigma_R) … (ijk= 1, 2, …, N)

My goal is to obtain parameter estimates for \sigma_\alpha,\sigma_\beta,\sigma_{\alpha\beta}, and \sigma_R.

I generate data with [\sigma_\alpha, \sigma_\beta, \sigma_{\alpha\beta}, \sigma_R] = [0.03, 0.8, 0.002, 0.04] where [nO, nP, nOP, N] = [3, 10, 30, 90]. The data is attached here:
stan_data.txt (1.6 KB)

My stan code for this model is below:

data {
    int<lower=1> N;
    int<lower=1> nP;
    int<lower=1> nO;
    int<lower=1> nOP;

    int  oper[N];
    int  part[N];
    int  inter[N];

    real y[N];
}

parameters {
    vector[nO] O;
    vector[nP] P;
    vector[nOP] OP;

    real<lower=0> sigma_O;
    real<lower=0> sigma_P;
    real<lower=0> sigma_OP;
    real<lower=0> sigma_R;
}

transformed parameters {
    vector[N] theta = sigma_O*O[oper] + sigma_P*P[part] + sigma_OP*OP[inter];
}

model {
    sigma_O  ~ gamma(0.01, 0.01);
    sigma_P  ~ gamma(0.01, 0.01);
    sigma_OP ~ gamma(0.01, 0.01);
    sigma_R  ~ gamma(0.01, 0.01);

    O  ~ normal(0, 1);
    P  ~ normal(0, 1);
    OP ~ normal(0, 1);

    y ~ normal( theta, sigma_R );
}

When I pass the data to stan, I set y = y - y.mean().

The results I achieve are below:
sO: 0.00035
sP: 1.0639
sOP: 0.00020
sR: 0.04932

I also generated a similar dataset with the same values of sigmas, but with [nO, nP, nOP, N] = [16, 16, 256, 1280].

The results of that are shown below:
sO: 0.61582
sP: 0.20466
sOP: 8.8469\bullet10^{-5}
sR: 0.04155

The current model takes between 25 and 40 minutes to sample (4 chains, 2000 iter, 800 warmup).

How can I get the predictions to be more accurate? And how can I get the model to run faster?

Is there a better way for me to formulate my model?

mike-lawrence · April 24, 2020, 11:19am

Is there any reason that you omitted an intercept from your model? Without it, you are conveying a belief that your sample mean precisely matches the population intercept and that you are infinitely certain of its value.

mike-lawrence · April 24, 2020, 1:40pm

Seeing as N is much bigger than the other count variables, you can probably get speed-ups from computing only the unique combinations then indexing into that to define theta (see example here), but before you get too worried about performance you should pin down why this model is not recovering the data-generating parameters first. The fact that it takes so long suggests to me that something is awry in the first place, as this isn’t a particularly complicated model nor large data. First add an intercept (and a prior on said intercept) and see how that fares. Then if you’re still not getting good recovery, post your data-generating code here too as possibly the model is recovering the data-generating parameters accurately but you’re not generating the data in the way you think you are.

the_stan_man · April 26, 2020, 12:57am

Thank you Mike, I will look into this and see what progress I can make.

Topic		Replies	Views
Request for help understanding model behavior Modeling	17	1509	March 24, 2018
Linear Mixed Model Modeling	2	449	January 21, 2020
Multivariate analysis Modeling rstan , fitting-issues , specification	4	541	April 22, 2021
Mixed effect modeling using brms Modeling	3	143	October 13, 2024
Issues in my stan code Modeling rstan , fitting-issues	8	565	January 26, 2021

Random Effects Model

Related topics