Estimate predictor from multiple measurements and then use to predict outcome

Hi there,

I obtained a predictor 3 times per individual. Then I obtained an outcome once for each individual (N = 120).

I want to estimate the “true” value for the predictor for each individual using partial pooling. Then I want to use this estimate to predict the outcome. So the model looks like this:

Let the predictor be denoted by X and the outcome by y.

X_{obs} \sim normal(\mu_{Xtrue}, \sigma_{X})
\mu_{Xtrue} = \alpha_{subject}
\alpha_{subject} \sim normal(\bar{a}, \sigma_{subject})
\bar{a} \sim normal(0, 1)
\sigma_{X} \sim exponential(1)
\sigma_{subject} \sim exponential(1)

y \sim normal(\mu, \sigma)
\mu \sim \alpha_y + X_{true} \times \beta
\alpha_y \sim normal(0, 1)
\beta \sim normal(0, 0.5)
\sigma \sim exponential(1)

Obviously the vectors for the first model have 360 values, because for each individual there are three values. However, my X_{true} estimate should be the pooled subject mean distributions (1 per subject, thus 120). The outcome vector has 120 values. My question is, how can I model this in stan. If I write the code the way I thought it should be I get an error saying that there cannot be different vector lengths. It only works if I put for each subject the same outcome value 3 times. But then, the model assumes that the outcome has been measured three times, which was not the case.

How do I have to specify this model. Here was the attempt where I put the outcome values 3x per subject but that is not the way I need it. I must code it somehow with vector y as 120 and that for the final model 120 “true” estimated predictor values are used

data{
    vector[360] y;
    vector[360] predictor_obs;
    int id[360];
}
parameters{
    vector[120] a_predictor;
    real a_predictor_bar;
    real<lower=0> sigma_predictor_a;
    real<lower=0> sigma_predictor;
    real a_y;
    real b;
    real<lower=0> sigma;
}
model{
    vector[360] mu;
    vector[360] predictor_true;
    sigma ~ exponential( 1 );
    b ~ normal( 0 , 0.5 );
    a_y ~ normal( 0 , 1 );
    sigma_predictor ~ exponential( 1 );
    sigma_predictor_a ~ exponential( 1 );
    a_predictor_bar ~ normal( 0 , 1 );
    a_predictor ~ normal( a_predictor_bar , sigma_predictor_a );
    for ( i in 1:360 ) {
        predictor_true[i] = a_predictor[id[i]];
    }
    predictor_obs ~ normal( predictor_true , sigma_predictor );
    for ( i in 1:360 ) {
        mu[i] = a_y + b * predictor_true[i];
    }
    y ~ normal( mu , sigma );
}


1 Like

So if I get it correctly, this is some sort of measurement error model?

The way I would do this would be to have (not checked for completeness, just a sketch of the main changes):

data{
    int<lower=1> N_subj;
    int<lower=1> N_X_obs;
    vector[N_subj] y;
    matrix[N_X_obs, N_subj] predictor_obs; 
}

...

model {
  vector[N_subj] mu;
  ...
  for(i in 1:N_X_obs) {
    predictor_obs[i, ] ~ normal(a_predictor, sigma_predictor);
  }

  mu = a_y + b * a_predictor;
  y ~ normal( mu , sigma );
}

I also moved the constants to data which is a good practice but not strictly necessary.

Is that somewhat clear or is some of the syntax hard to understand? (I also hope there are no big errors that would prevent you from compiling this)

1 Like

@martinmodrak:
I could finally test the model with your input and test it (simulations). In the simulations it works very well. Thanks so much for providing me hints. I will test some more scenarios and then apply it to real data. Maybe I will get back with questions but so far it is clear!

1 Like

Hi again,

The model does the job perfectly. But I need to implement the next step:
There is missing data in the predictor and in the outcome. I read the chapter about that in SR2 and the stan manual about missingness but I do not feel confident to code this in stan. At the same time, I do not know how to code above model in rethinking or brms. I am thinking to fit this model in brms and then look at the stancode to build understanding. In any case, I need a hint how to implement this.

@paul.buerkner: Can you maybe tell me if the below model can be written in brms? I think I then would know how to implement the missing data imputation for brms (or alternatively use mice). I saw the measurement error vignette but it seems to assume that I have the SE instead of estimating this SE from the data.

Regarding the missing data:
As this predictor has been obtained three times per individual, the best imputation strategy seems to me to impute it using a multilevel model that only estimates the intercept and random intercepts:

pred \sim normal(\mu_p, \sigma_p)
\mu_p = \alpha_p
\alpha_p \sim normal(\alpha_{pbar}, \sigma_{pbar})
\alpha_{pbar} \sim normal(0,1)
\sigma_{pbar} \sim exponential(1)

So, pred here is known except that for some individuals (<15%) there was only 1 or 2 measurement(s) instead of 3. I used now mean imputation (mean of the individual). But that makes the model maybe too certain. And that also does not solve the missingness in the outcome (what I did not say above to make it less complicated: there are several outcomes that are only weakly correlated < 0.2 but that get predicted from the predictor).

Hope someone can help!

data{
    int<lower=1> N_subj;
    int<lower=1> N_X_obs;
    vector[N_subj] y;
    matrix[N_X_obs, N_subj] pred_obs;
}
parameters{
    vector[N_subj] a_pred;
    real<lower=0> sigma_pred;
    real a_y;
    real bS;
    real<lower=0> sigma;
}
model{
    vector[N_subj] mu;
    sigma ~ exponential( 1 );
    bS ~ normal( 0 , 0.5 );
    a_y ~ normal( 0 , 1 );
    sigma_pred ~ exponential( 1 );
    a_pred ~ normal(0, 1);
    
    for (i in 1:N_X_obs) {
      pred_obs[i, ] ~ normal(a_pred, sigma_pred);
    }
    
    mu = a_y + bS * a_pred;
    y ~ normal(mu, sigma);
}

I am not sure this is actually a missing data problem - if you only have 1 or 2 measurements of pred, those would still inform \alpha_{pbar}, just with less certainty. So you only need to change the data structure to what is called long format that lets you have different number of observations per subject - once again a sketch, not necessarily complete or error free:

data {
  vector[N_X_obs_total] pred_obs;
  int<lower=1, upper = N_subj> pred_obs_subj;
}

model {
    for (i in 1:N_X_obs_total) {
      pred_obs[i] ~ normal(a_pred[pred_obs_subj[i]], sigma_pred);
    }
}

Missingness in the outcome is mostly a completely different problem. Are you familiar with MAR/MCAR/MNAR? (discussed e.g. here: https://stefvanbuuren.name/fimd/sec-MCAR.html).

If you can assume MAR, then missingness in outcome can be handled in a similar way - you just need to use a data structure that works with variable number of observed outcomes per individual, but just leaving out the missing data is OK. The MNAR case is generally hard and more info about your actual setup would be needed.

2 Likes

Thanks @martinmodrak. That works. Indeed that is how I wanted to treat the “missing values” in the beginning. Great to see that stan is so flexible in the way you write the model/put the data in. I hope with time I can see these opportunities myself!

And yes I am familiar with MAR etc. but thanks for the hint!

EDIT Question:
I noticed that I forgot to code the hyperparameter first when I checked the model and it still seemed to work fine. Also from the plot it seems that partial pooling was applied even though in SR2 I learned that you have to specify hyperparameters and priors for so that partial pooling will be applied.

What I mean is that I coded the model below omitting these lines:

...
parameters {
    real pred_bar;
    real<lower=0> sigma_pred_bar;
}

and estimating the predictor like this:

model {
    ....

    for (i in 1:N_X_obs_total) {
      pred_obs[i] ~ normal(a_pred[pred_obs_subj[i]], sigma_pred);
    }
    ...

instead of specifying hyperparameters and priors like this:

model{
    vector[N_subj] mu;
    sigma ~ exponential( 1 );
    bS ~ normal( 0 , 0.5 );
    a_Y ~ normal( 0 , 1 );
    sigma_pred ~ exponential( 1 );
    a_pred ~ normal(pred_bar, sigma_pred_bar);
    pred_bar ~ normal(0, 1);
    sigma_pred_bar ~ exponential(1);
    
    
    for (i in 1:N_X_obs_total) {
      pred_obs[i] ~ normal(a_pred[pred_obs_subj[i]], sigma_pred);
    }
    
    mu = a_Y + bS * a_pred;
    Y ~ normal(mu, sigma);
}

How is it possible that both models seem to apply partial pooling (pulling the individual pred estimates to the overall mean of pred)?

1 Like

To answer my question myself:

There was as expected no partial pooling applied when I did not code the pred_bar. It just looked like it because of the prior for a_pred, that pulled mean estimates to the overall mean. Given that there are only 3 data points, the prior had this (positive) influence. If I make this prior less informative it becomes obvious. Both approaches work fine (either informative prior or partial pooling) in simulations.

2 Likes

Glad you were able to resolve this yourself, best of luck with further adventures in Stan!

2 Likes

Hi @martinmodrak,

I have a new question related to above model. I am currently running simulations for different scenarios.

I get the warning:

Chain 4 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 4 Exception: normal_lpdf: Scale parameter is 0, but must be > 0! (in ‘/var/folders/7m/550c9p9104v0k6t0yk9y324w0000gn/T/RtmpyE0P48/model-5df420f7dab7.stan’, line 23, column 4 to column 58)
Chain 4 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 4 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

This warning is displayed almost once in every chain but not repeatedly within one chain. Does that mean “often” or not?

Once per chain is usually OK. If you really want to be on the safe side, it is good to try to understand what is actually going on… What would need to happen to make the standard deviation parameter in question 0? Does this happen only when you put some “extreme” data for the model? Does it happen if you use different initial values (e.g. init = 0.5 or init = 0 in rstan is a good first check as it forces the sampler to init in a narrower region)?

1 Like