Covariance between item means for unbalanced repeated measurements

Hi all,

First off, I’m fairly new to Stan, so the answer to my question may be obvious… to others. I’ve been working on this for a few days now, so I figured it was worth asking around for some help.

I’ve created a model for estimating two different item scores, each with repeated measurements. The design is unbalanced: a given individual has a different number of measurements for the first and second items. I work with psychophysiological data, so unbalanced measurement counts across items (trial types) are very commonplace after rejecting “bad” measurements with too much noise/artifact.

Right now, this model estimates the within-person standard deviations (sig_err) for each item as well as the between-person standard deviations (sig_id) for each item.

I’ve been trying to build off this model to compute the between-person covariance for the mean item scores (u_1, u_2). I have been pretty unsuccessful. Does anyone have any recommendations about how I can go about computing covariance between u_1 and u_2 in this model?

Any help would be greatly appreciated!


stan_model <- "
data {
  int<lower=1> NOB1; //total observations of first measurement
  int<lower=1> NOB2; //total observations of second measurement
  int<lower=1> NSUB; //total number of subjects

  int<lower=1, upper=NSUB> id1[NOB1]; //grouping id for first meas
  int<lower=1, upper=NSUB> id2[NOB2]; //grouping id for second meas

  vector[NOB1] meas1; //scores for first measurement
  vector[NOB2] meas2; //scores for second measurement
}

parameters {
  vector[2] mu; //intercepts for meas1 and meas2

  vector<lower=0>[2] sig_id; //bet-person std dev for given meas
  vector<lower=0>[2] sig_err; //within-person std dev for given meas

  vector[NSUB] id_raw1;
  vector[NSUB] id_raw2;
}

transformed parameters {
  vector[NSUB] u_1; //predicted meas1 score
  vector[NSUB] u_2; //predicted meas2 score

  u_1 = mu[1] + id_raw1 * sig_id[1];
  u_2 = mu[2] + id_raw2 * sig_id[2];
}

model {
  id_raw1 ~ normal(0,1);
  id_raw2 ~ normal(0,1);

  mu ~ normal(0,10);

  sig_id ~ cauchy(0,10);
  sig_err ~ cauchy(0,10);

  meas1 ~ normal(u_1[id1],sig_err[1]);
  meas2 ~ normal(u_2[id2],sig_err[2]);
}
"

What do you mean by covariance between u_1 and u_2? Do you want the posterior covariance or do you want to impose some kind of covariance prior?

It’s not quite right to say sig_err is the within-person standard deviation; it’s a scale parameter being estimated by the model. The posterior mean for sig_err isn’t guaranteed to equal the sample standard deviation, if that’s what you were thinking.

Hi Bob,

Thanks for the response.

I conceptualize u_1 and u_2 as the estimated subject-specific means for each item. Am I not thinking about this correctly? I’m interested in the posterior covariance between u_1 and u_2. I don’t need to impose a covariance prior.

I understand that the posterior mean for sig_err isn’t guaranteed to be the sample standard deviation. And I plan on adding more terms to the model that will change the meaning of sig_err. I’m just trying to start with as simple a model as possible.

Thanks again for the help.

There’s no way to directly compute posterior covariance within Stan; the modeling language purely defines random variables. So you need to take the posterior draws and compute the covariance for the variables of interest yourself. Also, there’s MCMC uncertainty in that calculation, governed by the effective sample size.

We tend to do more scatterplotting and visual inspection of the posterior, as covariance is just a linear summary of the true relationship.
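To make the "compute it from the draws" suggestion concrete, here is a minimal sketch in Python. Simulated correlated draws stand in for the real posterior (in practice you'd pull the u_1 and u_2 draws out of the fit object, e.g. via rstan's extract); the posterior covariance is just the sample covariance of those draws.

```python
import numpy as np

# Hypothetical stand-in for posterior draws of u_1[i] and u_2[i] for one
# subject; real draws would come from the fitted Stan model instead.
rng = np.random.default_rng(0)
draws = rng.multivariate_normal(mean=[1.0, 2.0],
                                cov=[[1.0, 0.5], [0.5, 1.0]],
                                size=4000)
u1_draws, u2_draws = draws[:, 0], draws[:, 1]

# Posterior covariance = sample covariance of the paired draws.
post_cov = np.cov(u1_draws, u2_draws)[0, 1]
print(post_cov)  # close to the true 0.5, up to Monte Carlo error
```

The Monte Carlo error in this estimate shrinks with the effective sample size of the draws, which is why the number of (effectively independent) iterations matters.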

Hm, well I guess that’s why I’ve been so unsuccessful when I’ve tried to compute the posterior covariance. Thank you very much for the clarification, Bob. I appreciate it.

Also, I should’ve mentioned that given that

cov[X, Y] = E[X * Y] - E[X] * E[Y]

you can store the value of X * Y as a generated quantity and compute everything afterwards using the posterior means (the E terms).
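A quick numerical check of that identity, again with simulated draws standing in for the posterior (the x * y array plays the role of the stored generated quantity):

```python
import numpy as np

# Hypothetical correlated draws standing in for posterior samples of X and Y.
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=5000)
y = 0.6 * x + rng.normal(0.0, 1.0, size=5000)
xy = x * y  # what the generated quantities block would save each iteration

# cov[X, Y] = E[X * Y] - E[X] * E[Y], with posterior means for the E terms
cov_identity = xy.mean() - x.mean() * y.mean()

# Same quantity computed directly from the paired draws (ddof=0 so the
# divisor matches the plain means used above).
cov_direct = np.cov(x, y, ddof=0)[0, 1]
print(cov_identity, cov_direct)  # the two agree to floating-point precision
```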

Oh, perfect. Thanks for the recommendation. I haven’t played around with the generated quantities section, so this will give me an excuse to learn a new part of Stan. I’ll go ahead and give that a shot.

Thanks again.