Measurement error model?

My question is a mixture of a statistical question and a Stan question. Maybe my Stan model doesn't work because my stats are all wrong. So first I'll give some brief context for the model.

I'm doing a meta-analysis on a topic in linguistics. The raw datasets should always consist of a response to an experimental manipulation, clustered by participants (since every participant is exposed to several items) and by items (since items are repeated across participants). I want to estimate the effect of the manipulation. Ideally one would analyze the data with a mixed model with by-participant and by-item random effects. Unfortunately, I have a bunch of papers (from the 80s) and no raw data. These papers show tables with data already aggregated by item, reporting the mean and SD for each item.

Below I'm showing a real example (I have a lot of tables that are small variations of this) where the data were aggregated over 10 participants. I'm coding the manipulation, under condition, as -0.5 and 0.5:

table <- read.table(header = TRUE, text =
"item Mean Sd condition
  1   121  37  -0.5
  1   140  37   0.5
  2   140  42  -0.5
  2   174  32   0.5
  3   190  31  -0.5
  3   204  35   0.5"
)

I want to estimate the effect of the manipulation, and if I don't take into account that the data were aggregated, I will probably get estimates that are artificially precise. So I thought I could assume that the Mean has a measurement error. (I have only ever seen measurement-error models where the error is in a covariate rather than in the dependent variable, so I'm not sure whether they have the same name.)

error ~ normal(0, Se);

where Se = Sd / sqrt(10), since there are 10 participants.

And I include this error in my likelihood, something like this:

(Mean + error) ~ normal(mu, sigma);

where mu = (alpha + adj_item[item]) + x * beta; that is, a linear model with a by-item intercept.
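Simplified, the core of what I have in mind looks like this (the attached recover.stan differs in the details, and the names here are just illustrative):

data {
  int<lower=1> N;                 // rows in the table
  int<lower=1> I;                 // number of items
  int<lower=1, upper=I> item[N];  // item index for each row
  vector[N] Mean;                 // reported by-item means
  vector[N] Se;                   // Sd / sqrt(10)
  vector[N] x;                    // condition, coded -0.5 / 0.5
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
  real<lower=0> tau_item;         // sd of by-item intercepts
  vector[I] adj_item_raw;         // non-centered by-item intercepts
  vector[N] error;                // measurement error on the Means
}
model {
  vector[I] adj_item;
  vector[N] mu;
  adj_item = tau_item * adj_item_raw;
  mu = alpha + adj_item[item] + x * beta;
  adj_item_raw ~ normal(0, 1);
  error ~ normal(0, Se);
  Mean + error ~ normal(mu, sigma);
  // (weakly informative priors on alpha, beta, sigma, tau_item omitted)
}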

But this gives me a lot of divergent transitions, ~50 depending on the parametrization and priors. I tried putting everything in a non-centered parametrization with a scale of 1 (in the file recover.stan), but I keep getting divergent transitions.

The thing is that I simulated data, and the estimates more or less make sense. But I'm not sure what to do about the divergent transitions. Can I ignore them?

I’m attaching the model and R code to run it with the data I posted before.

Thanks!
Bruno

test_model.R (387 Bytes)

recover.stan (941 Bytes)


You can have measurement errors in both the covariates and the outcome of interest. Usually the latter is just rolled into the overall unexplained variance or “noise” term in a regression.

You probably want to take another look at Andrew’s favorite example, 8 schools, as it’s exactly this kind of model, right down to measurements being aggregated and reported as summaries of mean and standard deviation.

What you want to do is continue thinking generatively in terms of a likelihood. There’s a true but unknown value from which your summarized measurements are derived.

I think you want to use standard deviation, not divide by sqrt(10).

And sampling mean plus error can’t be right, as it’s going to bias things when you always add the error.
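In sketch form (untested), reusing the data declarations from your model but with the raw Sd passed in as data, and with the true value behind each reported Mean treated as latent:

parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
  vector[I] adj_item;
  vector[N] theta;            // latent true value behind each reported Mean
}
model {
  theta ~ normal(alpha + adj_item[item] + x * beta, sigma);
  Mean ~ normal(theta, Sd);   // measurement model for the reported summaries
  // (priors on alpha, beta, sigma, adj_item omitted)
}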

I think I never answered because I couldn’t install rstan on my tiny laptop. Your current model might work better from a sampling perspective if you non-center the rest of the way. More or less like this:

mu = alpha + adj_item[item] + x * beta + error_raw .* Se;
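Spelled out a bit more (untested), with error_raw replacing error and getting a standard normal prior:

parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
  real<lower=0> tau_item;
  vector[I] adj_item_raw;
  vector[N] error_raw;        // standardized measurement errors
}
model {
  vector[I] adj_item;
  vector[N] mu;
  adj_item = tau_item * adj_item_raw;
  adj_item_raw ~ normal(0, 1);
  error_raw ~ normal(0, 1);   // implies error = Se .* error_raw
  mu = alpha + adj_item[item] + x * beta + error_raw .* Se;
  Mean ~ normal(mu, sigma);
}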

McElreath's book has a good example of how to reflect measurement error in a dependent variable; see Section 14.1.1 of that book. Note that McElreath's R package ("rethinking") is a front end for Stan, so the syntax shown in his example isn't precisely how you'd write this in Stan, but you can easily see from the example how to do so. Or you can install McElreath's package (if you're using R), run his example, and then call stancode() on the fitted object to reveal exactly what the Stan code looks like in this situation.

Thanks for the suggestion. I looked back at 8 schools, and it is a kind of similar situation. There it's also the SE, which I think makes sense: somehow I need to inform the model about the precision of the Means in the table that I have, and with more observations the SE is smaller, which makes sense.

But I get a very similar posterior for beta, and ~40 divergent transitions.
recover_bc.stan (871 Bytes)
So I'm not sure what's happening. Should I parameterize this model differently?

If I think about this generatively, I would say that behind each value in the table there are 10 missing observations, distributed as normal(Mean[n], Sd[n]) (and now I agree that it has to be the SD). And each missing observation is in turn sampled from normal(mu, sigma). This would be the model recover_generative.stan (1009 Bytes).
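In Stan terms, the core of it is roughly this (simplified from the attachment; Sd is the table's Sd column passed in as data):

parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
  vector[I] adj_item;
  matrix[N, 10] y_miss;       // the 10 unobserved responses behind each row
}
model {
  vector[N] mu;
  mu = alpha + adj_item[item] + x * beta;
  for (n in 1:N) {
    y_miss[n] ~ normal(Mean[n], Sd[n]);  // consistent with the reported summary
    y_miss[n] ~ normal(mu[n], sigma);    // and generated by the regression
  }
}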

Now the posterior of beta is much tighter than in the previous two models, but that also makes sense. And I also get around 40 divergent transitions. It may make sense to integrate out the missing values (something like Section 12.3, Censored Values, of the Stan 2.16.0 manual), but I couldn't quite figure out how. Does this model make sense? Is it possible to integrate out the missing values?
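The furthest I got, if the algebra is right (and assuming the 10 observations per cell are roughly iid normal, which ignores the by-participant clustering): the sample mean of 10 draws from normal(mu, sigma) is normal(mu, sigma / sqrt(10)), and for the sample variance, (10 - 1) * Sd^2 / sigma^2 ~ chi-square(9). So perhaps:

model {
  // marginal for the reported means
  Mean ~ normal(mu, sigma / sqrt(10));
  // marginal for the reported SDs: s^2 ~ gamma((n-1)/2, (n-1)/(2 sigma^2)), n = 10
  // (square(Sd) is data, so no Jacobian adjustment is needed)
  square(Sd) ~ gamma(4.5, 4.5 / square(sigma));
}

But I'm not sure this is the right way to bring in the Sd.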

I tried this, but I still get divergent transitions.

Thanks, yes, it’s exactly the model that Bob suggested. I tried it in recover_bc.stan above, but I still get divergent transitions.

Plotting is the answer, but I can't run Stan right now :)

I just wanted to update that I now use the non-centered parameterization everywhere and increased the number of iterations to 4000, and I got rid of the divergent transitions. So recover_bc.stan is working!! :)

I would still like to hear your thoughts (from all of you) about the recover_generative.stan model. Does it make any sense?


I didn’t see that attached. Is that the recover.stan file you attached earlier?

Here it goes again:
recover_generative.stan (1009 Bytes)

But as an update to the update: I'm building a lot of these measurement models (as suggested by Bob in recover_bc.stan) for all sorts of small datasets with different hierarchical structures, and in some of them I always get around 50 divergent transitions unless init=0. I'll post the exact model(s) later, once I manage to write the minimal model that is still problematic…

I take it those are divergent after warmup? I’m assuming you already have the target acceptance rate near 1.

Sorry, I was on vacation. The answers are yes and yes (0.99999).

Then you’ll need to either reparameterize or provide tighter priors. Or find a model that fits the data better. Do you have the same problems with simulated data of roughly the same size?

OK, I solved the problem with recover_bc.stan. It was the half-normal priors on some of the parameters:

real<lower=0> alpha_raw;
vector<lower = 0>[N] theta;

Removing “lower=0” fixes all the problems. These parameters shouldn't be negative, but I guess it was a bad idea to constrain them, since they aren't estimated as negative anyway.
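That is, the only change was dropping the bounds:

real alpha_raw;       // was: real<lower=0> alpha_raw;
vector[N] theta;      // was: vector<lower=0>[N] theta;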

I guess this is the folk theorem striking again :/

Are some of the draws negative?

Hard bounds like <lower = 0> can cause problems if values very close to zero are consistent with the data. They involve a log transform to the unconstrained scale. So values near zero approach negative infinity on the unconstrained scale and can easily lead to overflow with floating point.
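Concretely, for a lower bound of zero, Stan samples an unconstrained variable and maps it back with exp:

// for  real<lower=0> theta;  Stan samples theta_unc and sets
//   theta = exp(theta_unc);
// so theta near zero means theta_unc = log(theta) -> -infinity;
// e.g. theta = 1e-300 puts theta_unc around -690.8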