My question is part statistics and part Stan: maybe my Stan model doesn’t work because my stats are all wrong. So first I’ll give some brief context for the model.
I’m doing a meta-analysis on a topic in linguistics. The raw datasets should always consist of responses to an experimental manipulation, clustered by participants (since every participant is exposed to several items) and by items (since items are repeated across participants). I want to estimate the effect of the manipulation. Ideally, one would analyze the data with a mixed model with by-participant and by-item random effects. Unfortunately, I have a bunch of papers (from the 80s) and no raw data. The papers show tables of data that are already aggregated by item, and they report the mean and SD for each item.
Below is a real example (I have many tables that are small variations of this one), where the data were aggregated over 10 participants. I code the manipulation in the column condition as -0.5 and 0.5:
table <- read.table(header = TRUE, text =
"item Mean Sd condition
1 121 37 -0.5
1 140 37 0.5
2 140 42 -0.5
2 174 32 0.5
3 190 31 -0.5
3 204 35 0.5"
)
I want to estimate the effect of the manipulation, and if I don’t take into account that the data were aggregated, I will probably get estimates that are artificially precise. So I thought I could assume that Mean is measured with error. (I have only ever seen measurement-error models where the error is in a covariate, not in the dependent variable, so I’m not sure whether they go by the same name.)
error ~ normal(0, Se);
where Se = Sd / sqrt(10), since there are 10 participants.
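As a minimal sketch, the standard error for each row of the example table could be computed in R like this. The object name `tab` and the column name `Se` are only illustrative choices, and the 10 is the number of participants the means were aggregated over.

```r
# Example table from above, re-created so the snippet runs on its own.
tab <- read.table(header = TRUE, text =
"item Mean Sd condition
1 121 37 -0.5
1 140 37 0.5
2 140 42 -0.5
2 174 32 0.5
3 190 31 -0.5
3 204 35 0.5"
)

n_participants <- 10  # each Mean is aggregated over 10 participants

# Standard error of each reported mean: Sd / sqrt(n)
tab$Se <- tab$Sd / sqrt(n_participants)

print(tab)
```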
And I include this error in my likelihood, something like this:
(Mean + error) ~ normal(mu, sigma);
where mu = (alpha + adj_item[item]) + x * beta;
that is, a linear model with a by-item intercept adjustment (x is condition).
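As a rough sketch of what this model implies, data could be simulated from it in R as follows. All parameter values (`alpha`, `beta`, `tau`, `sigma`, and the common `Sd` of 35) are made up for illustration and are not taken from test_model.R or recover.stan. Note that if error is normal(0, Se) and Mean + error is normal(mu, sigma), then marginally Mean is normal around mu with standard deviation sqrt(sigma^2 + Se^2).

```r
set.seed(1)

n_item <- 3    # number of items, as in the example table
n_subj <- 10   # participants each mean is aggregated over

# Assumed "true" values, chosen only for illustration
alpha <- 150   # grand intercept
beta  <- 20    # effect of the manipulation
tau   <- 30    # sd of the by-item intercept adjustments
sigma <- 10    # residual sd in the likelihood

# Two rows per item, one for each condition
sim <- data.frame(
  item      = rep(1:n_item, each = 2),
  condition = rep(c(-0.5, 0.5), times = n_item)
)

adj_item <- rnorm(n_item, 0, tau)
mu <- (alpha + adj_item[sim$item]) + sim$condition * beta

sim$Sd <- 35                       # assumed reported sd for every row
sim$Se <- sim$Sd / sqrt(n_subj)    # standard error of each mean

latent   <- rnorm(nrow(sim), mu, sigma)  # plays the role of Mean + error
error    <- rnorm(nrow(sim), 0, sim$Se)  # measurement error
sim$Mean <- latent - error               # the mean a paper would report

print(sim)
```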
But this gives me a lot of divergent transitions (around 50, depending on the parametrization and priors). I tried giving everything a non-centered parametrization and a scale of 1 (in the file recover.stan), but I keep getting divergent transitions.
The thing is that when I simulate data, the estimates make more or less sense. But I’m not sure what to do about the divergent transitions. Can I ignore them?
I’m attaching the model and R code to run it with the data I posted before.
Thanks!
Bruno
test_model.R (387 Bytes)
recover.stan (941 Bytes)