I am new both to Bayesian statistics in general (using brms) and to this community, so I am sorry if (a) I did not use the right tags and (b) the answer is too obvious.
I have data from 3 groups (n = 30 each). Each individual completed 24 trials. The 24 trials consisted of 4 scenarios (6 trials each), and these 6 trials consisted of 3 types (2 trials each). The 2 trials within each type differed in the sequence in which the stimuli were presented.
I am primarily interested in the group * scenario interaction.
However, I then thought that trials are not simply nested in participants: actually, sequences are nested in types, which are nested in scenarios, which are nested in participants (see above). So I tried:
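(A sketch of the two models in brms syntax, with placeholder names: y is the dependent variable and d the data.)

```r
library(brms)

# first model: only participants as a grouping factor
m1 <- brm(y ~ group * scenario + (1 | participant), data = d)

# second model: additional grouping terms for type and sequence
m2 <- brm(y ~ group * scenario + (1 | participant) + (1 | type) + (1 | sequence),
          data = d)
```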
Note that I did not model scenario as random, as it is already included in the fixed effects. In the second model the results are far more "significant".
My questions are:
Which model should I prefer? On the one hand, I want to account for the data structure as well as I can. On the other hand, I am not sure it makes sense the way I did it, especially because the results look so much "better".
Don’t worry about the answer seeming obvious – your question actually digs into a lot of the nuance of modeling heterogeneity across differing contexts! I will try to write a bit more later, but in the interim I would suggest checking out Ch. 13 of “Data Analysis Using Regression and Multilevel/Hierarchical Models” by Andrew Gelman and Jennifer Hill and/or Michael Betancourt’s factor modeling case study (Impact Factor).
I would also caution against preferring a model because it produces “significant” effects of interest. Building a model that fits the data well and then estimating effect sizes and their uncertainty is usually better.
Alright, after some more research I realized that if sequences are nested in types, which are nested in participants, and I am interested in the group * scenario interaction, I should write:
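(Again a sketch with placeholder names; the nesting shorthand is expanded in the comment.)

```r
# sequences nested in types nested in participants
m3 <- brm(y ~ group * scenario + (1 | participant / type / sequence), data = d)

# the nesting shorthand is equivalent to:
# (1 | participant) + (1 | participant:type) + (1 | participant:type:sequence)
```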
which makes sense. Now I have to think about the response distribution, because the dependent variable is bounded count data between 0 and 9, and both a Gaussian and a (truncated) Poisson distribution produce horrible posterior predictive checks. Well, this is another problem…
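(For reference, this is roughly how I checked; pp_check is the brms helper for posterior predictive checks, and the "bars" type suits discrete outcomes.)

```r
# posterior predictive check; "bars" overlays observed vs. predicted counts
pp_check(m3, type = "bars", ndraws = 100)
```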
If you have some thoughts to share, I am very interested in hearing them; otherwise, a big thanks already for the useful links…
If you have several models fitted to the same data, PSIS-LOO can provide an objective comparison between them. brms provides some handy helper functions: add_criterion and loo_compare. The comparison works across different parameterizations, varying-effects structures, and distributions.
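A minimal sketch, assuming m2 and m3 are your fitted brmsfit objects:

```r
# compute PSIS-LOO once and store it with each model object
m2 <- add_criterion(m2, "loo")
m3 <- add_criterion(m3, "loo")

# models are ordered by expected predictive accuracy, best first
loo_compare(m2, m3, criterion = "loo")
```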
I don’t know your dependent variable, so I don’t know if it makes sense, but have you thought about a binomial distribution (or possibly a beta-binomial, to allow for overdispersion)? When comparing models with loo, keep in mind that it cannot easily compare models with discrete and continuous response distributions to one another (Cross-validation FAQ).
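A rough sketch of what that could look like, assuming your count is the number of successes out of 9 per observation (beta_binomial is a native family in recent brms versions):

```r
# binomial: y successes out of 9 trials per observation
m_bin <- brm(y | trials(9) ~ group * scenario + (1 | participant / type / sequence),
             data = d, family = binomial())

# beta-binomial allows extra-binomial variation (overdispersion)
m_bb <- brm(y | trials(9) ~ group * scenario + (1 | participant / type / sequence),
            data = d, family = beta_binomial())
```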