I’ve attempted to use brms to model what I would describe as a multivariate between + within subjects ANOVA. The summary output matches what I would expect given the raw data, but could you confirm that I have actually written the code appropriately? I am quite a noob at both Bayesian modelling and, frankly, R generally (though very happy to learn as much as possible about both!). As well as checking what I’ve tried, I have a couple of additional questions that I would really appreciate anyone shedding some light on.
Participants are organised in 2 groups: a control group and an anxiety disorder group (between subjects). Each participant gives 2 types of assessment in response to a stimulus (the multivariate part), and gives these ratings in 2 contexts (within subjects). This is how I coded it:
```r
MANOVAattempt <- brm(
  formula = mvbind(Assessment1, Assessment2) ~ Group * Context + (1 | PPN),
  data = MANOVAData,
  family = student,
  warmup = 1000, iter = 3000, chains = 4,
  control = list(adapt_delta = 0.975)
)
```
I used the mvbind approach described in the brms multivariate vignette. PPN is the participant number, and so I believe that (1|PPN) would be the way to indicate that we are receiving multiple responses from each subject, but I may be mistaken.
Does such a model look to you like it is appropriately analysing the study design I described above?
In addition, if anyone is able to shed some light on the following questions it would be great. I have quite a few, so I’m not expecting anyone to answer them all, or in any great detail:
- I think the mvbind approach models the different ‘outcome’ variables as correlated. Is there a way to run essentially the same multivariate model without specifying them as correlated? I understand the output would then match what you’d get from 2 separate analyses of each variable, but I am thinking the 2 models could then be directly compared. The reason is that I do not think the two variables should necessarily correlate, particularly in the anxiety disorder patients, so I would like to analyse the data both ways and see whether it makes much of a difference.
- The ‘outcome’ variables are really on a scale, not merely ordinal, with increments all along the scale equal to one another. However, they are bounded from 0-100. Does this pose a serious issue for the model? I didn’t think it would be a real problem since, e.g., I see people predicting a person’s weight or height in similar models, and those variables are also bounded within reason/biological plausibility, as are many variables treated as fully continuous.
- I can see the HDI for the estimated difference in each outcome variable across the levels of the between- and within-subjects factors, and their interaction, but is there a way to get an HDI for the ‘effect size’ of a variable in the model? This may be a very silly question!
- I can also see the estimated sigma and nu values for the variables. Is there a way to specify that these might be different at different levels of the group/context factors, and receive estimates for those levels?
- I’m a little concerned about the pp_check output I receive for each variable. There is what seems to me a healthy-looking overlap of y and yrep, but the graph is a little tough to read owing to some yrep values tailing out to +/-1000. Does this look problematic to you? (I’ve uploaded an attachment which you can hopefully see.) To get the graph I just typed: pp_check(MANOVAattempt, resp = "Assessment1")
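For the sigma/nu question, this is the kind of thing I have in mind, if I understand brms’s distributional regression syntax correctly (just a sketch, I haven’t run this):

```r
# Sketch only: letting sigma and nu vary by the design factors using
# distributional formulas inside bf(). Untested by me.
fit_dist <- brm(
  bf(mvbind(Assessment1, Assessment2) ~ Group * Context + (1 | PPN),
     sigma ~ Group * Context,  # separate residual scale per cell
     nu ~ Group),              # separate degrees of freedom per group
  data = MANOVAData,
  family = student,
  warmup = 1000, iter = 3000, chains = 4,
  control = list(adapt_delta = 0.975)
)
```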
Thank you for taking the time to see if you can help with my questions!
- Operating System: MacOS Sierra 10.12.6
- brms Version: 2.8.0
Solomon, thanks for your in-depth comments! Great that the general syntax looks reasonable. I will also take a look at how things differ when omitting the residual correlation.
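If I follow the brms multivariate vignette correctly, omitting the residual correlation would be something like this (a sketch, not yet run):

```r
# Same model as before, but with residual correlations switched off
# via set_rescor(FALSE) from brms.
fit_norescor <- brm(
  bf(mvbind(Assessment1, Assessment2) ~ Group * Context + (1 | PPN)) +
    set_rescor(FALSE),
  data = MANOVAData,
  family = student,
  warmup = 1000, iter = 3000, chains = 4,
  control = list(adapt_delta = 0.975)
)
```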
Regarding effect size, I’m asking whether, once you have the posterior, it would be possible to see an estimated HDI for how much a change in one variable influences the value of the other, not just in units of that variable but in something more standardised (e.g., unique variance explained or something like that?).
I will take a look at the vignette you suggest - thanks!
And finally, I may indeed have to try seeing how the model performs when simplified and see if there is a point at which things go awry in the pp_check!
Thanks for your suggestions.
I’m not sure about unique variance explained in this context. But you can re-express the fixed effects as effect sizes by dividing them by sigma. Another option is to fit a series of submodels and use the bayes_R2() function to compute the Bayesian R^2 for each. You can even get a ΔR^2 with that approach. If you’re willing to move further from the typical ANOVA workflow, you can also use information criteria (i.e., WAIC, LOO) and model weights to compare the full interaction model with submodels.
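In code, that might look something like this (a sketch; the parameter-column names and the submodels fit_full / fit_sub are placeholders you’d adapt to your own fit):

```r
# Standardized effect: divide a fixed effect by the residual sigma, draw by draw.
# The column names below are hypothetical; check names(post) for your model.
post <- posterior_samples(MANOVAattempt)
es   <- post[["b_Assessment1_GroupAnxiety"]] / post[["sigma_Assessment1"]]
quantile(es, probs = c(.025, .5, .975))  # an interval for the effect size

# Bayesian R^2 for a full model and a submodel, and their difference:
r2_full <- bayes_R2(fit_full, summary = FALSE)
r2_sub  <- bayes_R2(fit_sub,  summary = FALSE)
quantile(r2_full - r2_sub, probs = c(.025, .5, .975))  # a Delta R^2 interval
```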
Thanks again Solomon. In fact, dividing by sigma is precisely the sort of thing I was looking for, but I wasn’t sure it was okay to do that!
Is there perhaps a paper or vignette that describes this WAIC or LOO approach that I could delve into?
That’s a big topic. You can check out the vignettes by the loo team and also this recent thread on the Stan forums. For more pedagogy, do watch McElreath’s lecture on information theory and information criteria from this winter.
Yes, I can imagine! Thanks for your suggestions, I’ll have the tutorial on when I next do some cooking :)
Jumping in here because facing a similar problem. Could you expand a bit on re-expressing fixed effects?
Say we have a multivariate model: 2 groups, treatment A and treatment B, which produce 4 responses. I’d want to fit a MANOVA, but at the same time I want to make inferences about treatments A and B.
How do I interpret those fixed effect estimates? If the uncertainty interval of their difference doesn’t capture zero, are they different across ALL responses or just one of them? If so, which one?
And btw your brms statistical rethinking rendition is amazing!
May have jumped the gun here. The population-level effects do specify which response the effect is from.
This is probably a much broader question, but how is that different from fitting 4 separate ANOVAs?
Since you jumped the gun in the first, I’ll focus on the question in your second post. :)
It’s been a really long time since I worked within the ANOVA/MANOVA paradigm, so I’m going to frame this in terms of regression. The big difference between a single multivariate model and a series of univariate models is that you get the residual correlations with the former, but not the latter. In my experience, brms fits multivariate models pretty quickly, so from a pragmatic perspective, it might also be faster to fit the multivariate model rather than serially fitting 4 univariate models.
And thanks for the kind words, above.
Thanks Solomon. That’s definitely an advantage. Also I’m not aware of another library in R that lets you fit multivariate models with random effects etc.
I’m not sure about this, but from some of what I’ve read, I think a further difference might be that when you run the analysis as a MANOVA, brms typically uses the effects seen in each variable to inform the effects seen in the others. So, it could more reliably detect a small effect in one variable if another variable showed a strong effect of the same kind. On the other hand, if you have multiple outcome variables which all showed no effect, and one variable with a small effect, I believe that small effect might be shrunk down further. The effects in each of the variables partially inform what is expected in the others, as they are assumed to be related to each other in some way. I have to say I am not sure about this though!
Can you show a stripped down example with simulated data?
Hi @Solomon, like I said I’m definitely not sure it does this, but I thought I had read about it somewhere (not super useful, sorry!). I just recall that when I was looking at doing a multivariate model I was going through some other threads and forums which I can no longer find and there was discussion about how to make sure that the effects were not being modelled as correlated across the different outcome variables. People were saying something along the lines of ‘this is possible, but what’s the point as you might as well just run two separate models and don’t have the benefit of running it multivariate, besides only having to write one model’. So my impression was that the default was that brms is implicitly assuming the effects on different variables can inform one another. I suspect if you just ask Paul he could say very quickly whether this is or is not going on!
Okay. Now you would get partial pooling and shrinkage if you used a multilevel model or hierarchical prior approach. But I’m not aware that’s the case with the simple multivariate syntax. If you ever find evidence to the contrary, please share!
If your PPN random effects do not share the same ID (see the brms_multilevel vignette), they are modeled as uncorrelated and do not share any information. The only exchange of information between model parts in your case is the residual correlation, which is modeled by default when you use the student family.
Basically, brms does not “magically” share information across model parts. Rather, the structure of the model has to explicitly include this sharing of information. For more on this, see the brms_multivariate vignette.
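For completeness, a sketch of that shared-ID syntax from the vignette (here `p` is an arbitrary ID label; other arguments as in the original model):

```r
# The shared ID "p" correlates the by-participant intercepts across
# the two responses; without it, the intercepts are modeled independently.
fit_id <- brm(
  bf(mvbind(Assessment1, Assessment2) ~ Group * Context + (1 | p | PPN)),
  data = MANOVAData,
  family = student
)
```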