Hi all,
I am working on what may be a simple problem, but I can’t quite wrap my head around it. I have spent time searching for a solution to a problem like this, but I must not be using the correct search terms.
I am trying to construct a simple linear model in brms with one response, one predictor, and a group level factor (3 islands in this case). The hard part is that one of the 3 “islands” is composed of 3 smaller islands. Here is some fake data:
FYI, in the real data each “sub-island” has multiple data points.
The important thing to note is that I have the full values of island “C” without the sub-islands, but I would also like to take information about the sub-islands into account.
So I’m starting with size ~ 1 + age + (age | island) and moving towards something more complex.
I’ve thought about using multiple membership, such as:
size ~ 1 + age + (age | mm(island, sub_island))
However, it may be possible (or better) to combine two models, like:
bf(size ~ 1 + age + (age | island), size ~ 1 + age + (age | sub_island))
but I am not sure how to structure this type of model to make sure the hierarchical group-level effect is used properly.
Another option would be to set a single “sub-island” for islands A and B (i.e., A1, B1) and run
size ~ 1 + age + (age | island) + (age | sub_island)
but I feel I may lose information or over-parameterize the model this way.
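A minimal R sketch of the A1/B1 recoding described above, using a small hypothetical data frame. The column names `island` and `sub_island` are my own choice (`sub-island` would not work as an R variable name, because the hyphen is parsed as subtraction). It only builds the grouping factors and the formula object; the brms fit is left as a comment.

```r
# Hypothetical toy rows: A and B have no sub-islands (NA),
# C is split into C1, C2, C3 (labels are illustrative only)
dat <- data.frame(
  island     = c("A", "A", "B", "B", "C", "C", "C", "C"),
  sub_island = c(NA,  NA,  NA,  NA,  "C1", "C2", "C3", "C3"),
  stringsAsFactors = FALSE
)

# Give A and B a single placeholder sub-island each (A1, B1)
dat$sub_island <- ifelse(is.na(dat$sub_island),
                         paste0(dat$island, "1"),
                         dat$sub_island)
dat$island     <- factor(dat$island)
dat$sub_island <- factor(dat$sub_island)

# Island-level and sub-island-level varying intercepts and age slopes
f_nested <- size ~ 1 + age + (age | island) + (age | sub_island)
# fit <- brms::brm(f_nested, data = dat)  # after adding real size and age columns
```

Because A1 and B1 contain every row of their island, the data alone cannot separate an A1 deviation from the A deviation; only the hierarchical prior splits them, which is one concrete form of the over-parameterization worry. Since the sub-island codes are unique across islands, `(age | sub_island)` here is equivalent to the nested shorthand `(age | island/sub_island)`.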
I’ve been stuck on this, and any suggestions on how to deal with this data structure would be very helpful to me.
Thank you for assisting a novice!
- Operating System: macOS Mojave
- brms Version: 2.9.0
Perhaps this was the wrong forum for the question. Moving to r-sig-mixed-models.
Thanks.
It can take a little time to get responses here: the users are a subset of all people doing statistics, brms users are a subset of Stan users, and brms users fitting multilevel models are a subset of those.
If you haven’t gotten any help yet, I’ll toss in a basic feeling I have about this. I’m assuming you have read some of the r-sig-mixed-models list and the GLMM FAQ. Ben Bolker also has a set of worked examples from his chapter in the book Ecological Statistics.
I also suggest looking up what Doug Bates has written about mixed models (the math-focused bits might point you towards a correct choice).
I’ve probably spent hundreds of hours over several years reading, researching, and testing methods to get a model that correctly reflects a data set I have. So, as the GLMM FAQ says, mixed models are HARD. Harder than you think.
So, without knowing more about your model and its system, I would START by treating each sub-island individually alongside islands A and B. I don’t think it is appropriate to look at C both as a whole and as individual sub-islands in the same model, since that risks pseudoreplication.
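One way to read that suggestion, sketched in R with the same hypothetical column names (`island`, `sub_island`): collapse the two columns into a single grouping factor with one level per unit, so A and B enter whole and C enters only as C1, C2 and C3. C never appears as a level of its own, so no observation is counted at two levels. The name `unit` is an illustrative assumption.

```r
# Hypothetical rows: A and B have no sub-islands (NA), C has C1-C3
dat <- data.frame(
  island     = c("A", "B", "C", "C", "C"),
  sub_island = c(NA, NA, "C1", "C2", "C3"),
  stringsAsFactors = FALSE
)

# One grouping level per unit: the whole island for A and B,
# the sub-island for C
dat$unit <- factor(ifelse(is.na(dat$sub_island), dat$island, dat$sub_island))

# A single varying intercept and age slope over the five units
f_units <- size ~ 1 + age + (age | unit)
# fit <- brms::brm(f_units, data = dat)  # after adding real size and age columns
```

The trade-off is that C1, C2 and C3 are pooled only towards the overall mean, not towards a shared island-C effect.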
Depending on what is sampled and how, you could treat a single sample site or sampling event as an observation-level random effect (OLRE). This is most helpful when the data are overdispersed. I’m not sure whether overdispersion is as big a problem for model estimation in a Bayesian framework as it can be in a frequentist one. (I have largely pursued Bayesian methods because I couldn’t get a model to fit my data in a way that satisfied me. My co-authors just want me to finish, since the results all look qualitatively the same, but darn me and my desire to do things the right way.)
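A minimal sketch of adding an OLRE in R, assuming a count-type response (the name `count` is hypothetical; the thread's `size` may not be a count). Each row gets its own level of a hypothetical `obs` factor, and `(1 | obs)` absorbs extra-Poisson variation. With a Gaussian response an OLRE cannot be separated from the residual standard deviation, so it only makes sense for families such as Poisson or binomial.

```r
# Hypothetical design: 3 islands x 4 ages, one row per observation
dat <- data.frame(
  island = factor(rep(c("A", "B", "C"), each = 4)),
  age    = rep(1:4, times = 3)
)

# Observation-level grouping factor: one level per row
dat$obs <- factor(seq_len(nrow(dat)))

# Island-level varying intercept and slope plus an observation-level intercept
f_olre <- count ~ 1 + age + (age | island) + (1 | obs)
# fit <- brms::brm(f_olre, data = dat, family = poisson())  # once a count column exists
```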
Anyway, if you aren’t getting helpful responses, a valuable approach is to simulate some data for which you know the parameters and the data-generating mechanism, and see whether you can specify a model that recovers them (there are lots of resources online on how to do this).
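A minimal base-R sketch of that advice for the partially nested design in this thread. All parameter values, the sample sizes, and the A1/B1 placeholder coding are arbitrary assumptions for illustration, and it uses varying intercepts only to stay short. The `lm()` call with sub-island fixed effects is just a sanity check that the known age slope comes back; the real test would be fitting each candidate brms formula to `dat` and comparing its estimates with the generating values.

```r
set.seed(42)

# Known (arbitrary) parameters
b0        <- 10    # intercept
b1        <- 0.5   # age slope
sd_island <- 2     # SD of island intercepts
sd_sub    <- 1     # SD of sub-island intercepts
sd_resid  <- 1.5   # residual SD

# Partially nested structure: A and B get one placeholder sub-island each,
# C is split into C1, C2, C3
subs   <- c("A1", "B1", "C1", "C2", "C3")
parent <- c("A", "B", "C", "C", "C")

u_island <- setNames(rnorm(3, 0, sd_island), c("A", "B", "C"))
v_sub    <- setNames(rnorm(length(subs), 0, sd_sub), subs)

n_per_sub <- 50
dat <- data.frame(
  island     = rep(parent, each = n_per_sub),
  sub_island = rep(subs, each = n_per_sub),
  age        = runif(length(subs) * n_per_sub, min = 0, max = 20),
  stringsAsFactors = FALSE
)

# Data-generating mechanism: fixed part + island effect + sub-island effect + noise
dat$size <- b0 + b1 * dat$age +
  unname(u_island[dat$island]) +
  unname(v_sub[dat$sub_island]) +
  rnorm(nrow(dat), 0, sd_resid)

# Quick check that the known slope is recovered
fit <- lm(size ~ age + sub_island, data = dat)
coef(fit)[["age"]]

# Candidate hierarchical model to compare against the known values:
# fit_brms <- brms::brm(size ~ 1 + age + (1 | island) + (1 | sub_island), data = dat)
```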
Second, build a model UP thoughtfully, rather than trying to create it from whole cloth.
Thank you, Meg. I have gotten a bit more advice. It looks like the data structure I am dealing with is called “partially nested”. I have found a few resources online that propose different ways of dealing with this issue. Of course, there seems to be no single “correct” answer.
Meg wrote: “I’ve probably spent hundreds of hours over several years reading and researching and testing methods to get a model that correctly reflects a data set I have. So as the glmm faq says mixed models are HARD. Harder than you think.”
Yes! This is very true. I’ve been teaching myself these methods for the past two years and there is still so much more to learn. I’ve barely begun, which makes the work of all the developers that much more amazing.