I am trying to fit a multilevel model with one fixed-effect variable and a random intercept. The random-intercept grouping variable, however, has 142 levels (definitely categorical), while the data contain only 243 datapoints. With family = gaussian, the model runs fine and converges normally. When I then switched to family = student, the chains did not converge (Rhats > 2) and the effective sample sizes were abysmal. When I remove the random intercept, the student model runs just fine (Rhat = 1).
What I tried:
- Increasingly strong priors
- Various kinds of priors
What surprises me is that even with the default priors, the gaussian model runs well, whereas the t-distributed model (regardless of prior strength) can’t seem to handle the random effect. So I wanted to ask whether a model of the family ‘student’ has to be specified differently with regard to random intercepts, or whether such a model simply can’t be used for datasets with this ratio of datapoints to random-effect levels. If the latter is the case, though, I’m puzzled as to why the gaussian model didn’t also fail.
I specified my models like this (minimal examples with default priors):
The gaussian model (which works fine):
brm(Response ~ 1 + FixedEffect + (1|GroupingVariable), data = mydata, family = gaussian)
The student’s t model (which does not converge):
brm(Response ~ 1 + FixedEffect + (1|GroupingVariable), data = mydata, family = student)
My variables are standardized and lie between -2 and 2, with the exception of the response, which has a few outliers (> 5). The grouping levels contain different numbers of observations: some levels have 10-15 observations while some contain only 1.
Thanks in advance!
A quick update: I did some digging in the meantime and ran the models on toy datasets, and it turns out that the student-t model struggles whenever there are grouping levels that correspond to only a single datapoint. As I understand it, the only difference from a gaussian model is the nu parameter, which may be the problem here: nu might be harder to infer when a group consists of one single datapoint. Hence my question: is my reasoning right, and can the priors be set in such a way that this problem is solved, so I can use a student-family model after all?
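For reference, a sketch of the kind of toy setup I used (the sizes, seed, and variable names are illustrative; the point is the singleton groups):

```r
library(brms)

# Toy data: 10 groups with 3 observations each plus 30 singleton groups,
# mirroring the many-levels / few-observations structure of my real data
set.seed(1)
toy <- data.frame(
  Response         = rnorm(60),
  FixedEffect      = rnorm(60),
  GroupingVariable = factor(c(rep(1:10, each = 3), 31:60))
)

fit_gauss <- brm(Response ~ 1 + FixedEffect + (1 | GroupingVariable),
                 data = toy, family = gaussian())
fit_t     <- brm(Response ~ 1 + FixedEffect + (1 | GroupingVariable),
                 data = toy, family = student())

# Inspect convergence, e.g. with summary(fit_t) or rhat(fit_t)
```

In my runs of this kind of setup, the gaussian fit converged while the student fit did not.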
Does the model estimate nu per group? I would expect problems when sampling the degrees-of-freedom parameter, so either fixing it to a single value, giving it a very strong prior, or at least making it shared among all groups should help. It also tends not to have much of an effect on posterior estimates.
With your model and limited data, setting nu to a few values in the 3-7 range seems like a good idea, and you could check whether it affects the results.
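In brms you can fix a distributional parameter directly in the formula, so a fixed-nu version of your model would look roughly like this (a sketch using the variable names from your spec):

```r
library(brms)

# Student-t model with the degrees of freedom fixed to 3 instead of estimated
fit_t_fixed <- brm(
  bf(Response ~ 1 + FixedEffect + (1 | GroupingVariable), nu = 3),
  data = mydata, family = student()
)
```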
Hello, thanks for the response!
I just tested fixing the nu parameter to 3, and that worked well. Nu is being estimated globally, so that didn’t seem to be the problem.
I’m now going to try setting a very strong prior on nu in the 3-7 range you suggested to see whether that works. If it doesn’t, is fixing nu arbitrarily to a value between 3 and 7 a scientifically valid procedure, or does it have to be inferred as well?
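In case it helps others, the strong-prior variant I’m about to try looks roughly like this (the gamma parameters are just one illustrative way to concentrate nu around 5, not a recommendation):

```r
library(brms)

# Tight gamma prior on nu: mean 40 / 8 = 5, sd ~0.8 (values are illustrative)
fit_t_prior <- brm(
  Response ~ 1 + FixedEffect + (1 | GroupingVariable),
  data = mydata, family = student(),
  prior = prior(gamma(40, 8), class = nu)
)
```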
Thanks very much for your input!
For many problems, a value in the 3-7 range for the degrees-of-freedom parameter won’t make much of a difference in terms of inferences or predictions. I would just run the analysis with a few values and check both the in-sample group-specific inferences and, if relevant, out-of-sample predictions for held-out values. There are more complex ways of getting at these things, but this is a solid start. When you don’t have much data, it isn’t really possible to infer the degrees-of-freedom parameter well anyway: the data are consistent with a wide range of values, and values much above 7 all start to look like the normal density.
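A minimal sensitivity sketch along those lines (variable names taken from your model; the comparison calls at the end are optional extras):

```r
library(brms)

# Refit with a few fixed values of nu and compare the resulting inferences
nus <- c(3, 5, 7)
fits <- lapply(nus, function(v) {
  brm(bf(Response ~ 1 + FixedEffect + (1 | GroupingVariable), nu = v),
      data = mydata, family = student())
})

# Compare group-specific estimates across fits, e.g.
#   ranef(fits[[1]]) vs ranef(fits[[3]])
# and, if relevant, out-of-sample predictive performance, e.g.
#   kfold(fits[[1]]), kfold(fits[[2]]), ...
```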