I have a question about model_simple in the brms vignette brms_phylogenetics.
Apart from the fixed factor and the intercept, there is an included random factor (1|phylo) and the corresponding …
In the summary of the model, we get the expected group-level effect ~phylo.
A call to coef(model_simple), however, reveals that there is a parameter r_phylo[phylo_i, Intercept] for each species in phylo.
So my question is: how many parameters are estimated per random factor? I thought that only one parameter (the sd) is computed per random factor.
Thanks for the help.
In a Bayesian model, we also treat “random effects” as parameters and as such obtain posterior samples for them. I just don’t show them in the summary output in order not to clutter it too much.
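To make "random effects are parameters with posterior samples" concrete, here is a minimal Gibbs sampler sketch in Python. It is not what brms does internally (brms generates Stan code and samples with NUTS); it only shows that every group intercept r_j gets its own draws, analogous to the r_phylo[...] draws in a brms fit. Assumptions made only for this sketch: a normal outcome with known residual sd sigma, a flat prior on the intercept mu, and an inverse-gamma(a, b) prior on tau². The function name gibbs_varying_intercepts and the toy data are hypothetical.

```python
import random
import statistics


def gibbs_varying_intercepts(y, group, sigma, n_iter=2000, a=2.0, b=1.0, seed=1):
    """Gibbs sampler for y_i ~ N(mu + r[group_i], sigma) with r_j ~ N(0, tau).

    Illustrative assumptions: sigma is known, mu has a flat prior and
    tau^2 ~ InvGamma(a, b). Every r_j receives its own posterior draws.
    """
    rng = random.Random(seed)
    n_groups = max(group) + 1
    mu, tau2 = statistics.fmean(y), 1.0
    r = [0.0] * n_groups
    draws = {"mu": [], "tau": [], "r": []}
    for _ in range(n_iter):
        # r_j | mu, tau2, y: conjugate normal update, one group at a time
        for j in range(n_groups):
            resid = [yi - mu for yi, g in zip(y, group) if g == j]
            prec = len(resid) / sigma**2 + 1.0 / tau2
            mean = (sum(resid) / sigma**2) / prec
            r[j] = rng.gauss(mean, (1.0 / prec) ** 0.5)
        # mu | r, y: flat prior gives a normal centred at the mean of y_i - r[group_i]
        adjusted = [yi - r[g] for yi, g in zip(y, group)]
        mu = rng.gauss(statistics.fmean(adjusted), sigma / len(y) ** 0.5)
        # tau2 | r: inverse-gamma draw as the reciprocal of a gamma draw (scale = 1/rate)
        shape = a + n_groups / 2
        rate = b + sum(rj**2 for rj in r) / 2
        tau2 = 1.0 / rng.gammavariate(shape, 1.0 / rate)
        draws["mu"].append(mu)
        draws["tau"].append(tau2**0.5)
        draws["r"].append(list(r))  # copy, so later updates do not overwrite stored draws
    return draws


# Hypothetical toy data: 9 observations in 3 groups
y = [1.2, 0.8, 1.1, 3.1, 2.9, 3.3, -0.9, -1.2, -1.0]
group = [0, 0, 0, 1, 1, 1, 2, 2, 2]
draws = gibbs_varying_intercepts(y, group, sigma=0.5)

# Posterior means of the three group intercepts, discarding 500 burn-in draws
post_r = [statistics.fmean([d[j] for d in draws["r"][500:]]) for j in range(3)]
print(post_r)
```

The summary-style quantity (tau) and the per-group parameters (r_j) come out of the same sampler; a summary table simply chooses to report only the former.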
Maybe these slides (1st set, around page 18, I think) by @bgoodri are interesting to you. I assume you come from a frequentist background and wonder why it's not just one additional parameter…? Generally, in a Bayesian model, the group parameters (intercepts) are part of the model and not part of the error term (as in the frequentist model, where they are marginalized out; in this sense, the group-level coefficients you get from lme4, for example, are "predicted" and not directly estimated).
Hope this helps!
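What "marginalized out" means can be checked numerically in the simplest case: one observation y in one group, with y ~ N(mu + r, sigma) and r ~ N(0, tau). Integrating r out of the joint density gives the marginal likelihood N(y | mu, sqrt(sigma² + tau²)), i.e. the random effect disappears into the error variance. This is a sketch with known sigma and tau; lme4 uses exact formulas or approximations rather than the brute-force quadrature below, and the function names are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def marginal_numeric(y, mu, sigma, tau, lo=-20.0, hi=20.0, steps=20000):
    """Integrate r out of p(y | mu + r, sigma) * p(r | 0, tau) with the trapezoid rule."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        r = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoid end-point weights
        total += weight * normal_pdf(y, mu + r, sigma) * normal_pdf(r, 0.0, tau)
    return total * h


def marginal_closed_form(y, mu, sigma, tau):
    # Normal-normal case: integrating r out only inflates the variance
    return normal_pdf(y, mu, math.sqrt(sigma**2 + tau**2))


y_obs, mu, sigma, tau = 1.3, 0.5, 0.8, 1.1  # hypothetical values
print(marginal_numeric(y_obs, mu, sigma, tau), marginal_closed_form(y_obs, mu, sigma, tau))
```

In the Bayesian model, r stays in the joint density and is sampled instead of being integrated away.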
Thanks for your fast reply.
I am still a bit confused since I indeed come from a frequentist background.
Thanks to @Max_Mantei and the slides (slide 15), I now understand that the group parameters are part of the model and not of the error term. Does this mean that as soon as you include a "random effect"/group-level effect, up to n (#observations) more parameters are estimated?
Yes, this may be exactly what happens, but it is totally OK for estimation purposes. The reason we usually don't estimate them in a frequentist context is that their ML estimates blow up, so we need to integrate them out instead. That does not mean integrating out is what one should do; it is merely a workaround for not being able to estimate them (at least when looking at it from a Bayesian perspective).
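One concrete way in which joint maximization "blows up" when the random effects are kept as parameters: set all r_j = 0 and let tau go to 0. The density of the random effects, and hence the joint density, then grows without bound, so there is no finite maximum. The sketch below evaluates the joint log density along such a path; the function name and toy data are hypothetical, and this is one illustrative degeneracy rather than a full account. A Bayesian fit averages over the posterior (usually with a prior on tau) instead of maximizing this density.

```python
import math


def joint_log_density(y, group, mu, r, sigma, tau):
    """log p(y | mu, r, sigma) + log p(r | tau), with the group effects r treated
    as parameters (nothing is integrated out)."""
    log_lik = sum(
        -0.5 * ((yi - mu - r[g]) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
        for yi, g in zip(y, group)
    )
    log_prior_r = sum(
        -0.5 * (rj / tau) ** 2 - math.log(tau * math.sqrt(2 * math.pi)) for rj in r
    )
    return log_lik + log_prior_r


# Hypothetical toy data: 3 groups with 2 observations each
y = [0.3, -0.1, 0.8, 1.1, -0.6, -0.4]
group = [0, 0, 1, 1, 2, 2]
mu = sum(y) / len(y)
r_zero = [0.0, 0.0, 0.0]  # all group effects set to zero

# The likelihood part is constant here; the prior part grows like -3 * log(tau)
path = [joint_log_density(y, group, mu, r_zero, sigma=1.0, tau=t) for t in (1.0, 0.1, 0.01, 0.001)]
print(path)
```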
OK. But then it is kind of "expensive" to include random effects, isn't it? Don't I need enough data to estimate the additionally introduced parameters?
The same holds for frequentist estimation. Don't let yourself be fooled by the number of parameters actually estimated by the algorithm. The complexity of the model stays the same regardless of whether you estimate the random effects or integrate them out. In other words, you do not need more data with the Bayesian model. If anything, you need less, as you have additional priors helping you out.
Thank you, Paul.
I am still confused about one aspect. Do we calculate the posterior for every observation whenever we include a group-level effect? Or do we estimate a parameter (intercept) for each observation whenever we include a group-level effect?
I think my confusion comes from the fact that I thought one needs a certain number of observations per fitted parameter (I have come across numbers between 10 and 30) in order to estimate the parameter properly.
If a parameter is estimated for each observation whenever a group-level effect is included, then there is never enough data…
I have the impression that I have just misunderstood or am missing something elementary…
It may be helpful to read some introductory material on Bayesian statistics first. This will likely clarify a lot of questions. I recommend reading Richard McElreath's book "Statistical Rethinking". There are also multiple papers about brms linked on my website: https://paul-buerkner.github.io/publications/
Last but not least, the Stan users manual contains a lot of useful information.
You mean because informative priors already provide the expected probability distribution of the parameters? Do you have some literature on that?
Does this mean that I can include a large number of possible predictors (e.g. m = n/2, where n is the sample size) as long as I include an informative prior (e.g. the horseshoe prior, if I know that only a few will be important for the model but I don't know which ones)? Usually I would think that I do not have enough data to fit so many parameters…
Thanks. I just started with "Statistical Rethinking". It was while browsing through the brms material and example models that I started to question how many parameters the model has, or is allowed to have, without overfitting.
While procrastinating here, I think I might have stumbled on a misunderstanding/miscommunication:
I don't think that "up to n (#observations) more parameters are estimated" holds in general. I would agree if "n (#observations)" is replaced with "n (#groups)". For example, if you have 100 observations that fall into 5 groups, you get 5 additional random-effects parameters plus one random-effects standard deviation (plus potential correlations…).
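The bookkeeping in this example can be written as a small helper (hypothetical, just arithmetic): for a grouping term like (1 + x | g), brms estimates one coefficient per group and varying effect, one sd per varying effect, and one correlation per pair of varying effects, unless the correlations are dropped, e.g. with the (1 + x || g) syntax.

```python
def count_group_level_params(n_groups, n_terms, correlated=True):
    """Counted parameters for one grouping term such as (1 | g) or (1 + x | g).

    n_terms is the number of varying effects per group: 1 for (1 | g), 2 for (1 + x | g).
    """
    coefficients = n_groups * n_terms                    # group-level coefficients r_g[...]
    sds = n_terms                                        # one standard deviation per varying effect
    cors = n_terms * (n_terms - 1) // 2 if correlated else 0  # pairwise correlations
    return {"r": coefficients, "sd": sds, "cor": cors, "total": coefficients + sds + cors}


print(count_group_level_params(5, 1))    # 100 observations in 5 groups, (1 | g)
print(count_group_level_params(5, 2))    # same groups, (1 + x | g)
print(count_group_level_params(100, 1))  # one group per observation, like (1 | obs)
```

The last call is the "up to n" case: when every observation is its own group, the number of group-level coefficients equals the number of observations.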
I wrote "up to n" because, in the example given above, the number of groups equals the number of observations, similar to including (1|obs) in your model.
If you are concerned about the number of parameters, it might be helpful to consider that the number of effective parameters is more important than the number of “counted parameters” (this is certainly not a standard term).
In a hierarchical model, the number of effective parameters is lower than the number of "counted parameters", because the higher-level parameters constrain the lower-level parameters (see e.g. here).
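For the simplest normal hierarchical model, the effective number of group-level parameters has a closed form if mu, sigma and tau are treated as known (an assumption made only for this sketch): each group mean is shrunk towards mu by the factor w_j = tau² / (tau² + sigma²/n_j), and the sum of the w_j is the trace of the hat matrix. It approaches 0 as tau → 0 (complete pooling) and the number of groups J as tau → ∞ (no pooling). The function name and group sizes are hypothetical.

```python
def effective_group_params(n_per_group, sigma, tau):
    """Sum of shrinkage factors w_j = tau^2 / (tau^2 + sigma^2 / n_j).

    Assumes a normal model with known mu, sigma and tau, where this sum equals
    the trace of the hat matrix for the group-level intercepts.
    """
    return sum(tau**2 / (tau**2 + sigma**2 / n) for n in n_per_group)


n_per_group = [20] * 5  # hypothetical: 100 observations in 5 groups
for tau in (0.01, 0.3, 10.0):
    print(tau, round(effective_group_params(n_per_group, sigma=1.0, tau=tau), 2))
```

So 5 counted group intercepts can contribute anything between roughly 0 and 5 effective parameters, depending on how strongly tau constrains them.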
So the number of effective parameters for a random component might be anywhere from 1 up to the number of levels of that component?
In my understanding, the idea of the "effective number of parameters" is not so much about counting parameters as about estimating model complexity. (Sorry if my phrasing above was misleading.)
The general idea is that the number of parameters is an indicator of model complexity, which is penalized when models are compared with information criteria like BIC, AIC, DIC, and WAIC. These information criteria estimate model complexity (the number of effective parameters) differently, and more sophisticated estimators of model complexity, such as the one used in WAIC, imply that model complexity is not necessarily a monotone (let alone linear) function of the number of random-effect parameters.
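For WAIC, the complexity penalty is computed from posterior draws rather than by counting: p_WAIC is the sum, over observations, of the sample variance across draws of log p(y_i | θ^(s)). A minimal sketch from a draws × observations matrix of pointwise log-likelihoods is below (the function name and example matrix are hypothetical; in practice you would call waic() or loo() on the brms fit, which rely on the loo package).

```python
import math


def waic(log_lik):
    """WAIC from log_lik[s][i] = log p(y_i | theta^(s)) for draw s and observation i."""
    n_draws = len(log_lik)
    n_obs = len(log_lik[0])
    lppd = 0.0
    p_waic = 0.0
    for i in range(n_obs):
        col = [log_lik[s][i] for s in range(n_draws)]
        m = max(col)
        # log of the mean predictive density over draws, computed stably (log-sum-exp)
        lppd += m + math.log(sum(math.exp(v - m) for v in col) / n_draws)
        # complexity penalty: sample variance of the log-likelihood across draws
        mean = sum(col) / n_draws
        p_waic += sum((v - mean) ** 2 for v in col) / (n_draws - 1)
    return {"lppd": lppd, "p_waic": p_waic, "waic": -2 * (lppd - p_waic)}


# Hypothetical 3 draws x 2 observations
example = [[-1.1, -0.7], [-0.9, -1.3], [-1.0, -0.8]]
print(waic(example))
```

Nothing in p_waic refers to the number of parameters; it only sees how much the pointwise fit varies across the posterior.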
One way to see this is to consider Equation 13 in the paper linked above and to realize that the variance over MCMC samples of the log predictive density is higher in a model in which the "random effects"* are not constrained by an additional hyperparameter than in a model with such a hyperparameter (e.g. the variance of the random effects). That is, the model with more parameters would have a lower estimated complexity than the model with fewer parameters.
So one could say that the model complexity of a hierarchical/multilevel model is higher than that of a fixed-effects model, but lower than that of a model that estimates independent random effects.
*I put this in quotation marks because I'm not sure one would still call this a random-effects model if one estimated group-specific effects independently.
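The ordering of complexities described above (fixed-effects/complete pooling < hierarchical < independent effects) can be reproduced in a toy simulation. This is a sketch under strong simplifying assumptions, not the model from the linked paper: one observation per group, known sigma, and exact conjugate posterior draws instead of MCMC. The partial-pooling model additionally fixes mu = 0 and tau at their assumed values, while the complete- and no-pooling models use flat priors. Function names and settings are hypothetical.

```python
import math
import random


def p_waic(log_lik):
    """Sum over observations of the sample variance (across draws) of log p(y_i | theta)."""
    n_draws = len(log_lik)
    total = 0.0
    for i in range(len(log_lik[0])):
        col = [row[i] for row in log_lik]
        mean = sum(col) / n_draws
        total += sum((v - mean) ** 2 for v in col) / (n_draws - 1)
    return total


def log_normal(y, mean, sd):
    return -0.5 * ((y - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def compare_complexity(n_groups=50, n_draws=4000, sigma=1.0, tau=0.5, seed=7):
    rng = random.Random(seed)
    # Toy data: one observation per group, y_j ~ N(theta_j, sigma) with theta_j ~ N(0, tau)
    y = [rng.gauss(rng.gauss(0.0, tau), sigma) for _ in range(n_groups)]
    ybar = sum(y) / n_groups
    w = tau**2 / (tau**2 + sigma**2)  # shrinkage factor, assuming mu = 0 and tau known
    ll = {"complete pooling": [], "partial pooling": [], "no pooling": []}
    for _ in range(n_draws):
        # Complete pooling: one common mean with a flat prior, mu | y ~ N(ybar, sigma / sqrt(J))
        mu = rng.gauss(ybar, sigma / math.sqrt(n_groups))
        ll["complete pooling"].append([log_normal(yj, mu, sigma) for yj in y])
        # Partial pooling: theta_j | y_j ~ N(w * y_j, sqrt(w) * sigma)
        ll["partial pooling"].append(
            [log_normal(yj, rng.gauss(w * yj, math.sqrt(w) * sigma), sigma) for yj in y])
        # No pooling: independent theta_j with flat priors, theta_j | y_j ~ N(y_j, sigma)
        ll["no pooling"].append(
            [log_normal(yj, rng.gauss(yj, sigma), sigma) for yj in y])
    return {name: p_waic(rows) for name, rows in ll.items()}


result = compare_complexity()
for name, p in result.items():
    print(f"{name}: p_waic = {p:.1f}")
```

With these settings, p_WAIC should come out smallest for complete pooling, largest for no pooling, and in between for partial pooling, even though the partial- and no-pooling models have the same number of group-level parameters.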