Hi,
I have a question about model_simple in the brms vignette brms_phylogenetics.
Apart from the fixed factor and the intercept, the model includes a random factor (1|phylo) and the corresponding cov_ranef matrix.
In the summary of the model, we get the expected group-level effect ~phylo (sd_phylo__Intercept).
A call to parnames(model_simple) or coef(model_simple) reveals, however, that there is a parameter r_phylo[phylo_i,Intercept] for each species in phylo.
So my question is: how many parameters are estimated per random factor? I thought that only one parameter (the sd) is computed per random factor.
Thanks for the help.
Anna
In a Bayesian model, we also treat "random effects" as parameters and as such obtain posterior samples for them. I just don't show them in the summary output so as not to clutter it too much.
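To make the parameter list concrete, here is a sketch of how a model of this form can be written down. The notation is assumed for illustration and is not taken from the vignette: a Gaussian response $y_i$, one predictor $x_i$, species index $j[i]$ for observation $i$, and $A$ standing for the covariance matrix passed via cov_ranef.

```latex
\begin{aligned}
y_i &= \beta_0 + \beta_1 x_i + r_{j[i]} + \varepsilon_i,
  \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2), \\
(r_1, \dots, r_J)^\top &\sim \mathcal{N}\!\left(0,\ \sigma_{\text{phylo}}^2 A\right)
\end{aligned}
```

Under this reading, $\sigma_{\text{phylo}}$ plays the role of sd_phylo__Intercept in the summary, and each $r_j$ corresponds to one r_phylo[phylo_j,Intercept] entry, so $J$ species give $J$ group-level intercepts plus one standard deviation.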
Hi Anna!
Maybe these slides (1st set, around page 18, I think) by @bgoodri are interesting to you. I assume you come from a frequentist background and wonder why it's not just one additional parameter? Generally, in a Bayesian model, the group parameters (intercepts) are part of the model and not part of the error term (as in the frequentist model, where they are marginalized out; in this sense, the group-level coefficients you get from lme4, for example, are "predicted" and not directly estimated).
Hope this helps!
Hi Paul,
thanks for your fast reply.
I am still a bit confused since I indeed come from a frequentist background.
Thanks to @Max_Mantei and the slides (slide 15), I now understand that the group parameters are part of the model and not of the error term. Does this mean that as soon as you include a "random effect"/group-level effect, up to n (#observations) more parameters are estimated?
Yes, this may be exactly what happens, but it is totally OK for estimation purposes. The reason we usually don't estimate them in a frequentist context is that their ML estimates blow up and we need to integrate them out instead. That does not mean integrating out is what one should do; it is merely a workaround for not being able to estimate them (when looking from a Bayesian perspective at least).
OK. But then it is kind of "expensive" to include random effects, isn't it? I need to have enough data to be able to estimate the additionally introduced parameters?
The same holds for frequentist estimation. Don't be fooled by the number of parameters actually estimated by the algorithm. The complexity of the model stays the same regardless of whether you estimate the random effects or integrate them out. In other words, you do not need more data with the Bayesian model. If anything, you need less, as you have additional priors helping you out.
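One way to see why the two views describe the same model, sketched for the Gaussian case with assumed notation ($X$ and $\beta$ for the population-level part, $Z$ mapping observations to groups, $r$ the group-level intercepts, $A$ their known correlation structure, $\sigma_g$ their standard deviation):

```latex
y \mid r \sim \mathcal{N}\!\left(X\beta + Zr,\ \sigma^2 I\right),
\quad r \sim \mathcal{N}\!\left(0,\ \sigma_g^2 A\right)
\;\;\Longrightarrow\;\;
y \sim \mathcal{N}\!\left(X\beta,\ \sigma_g^2\, Z A Z^\top + \sigma^2 I\right)
```

The left-hand side is the form in which the $r$ are sampled explicitly; the right-hand side is what remains after integrating them out. Both imply the same distribution for the data given $\beta$, $\sigma_g$ and $\sigma$, so sampling the $r$ adds entries to the parameter list without changing the model being fitted.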
Thank you, Paul.
I am still confused about one aspect. Do we calculate the posterior for every observation whenever we include a group-level effect? Or do we estimate a parameter (intercept) for each observation whenever we include a group-level effect?
I think my confusion comes from the fact that I thought one needs a certain number of observations per fitted parameter (I came across numbers varying between 10 and 30) in order to estimate the parameter properly.
If a parameter is estimated for each observation whenever a group-level effect is included, then there is never enough data...
I have the impression that I have just misunderstood or am missing something elementary...
It may be helpful to read some introductory material on Bayesian statistics first. This will likely clarify a lot of questions. I recommend reading Richard McElreath's book "Statistical Rethinking". There are also multiple papers about brms linked on my website: https://paul-buerkner.github.io/publications/
Last but not least, the Stan users manual contains a lot of useful information.
You mean because informative priors already provide the expected probability distribution of the parameters? Do you have some literature on that?
Does this mean that I can include a large number of possible predictors (e.g. m = n/2, where n is the sample size) as long as I include an informative prior (e.g. the horseshoe prior, if I know that only a few will be important for the model but I don't know which exactly)? Usually I would think that I do not have enough data to fit so many parameters...
Thanks. I just started with "Statistical Rethinking". It was while browsing through the brms material and example models that I started to question how many parameters the model has, or is allowed to have, without being overfitted.
While procrastinating here, I think I might have stumbled on a misunderstanding/miscommunication:
I don't think the statement that "up to n (#observations) more parameters are estimated" is true.
I would agree if "n (#observations)" were replaced with "n (#groups)". For example, if you have 100 observations that fall into 5 groups, you get 5 additional random-effects parameters plus one random-effects standard deviation (plus potential covariances...).
I wrote "up to n" because in the example given above, the number of groups equals the number of observations, similar to including (1|obs) in your model.
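The counting in the last two posts can be sketched in a few lines of base R. The grouping variables and the helper count_group_params() are hypothetical and only count parameters for a single varying intercept; nothing here fits a model.

```r
# Hypothetical grouping variables for 100 observations
n_obs <- 100
grp <- factor(rep(paste0("g", 1:5), each = 20))  # 5 groups of 20 observations
obs <- factor(seq_len(n_obs))                    # one level per observation, like (1|obs)

# For a varying intercept (1|f): one intercept per level plus one standard deviation
count_group_params <- function(f) {
  c(intercepts = nlevels(f), sds = 1)
}

count_group_params(grp)  # 5 intercepts, 1 sd
count_group_params(obs)  # 100 intercepts, 1 sd
```

So the number of group-level intercepts follows the number of levels of the grouping factor, and it only reaches the number of observations when every observation is its own group, as with one observation per species.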
I see.
If you are concerned about the number of parameters, it might be helpful to consider that the number of effective parameters is more important than the number of "counted parameters" (this is certainly not a standard term).
In a hierarchical model, the number of effective parameters is lower than the number of "counted parameters", because the higher-level parameters constrain the lower-level parameters. (see e.g. here)
So the number of effective parameters for a random component might be 1 (up to the number of levels of that component)?
In my understanding, the idea of an "effective number of parameters" is not so much about counting parameters as about estimating model complexity. (Sorry if my phrasing above was misleading.)
The general idea is that the number of parameters is an indicator of model complexity, which is penalized when one compares models with information criteria like BIC, AIC, DIC, or WAIC. The authors of these information criteria estimate model complexity (the number of effective parameters) differently, and the more sophisticated estimators of model complexity, as used in the WAIC, imply that model complexity is not necessarily a monotone (linear?) function of the number of random-effect parameters.
One way to see this is to consider Equation 13 in the paper linked above and to realize that the variance over MCMC samples of the log predictive density is higher in a model in which "random effects"* are not constrained by an additional hyper-parameter than in a model with a hyper-parameter (e.g. the variance of the random effects) for the random effects. That is, the model with more parameters would have lower estimated complexity than the model with fewer parameters.
So one could say that the model complexity of a hierarchical/multilevel model is higher than that of a fixed-effects model, but lower than the complexity of a model that estimates independent random effects.
*Putting this in quotation marks because I'm not sure one would still call this a random effects model if one estimated group-specific effects independently.
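The comparison described above can be tried out on simulated data. The following base R sketch makes several assumptions that are not from the thread: that the equation referred to is the posterior-variance form of the WAIC complexity term (the sum over observations of the variance, across draws, of the pointwise log-likelihood); that the residual sd and the prior sd tau of the group intercepts are known, so the posterior is available in closed form and no MCMC is needed; and that a fixed small tau can stand in for the constraint an estimated hyper-parameter would impose. All names and values are hypothetical.

```r
set.seed(1)

# Simulated data (all values hypothetical): J groups, n_per observations each
J <- 20
n_per <- 5
sigma <- 1                                   # residual sd, assumed known
group <- rep(seq_len(J), each = n_per)
y <- rnorm(J * n_per, mean = rnorm(J, 0, 0.5)[group], sd = sigma)
ybar <- as.numeric(tapply(y, group, mean))   # observed group means

# Posterior-variance estimate of complexity: sum over observations of the
# variance (across draws) of the pointwise log-likelihood
p_waic <- function(log_lik) sum(apply(log_lik, 2, var))

# Draw group intercepts from their conjugate normal posterior under a
# N(0, tau^2) prior; return the draws x observations log-likelihood matrix
log_lik_draws <- function(tau, n_draws = 2000) {
  shrink <- tau^2 / (tau^2 + sigma^2 / n_per)    # pooling factor in (0, 1)
  post_mean <- shrink * ybar
  post_sd <- sqrt(shrink * sigma^2 / n_per)
  theta <- matrix(rnorm(n_draws * J, rep(post_mean, each = n_draws), post_sd),
                  nrow = n_draws)                # one column per group
  y_mat <- matrix(y, nrow = n_draws, ncol = length(y), byrow = TRUE)
  matrix(dnorm(y_mat, mean = theta[, group], sd = sigma, log = TRUE),
         nrow = n_draws)
}

p_constrained <- p_waic(log_lik_draws(tau = 0.5))   # intercepts pulled together
p_independent <- p_waic(log_lik_draws(tau = 1000))  # effectively unconstrained
c(constrained = p_constrained, independent = p_independent)
```

Both versions have the same 20 intercepts, yet the constrained one should come out with a clearly smaller complexity estimate, which is the sense in which counted and effective parameters differ. In a real hierarchical model tau would be estimated rather than fixed, so this is only an illustration of the mechanism.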