I stumbled across this while exploring posterior_predict for a model that includes a factor as part of the coefficients.
So, my model looks like:
y ~ 1 + x1 + x2 + (1 + x1 + x2 + factor | group) + (1 + x1 + x2 | factor)
Here, the “factor” has 8 levels, and the “group” (which is also a factor variable, obviously) has 11 levels.
The reason for the “factor” being part of the random effects for group is that I suspect that each group has a certain, factor-specific behaviour, and I would like to explore this.
I interpret the sd() and cor() output from summary (where the first level of the factor is missing) as brm picking the first level as “the baseline” and then modelling the other factor levels as deviations (mu=0) from it.
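To illustrate the "baseline" reading, here is a minimal base-R sketch (no brms involved) of how default treatment contrasts drop the first level of a factor from the design matrix. The factor `f` and its levels are hypothetical, and that the group-level terms in the model above are coded the same way is an assumption on my part:

```r
# Hypothetical three-level factor, coded with R's default treatment contrasts.
toy <- data.frame(f = factor(c("a", "b", "c")))

# The first level ("a") is absorbed into the intercept; only "b" and "c"
# get their own columns, as offsets from that baseline.
design_cols <- colnames(model.matrix(~ f, data = toy))
print(design_cols)
```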
After fitting and checking the diagnostics, I turn to posterior_predict in order to make sense of my model. (The diagnostics look OK, including the posterior predictive checks, i.e. without newdata.)
My first, naive attempt is to create a tibble with
nd <- tibble(x1=0, x2=0, factor="SomeExistingLevel", group="SomeExistingGroup")
and supplying this to
posterior_predict(m, newdata=nd)
But this fails because:
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels
I interpret this as “brms needs to have all levels of factors present in the newdata in order to make posterior predictions”.
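For reference, the same error message can be reproduced in base R alone, which suggests it comes from building a design matrix with a factor that has only one level. The sketch below uses a hypothetical factor `f`; whether declaring the full set of levels in newdata is enough to avoid the error in brms for this particular model is an assumption I have not verified:

```r
# A factor with a single level: building the design matrix fails.
one_level <- data.frame(f = factor("a"))
msg <- tryCatch(
  {
    model.matrix(~ f, data = one_level)
    NA_character_                      # only reached if no error is raised
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# The same single row, but with all levels declared: this works.
full_levels <- data.frame(f = factor("a", levels = c("a", "b", "c")))
mm <- model.matrix(~ f, data = full_levels)
print(dim(mm))
```

If the same mechanism is at work in brms, something like factor("SomeExistingLevel", levels = levels(d$factor)) in the one-row newdata might be worth trying before expanding to all levels.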
So, then I do
nd <- expand_grid(x1=0, x2=0, factor=levels(d$factor), group="SomeExistingGroup")
This succeeds, and gives me back a matrix with 8 columns, each with (presumably) the predictions for a particular factor level.
Now, for my questions:
- I gather that brms requires all factor levels to be present in newdata, right? And is the reason that the contrast calculation otherwise fails (if some level is missing, some Cholesky matrix is wrongly dimensioned)?
- How shall I interpret the result? One column of predictions for each level of the factor?
- Which column order is used in the result? The factor levels, or the levels specified in the newdata (in case they differ - is this even allowed)?
- Any particularly neat way of ensuring that the resulting predictions are named after the factor levels they represent?
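On column order and naming: posterior_predict returns a draws-by-observations matrix, with one column per row of newdata, in the order of those rows. Under that reading, the columns can simply be labelled from the newdata itself. The sketch below is only an illustration: the level names are hypothetical, a random matrix stands in for the real posterior_predict(m, newdata = nd) call so that it runs without a fitted model, and base expand.grid replaces tidyr's expand_grid to stay dependency-free:

```r
set.seed(1)

# Hypothetical level names; in practice these would be levels(d$factor).
factor_levels <- paste0("lvl", 1:8)

nd <- expand.grid(
  x1 = 0, x2 = 0,
  factor = factor_levels,
  group = "SomeExistingGroup",
  stringsAsFactors = FALSE
)

# Stand-in for: pp <- posterior_predict(m, newdata = nd)
# (rows = posterior draws, columns = rows of nd, in the same order)
n_draws <- 100
pp <- matrix(rnorm(n_draws * nrow(nd)), nrow = n_draws, ncol = nrow(nd))

# Label each column with the factor level of the matching newdata row.
colnames(pp) <- nd$factor

# Keep only the draws for the level of interest.
one_level <- pp[, "lvl3"]
print(length(one_level))
```

Because the labels come from nd rather than from levels(d$factor), they stay correct even if the rows of newdata are in a different order than the factor levels.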
In short, even if I am only interested in the group behaviour for a single factor level, my model seems to require me to calculate predictions for all factor levels, and then throw away the 7 unwanted ones.
I would appreciate your input on this - I could make a reproducible example if you want, but I figure this is a “basic brms conceptual question”, so I am making do with a theoretical example :-)