Brms posterior_predict for single factor

I stumbled across this while exploring posterior predict for a model that includes a factor as part of the coefficients
So, my model looks like:

y ~ 1 + x1 + x2 + (1 + x1 + x2 + factor | group) + (1 + x1 + x2 | factor)

Here, the “factor” have 8 levels, and the “group” (which is also a factor variable, obviously) have 11 levels.
The reason for the “factor” being part of the random effects for group is that I suspect that each group have a certain, factor-specific behaviour, and I would like to explore this.
I interpret the sd() and cor() output from summary (where the first level of the factor is missing) as brm picking the first level as “the baseline”, and then models deviations (mu=0) based on the other factor levels.

After fitting and checking the diagnostics, I turn to posterior_predict, in order to make sense of my model (given that the diagnostics look OK, which they do, also when doing posterior predictive checks, i.e. newdata not present)

My first, naive attempt is to create a tibble with

nd <- tibble(x1=0, x2=0, factor="SomeExistingLevel", group="SomeExistingGroup")

and supplying this to

posterior_predict(m, newdata=nd)

But this fails because:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels

I interpet this as “brms needs to have all levels of factors present in the newdata in order to make posterior predictions”.

So, then I do

nd <- expand_grid(x1=0, x2=0, factor=levels(d$factor), group="SomeExistingGroup")

This succeeds, and give me back a matrix of 8 columns, each with (presumably) the predictions for a particular factor level.

Now, for my question:

  • I gather that brms requires all factors being present in newdata, right? And the reason is that otherwise the contrast calculation fails (if some value is missing, some cholesky matrix is wrongly dimensioned).
  • How shall I interpret the result? One column of predictions for the particular level of the factor?
    Which column order is used in the result? The factor levels, or the levels specified in the newdata (in case they differ - is this even allowed)?
  • Any particular neat way of ensuring that the resulting predictions are named after the factors they represent?

In short, even if I am only interested for the group behaviour in a single factor, my model requires me to calculate predictions for all factor levels, and then throw away the 7 unwanted ones.

Appreciate your inputs in this - I could make a reproducible example in case you want, but I figure this is a “basic brms conceptual question”, so I make do with a theoretical example :-)

Hello @epkanol, I think that something within the nd object was misspecified to generate this error. In these postprocessing functions you should be able to generate predictions for just one row of newdata as long as all of the relevant predictors are present and valid. There should be no requirement to generate predictions that you do not want.

I’m a bit confused by the usage of terms here; I presume that when you say factor you mean a categorical variable that is expressed in R as factor type. In this case the term (1 + x1 + x2 + factor | group) denotes a ‘random slopes’ type model which contains by group variation in the effects of the categorical predictor factor. Such a model would be susceptible to identifiability problems if all values of factor are not represented across all group. A full example might make it easier to dissect the precise meaning of the sd parameters from this model.

Unless you want to do some specific postprocessing on the samples you might find predict() more intuitive to use because it generates a summary array with the same number of rows as newdata. In posterior_predict() you’re being returned one column of samples for each row of newdata (it is essentially transposed).

Using predict() or fitted() the usual code that I use to label the outputs is something like

cbind(newdata, predict(model, newdata))

To do this from columns of samples you would need to transpose them.

Thanks a lot for the reply (as you most likely can tell, I’m quite new to stan and brms modeling - but I’m trying to pick up speed…)

Indeed you are correct that factor in my model above is a categorical variable, and group is a classical grouping factor. I was trying to model team-level behaviour in certain environments, and this was the way I could figure out to use the complete dataset, with partial pooling, to express the fact that some teams (groups) behave differently in different surroundings (factor, in reality these are software components, and the teams are the teams making changes to said components).
Peeking at the data, although there are many zeros (hence, the zero-inflation), there are certainly variations both between teams, components, and how teams behave in a certain component.

In general, if I have two “categorical” variables, and want to express variation along both A, B and “A-in-B”, how would I go about it?

1 + X + (1 + X | G) + (1 + X | H)

would assume that intercept and X beta varied with group G, and also with group H - but would not model how G was affected by group H, right? That is, the model assumes that there are no correlation between G and H?

Or did I get this wrong? I still have a lot to learn about Cholesky and correlation matrices :-)

Thanks for the reply, appreciate all information I can get about this!

Seems I start to get what was the problem here:

  • From what I can tell from the feedback that stan and @AWoodward gives me, the problem arises because my inital model was specified with the factor variable only present in the grouping term.
  • When I also include it in the population-level term, posterior_predict works as expected. I guess that brm will then use the population-level data for that term for the levels that are not present in the newdata.
  • This was a model mis-specification on my part - my full model will now look like:
1 + x1 + x2 + factor + (1 + x1 + x2 + factor | group)

That is, I will remove the factor grouping level, and instead use it as an offset for the population (and group-level) intercepts.
All groups do not involve all factor (the data is skewed, as most realistic data is), but I presume that the population-level offsets will dominate groups lacking certain factor levels.

1 Like