Syntax for non-crossed design in brms

I have a fairly boring and probably straightforward question regarding the syntax for a non-crossed design in brms, so let me couch it in terms that make it a little less boring.

Consider an experiment in which a person is given different combinations of coloured balls to throw at statistics faculty at a prestigious university. The balls are stored in a small bag, which is given to the thrower. In different conditions of the experiment, the thrower is first given a bag containing either 4 red balls, 4 red and 2 blue balls, 6 red balls, or 6 red and 2 blue balls. The conditions are limited to this set for practical reasons. The thrower has to carry the bag some distance to the throwing destination (faculty lunch room, on the top floor of the ivory tower), potentially fatiguing their muscles.

The balls have different weights, and thus can plausibly affect performance on the basis of (a) which colour ball is being thrown on a specific trial (call this variable “colour”), and (b) the overall composition of the bag they carried before commencing throwing (call this variable “composition”). The researcher running the experiment is interested in the effects of both of these variables, and so wants to obtain estimates of performance for all relevant composition*colour combinations. Naturally, for some levels of the composition factor, no blue balls are included in the bag, so the two variables are not fully-crossed.

What would the appropriate syntax be for such a model in brms, to avoid the estimation of effects that are impossible in the design (e.g., the effect of colour when the composition is 4 red balls + 0 blue balls)? Would a model that produces estimates of these impossible effects be biased in its estimates of the genuine effects, given that there are no data in the non-existent design cells? (That is, is it a problem? My intuition is “no”, but I’m wary nonetheless.) How many times could the researcher run this experiment before being escorted off the premises?

My understanding of “nested” variables such as these is that any impossible conditions – which are not informed by any data – will (of course) just return the prior distribution as their posterior; but, importantly, this will have minimal effect on the posteriors that are informed by data. However, when making predictions from such a model (e.g., when plotting conditional effects) these impossible conditions may add unwanted noise (from their priors). For that reason, it may be useful to ensure that the priors of all impossible conditions are set to a constant of zero.

The answer to this Cross Validated question should be useful for setting up a model formula suited to nested variables: “How do you deal with "nested" variables in a regression model?”

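If working in R, note that rows with an NA in any predictor are dropped by default when the model is fitted (in lm() and in brms alike), so the NA values a nested variable takes where it does not apply need to be converted to 0s first.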
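As a minimal sketch of why the recoding matters, the toy data below use hypothetical columns (has_blue, blue_weight and y are made up for illustration and are not part of the original design). It only shows that lm() silently drops the rows with NA until they are recoded to 0.

```r
# Hypothetical toy data: blue_weight only exists when the bag has blue balls.
d <- data.frame(
  y           = c(1.2, 0.7, 1.9, 2.4, 1.1, 2.8),
  has_blue    = c(0, 0, 1, 1, 0, 1),
  blue_weight = c(NA, NA, 0.5, 0.8, NA, 1.1)
)

# With NAs left in, lm() drops the three rows without blue balls.
fit_na <- lm(y ~ has_blue + blue_weight, data = d)
n_with_na <- nobs(fit_na)

# Recode NA to 0 so every row is kept; has_blue then carries the
# "no blue balls" information and blue_weight only acts when has_blue == 1.
d$blue_weight[is.na(d$blue_weight)] <- 0
fit_0 <- lm(y ~ has_blue + blue_weight, data = d)
n_recoded <- nobs(fit_0)

print(c(with_na = n_with_na, recoded = n_recoded))
```

After recoding, all six rows contribute to the fit rather than only the three that contain blue balls.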

If you are working in brms, one workflow is to take your model formula (designed as per the above) and first fit it with R’s standard lm(). In the summary of the resulting lm model, note every coefficient that is not estimated (shown as NA) because of “singularities”. Then, in brms, set the prior on each of those coefficients to constant(0).
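The workflow above might look roughly like the sketch below. The data are simulated and the outcome name score, the level labels (R4, R4B2, R6, R6B2) and the helper fit_brms() are all hypothetical. It also assumes that brms uses the same coefficient names as lm(), which is worth checking against get_prior() before relying on it.

```r
# Hypothetical data: 'score' is a made-up outcome. Blue balls exist only in
# the bags that contain them, so two of the eight design cells are empty.
set.seed(1)
cells <- data.frame(
  composition = c("R4", "R4B2", "R4B2", "R6", "R6B2", "R6B2"),
  colour      = c("red", "red", "blue", "red", "red", "blue")
)
d <- cells[rep(seq_len(nrow(cells)), each = 20), ]
d$composition <- factor(d$composition)
d$colour      <- factor(d$colour, levels = c("red", "blue"))
d$score       <- rnorm(nrow(d))

# Step 1: fit with lm() and collect the coefficients it could not estimate.
fit_lm  <- lm(score ~ composition * colour, data = d)
aliased <- names(coef(fit_lm))[is.na(coef(fit_lm))]
print(aliased)

# Step 2: pin those coefficients to zero in brms. Defined but not called
# here, because fitting needs brms and a working Stan toolchain.
fit_brms <- function(data, aliased) {
  # One constant(0) prior per aliased coefficient (names assumed to match).
  priors <- do.call(c, lapply(aliased, function(nm) {
    brms::set_prior("constant(0)", class = "b", coef = nm)
  }))
  brms::brm(score ~ composition * colour, data = data, prior = priors)
}
```

With this coding the two interaction terms involving blue balls in the R6 and R6B2 bags come back as NA from lm(). Which terms are aliased depends on the reference levels and the order of terms in the formula, so it is safer to read them off the lm fit than to work them out by hand.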

Thank you for the clear and helpful response—setting the priors to constant(0) does nicely.
