Reference levels and model interpretation

FeliOtte · February 20, 2025, 3:03pm

Hi all!

I’m currently running a categorical brms model that looks as follows:

mdl <- brm(A ~ B + (1 | participant), data = dataset, family = categorical(), cores = 4, backend = "cmdstanr")

with A being the categorical outcome variable with 6 levels (a, b, c, d, e, f),
B being a categorical predictor variable with 2 levels (1, 2),
and participant also being a categorical variable.

The model uses a and 1 as reference levels, so my results look like this:

	Estimate	Est. Error	l-95% CI	u-95% CI	Rhat	Bulk_ESS	Tail_ESS
mub_Intercept	-0.83	0.26	-1.35	-0.33	1.00	1339	1920
muc_Intercept	-0.59	0.29	-1.18	-0.03	1.00	1028	1451
mud_Intercept	0.01	0.40	-0.79	0.81	1.01	741	1376
mue_Intercept	-0.39	0.30	-1.00	0.20	1.00	905	1564
muf_Intercept	-0.96	0.24	-1.45	-0.50	1.01	995	1845
mub_B2	0.82	0.33	0.19	1.48	1.00	1300	1668
muc_B2	-0.99	0.39	-1.78	-0.23	1.00	1164	1609
mud_B2	-1.81	0.54	-2.96	-0.78	1.01	872	1613
mue_B2	-2.12	0.42	-2.98	-1.35	1.00	1378	1630
muf_B2	0.43	0.31	-0.17	1.06	1.01	1140	1529

I have two questions about these results:

1. All of the intercept estimates are clearly relative to a at level 1 of B, but are the slope estimates also relative to a at level 1? For example, is mub_B2 calculated relative to the theoretical mua_Intercept at zero or relative to mub_Intercept?
2. Why is there no estimate for mua_B2? I understand that the intercept for a is fixed at zero because a is the reference level, but it should still be possible to get a slope estimate describing the difference between a at level 1 vs. level 2, right?

I’m sorry for how basic these questions probably are. I have seen results from other, similar models that look exactly like this, so I don’t think there are any issues, but the lack of mua_B2 has thrown me for a bit of a loop when it comes to interpreting the results. I would appreciate any input!

Operating system: macOS Ventura 13.6.1
R Version: 4.4.2
R Studio Version: 2024.12.0+467
brms Version: 2.22.0
cmdstanr Version: 0.8.1

Solomon · February 20, 2025, 7:14pm

Categorical models like this are strange birds, and their parameters are very challenging to interpret directly. Kruschke covered them in Chapter 22 of his text, and I’ve walked that material out with a brm()-based workflow here. I recommend you back up and first fit an intercepts-only version of the model and take some time working through the meanings of the intercepts. Then scale up to a model with your predictor.

FeliOtte · February 26, 2025, 3:50pm

Thanks for the recommendations! I had already looked at your brms version of the chapter, but I have now also got hold of the book by Kruschke and (re-)read the relevant chapters in both. I think I’ve come to an understanding of how to interpret the estimates. I will attempt to describe it below, and if anyone feels motivated to read it and maybe confirm, that would be awesome!

The theoretical mua_Intercept is set at zero, and the other _Intercept estimates represent the log odds of each of the other categories of outcome A, relative to a, within level 1 of predictor B.
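
To make the intercepts concrete, here is a minimal sketch that pushes the posterior means from the summary table through a softmax to get category probabilities at level 1 of B. It assumes the default logit (softmax) link of categorical(), uses point estimates only rather than the full posterior, and sets the participant effects to zero, so it describes a "typical" participant. The softmax() helper is defined here for illustration and is not a brms function.

```r
# Posterior means copied from the summary table; a is the reference category,
# so its intercept is fixed at 0.
intercepts <- c(a = 0, b = -0.83, c = -0.59, d = 0.01, e = -0.39, f = -0.96)

# Illustrative helper: exponentiate each linear predictor and normalise.
softmax <- function(eta) exp(eta) / sum(exp(eta))

# Approximate category probabilities at B = 1 for a typical participant.
p_B1 <- softmax(intercepts)
round(p_B1, 3)

# Each intercept is the log odds of that category relative to a;
# this line recovers mub_Intercept (-0.83).
log(p_B1[["b"]] / p_B1[["a"]])
```

Working on the probability scale like this tends to be easier to reason about than the raw coefficients.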

The theoretical mua_B2 is also set at zero. Since this is the slope estimate, the regression line for category a starts at zero and is completely horizontal, so it also ends at zero for level 2 of predictor B. The other slope estimates are calculated relative to this zero slope. In visual terms: to calculate mub_B2, the zero slope of mua is moved to mub_Intercept at -0.83, creating a horizontal line at that intercept. The slope mub_B2 is then described in terms of its divergence from that horizontal line, resulting in a regression line from mub_Intercept to the value of b at level 2 (about -0.01 in this case, i.e. -0.83 + 0.82). The estimate thus describes, on the log-odds scale relative to a, how much more likely b is to occur at level 2 than at level 1. It does not describe (as I previously worried) the log odds of b occurring at level 2 compared to a occurring at level 1. Does that sound about right?
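
As a rough check on this reading, the sketch below adds the B2 slopes to the intercepts and compares the two levels of B. It makes the same assumptions as any back-of-the-envelope version of this calculation: softmax link, posterior means only, participant effects at zero. The softmax() and log_odds_vs_a() helpers are hypothetical names defined here, not part of brms.

```r
# Illustrative helper: exponentiate each linear predictor and normalise.
softmax <- function(eta) exp(eta) / sum(exp(eta))

# Posterior means from the summary table; the entries for a are fixed at 0.
intercepts <- c(a = 0, b = -0.83, c = -0.59, d = 0.01, e = -0.39, f = -0.96)
slopes_B2  <- c(a = 0, b = 0.82, c = -0.99, d = -1.81, e = -2.12, f = 0.43)

# Linear predictors and probabilities at each level of B.
p_B1 <- softmax(intercepts)
p_B2 <- softmax(intercepts + slopes_B2)

# Log odds of every category relative to a, at a given level of B.
log_odds_vs_a <- function(p) log(p / p[["a"]])

# The change in these log odds from B = 1 to B = 2 reproduces the slopes.
change <- log_odds_vs_a(p_B2) - log_odds_vs_a(p_B1)
round(change, 2)

# The probability of a itself still differs between the two levels,
# even though mua_B2 is fixed at 0.
round(rbind(B1 = p_B1, B2 = p_B2), 3)
```

On this arithmetic, mub_B2 reads as the change in the log odds of b versus a when moving from level 1 to level 2, rather than a change in the probability of b on its own.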

I’m still not sure why the model would be calculated this way to be honest. Surely one could use a horizontal line as a reference point that is unrelated to any of the outcome categories? That way a meaningful value could be calculated for mua_B2. Regardless, I am more confident in my interpretation of the results now, so thank you!
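
One way to see why a category gets pinned at zero: under a softmax link, adding the same constant to every category's linear predictor leaves the probabilities unchanged, so a free-floating reference line could not be identified from the data. The sketch below illustrates this with the intercept estimates from the table. The shift of 5 is an arbitrary, hypothetical value, and softmax() is again an illustrative helper rather than a brms function.

```r
# Illustrative helper: exponentiate each linear predictor and normalise.
softmax <- function(eta) exp(eta) / sum(exp(eta))

# Posterior means of the intercepts from the summary table.
eta <- c(a = 0, b = -0.83, c = -0.59, d = 0.01, e = -0.39, f = -0.96)

# Hypothetical "free" reference line: shift every category by the same amount.
shifted <- eta + 5

# The probabilities are identical, so the data cannot tell the two apart.
all.equal(softmax(eta), softmax(shifted))

# Re-expressing the same model with b as the reference just subtracts b's value.
eta_ref_b <- eta - eta[["b"]]
all.equal(softmax(eta), softmax(eta_ref_b))
```

Fixing one category's coefficients at zero removes that redundancy, which is why only five sets of estimates appear for a six-level outcome. The choice of reference category changes the coefficients but not the implied probabilities.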
