Nominal data and Kruschke's "conditional logistic" approach

Henrik_Singmann · November 6, 2021, 8:43pm

One approach for doing so (that is admittedly a bit cumbersome) is using the custom family functionality of brms. The idea is to use the tree structure of the conditional logistic model and treat it like a multinomial processing tree model.

Note also that the Fox & Weisberg books (i.e., the basis of the car package) call this type of analysis “nested dichotomies”. Therefore, we can use their example data which uses a categorical response with three categories:

library("tidyverse")
library("brms")

womenlf <- carData::Womenlf
womenlf <- womenlf %>% 
  mutate(work2 = factor(partic, 
                        levels = c("not.work", "parttime", "fulltime"))) %>% 
  mutate(work3 = as.numeric(work2))
str(womenlf)
# 'data.frame':	263 obs. of  6 variables:
#  $ partic  : Factor w/ 3 levels "fulltime","not.work",..: 2 2 2 2 2 2 2 1 2 2 ...
#  $ hincome : int  15 13 45 23 19 7 15 7 15 23 ...
#  $ children: Factor w/ 2 levels "absent","present": 2 2 2 2 2 2 2 2 2 2 ...
#  $ region  : Factor w/ 5 levels "Atlantic","BC",..: 3 3 3 3 3 3 3 3 3 3 ...
#  $ work2   : Factor w/ 3 levels "not.work","parttime",..: 1 1 1 1 1 1 1 3 1 1 ...
#  $ work3   : num  1 1 1 1 1 1 1 3 1 1 ...

The dependent variable here is the level of participation of women in the workplace with three levels. We follow the analysis strategy in Fox and Weisberg (2010, pp. 259) and use the factor ordering "not.work","parttime", "fulltime" (i.e., not.work is the first factor level).

Because we have only three categories, the formula simplifies to:
\phi_1 = mu \\ \phi_2 = (1 - mu) * mub \\ \phi_3 = (1 - mu) * (1 - mub)

Also note that I already include the parameter names we will be using in brms where the first parameter of a custom family is always called mu. We then call the second parameter mub (as we have only three categories, only two independent parameters are needed).

We can then set up the model as follows:

mpt_cond <- custom_family("mpt_cond", 
                          links = "identity", dpars = c("mu", "mub"), 
                          type = "int",
                          vars = c("n_cat"))

stanvars <- stanvar(x = 3, name = "n_cat", scode = "  int n_cat;")

stan_lpmf <- stanvar(block = "functions", scode = "real mpt_cond_lpmf(int y, real mu, real mu_b, int n_cat) {
  real p_mu = inv_logit(mu);
  real p_mub = inv_logit(mu_b);
  vector[n_cat] prob;
  prob[1] = p_mu;
  prob[2] = (1 - p_mu) * p_mub;
  prob[3] = (1 - p_mu) * (1 - p_mub);
  return(categorical_lpmf(y | prob));
}")

The key part is the new lpmf function that contains the formula given above on the probability scale (i.e., after applying the inverse logistic function to transform from linear to probability scale).

We can then specify one formula per parameter and fit the model. We use the same formula as Fox and Weisberg (2010, pp. 265) for this data.

my_formula <- bf(work3 ~ hincome + children + region, 
                 mub ~ hincome + children + region)
fit1 <- brm(my_formula, data = womenlf, family = mpt_cond, 
              stanvars = stanvars + stan_lpmf)

This gives us the following results:

> fit1
 Family: mpt_cond 
  Links: mu = identity; mub = identity 
Formula: work3 ~ hincome + children + region 
         mub ~ hincome + children + region
   Data: womenlf (Number of observations: 263) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Population-Level Effects: 
                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept              -1.31      0.57    -2.43    -0.19 1.00     2262     2570
mub_Intercept          -4.12      1.12    -6.43    -2.02 1.00     2380     2610
hincome                 0.05      0.02     0.01     0.09 1.00     4468     2724
childrenpresent         1.65      0.30     1.06     2.27 1.00     3903     2898
regionBC               -0.36      0.59    -1.52     0.80 1.00     2489     2800
regionOntario          -0.20      0.47    -1.11     0.70 1.00     2231     2611
regionPrairie          -0.49      0.57    -1.65     0.63 1.00     2588     3002
regionQuebec            0.17      0.51    -0.84     1.17 1.00     2340     2740
mub_hincome             0.12      0.04     0.03     0.20 1.00     4090     2784
mub_childrenpresent     3.01      0.60     1.90     4.22 1.00     3654     2695
mub_regionBC            1.28      1.09    -0.72     3.44 1.00     2257     2778
mub_regionOntario       0.16      0.91    -1.62     1.96 1.00     1963     2580
mub_regionPrairie       0.43      1.02    -1.52     2.49 1.00     2517     2910
mub_regionQuebec       -0.17      1.01    -2.18     1.76 1.00     2270     2793

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

The results for the first dichotomy (level 1 versus 2 & 3) match the results reported in Fox and Weisberg (2010, pp. 265) and only differ in sign:

The results for the second dichotomy (level 2 versus 3), which are indicated by mub_, are a bit off, but still similar:

Topic		Replies	Views
Target/Outcome Data for Categorical Family w/Logit Link in BRMS brms	3	679	November 28, 2018
Categorical mixture model? brms specification , brms	0	342	July 17, 2023
Using brm to reproduce analysis of Liddell and Kruschke (2018) brms ordinal-response	3	375	February 5, 2024
Use categorical /multinomial family with binary-ish data based on separate column in brms brms multinomial-response , brms	8	1405	August 8, 2023
Marginal_effects for multivariate models with mixed continuous/ordinal responses brms	4	583	May 15, 2019

Nominal data and Kruschke's "conditional logistic" approach

Related topics