How can I make my brms partial proportional odds model more "flexible?"

I am dealing with a bit of a contrived example of ordinal response data, that was purposefully engineered to grossly violate the proportional odds assumption. In the example dataset, we have three varieties of plant that have different rating categories, C (worst), B, and A (best). The number of plants in each variety rated in each of the three categories is given for 12 different growers. As you can see from the raw data plot, Variety 1 has roughly equal observed proportion of ratings in the three categories. Variety 2 has mostly B rating, and Variety 3 has mostly A and C rating with few B. This means that the proportional odds assumption is violated because the relative probabilities of the ratings are very different between the varieties. But Variety 3 is what makes the dataset pathological. Contrary to what you would expect from the ordinal rating scale, it has high probability of being in the lowest or highest rating category, but low probability of being in the middle. (Let’s imagine it is one of those polarizing flavors like cilantro, where “you love it or you hate it” but few people are in the middle!)

Observed data plot

image

Conditional effects plot for proportional odds model

image

Fitting a proportional odds model to these data, with fixed effect of variety (and a random intercept and slope term for grower) produces very bad predictions, as we would expect.

Conditional effects plot for partial proportional odds model

image

However, relaxing the proportional odds assumption by wrapping the fixed variety term in cs(), to fit a so-called partial proportional odds model, doesn’t fully fix the problem. It looks like there is something in the model structure that is constraining the relative ordering of the conditional probabilities to be the same across the three varieties. The point estimate of p(rating=B) is greater than the point estimate of p(rating=C) and p(rating=A) for all three varieties. What I want is a model that will allow the estimate of the conditional probability of each rating to vary by variety so that p(rating=B) can be greatest in some varieties and lowest in others. Is this possible with this class of model in brms? Or is it necessary to create a custom Stan model?

Code to reproduce example data, fit models, and make plots

library(brms)
library(dplyr)
library(ggplot2)

exdata <- structure(list(Grower = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
                                              1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 
                                              3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 
                                              5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 
                                              7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 
                                              9L, 9L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 
                                              11L, 11L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 
                                              12L), levels = c("1", "2", "3", "4", "5", "6", "7", "8", "9", 
                                                               "10", "11", "12"), class = "factor"), 
                         Variety = structure(c(1L, 
                                               1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 
                                               1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 
                                               3L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 
                                               1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 
                                               1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 
                                               1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 
                                               3L), levels = c("1", "2", "3"), class = "factor"), 
                         Rating = structure(c(3L, 
                                              2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 3L, 2L, 1L, 
                                              3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 2L, 1L, 3L, 2L, 
                                              1L, 2L, 1L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 3L, 2L, 1L, 
                                              3L, 2L, 1L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 2L, 1L, 3L, 2L, 1L, 
                                              3L, 2L, 1L, 3L, 2L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 3L, 2L, 1L, 
                                              3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 2L, 1L, 3L, 2L, 
                                              1L), levels = c("C", "B", "A"), class = c("ordered", "factor"
                                              )), 
                         N_Plants = c(20L, 16L, 5L, 1L, 36L, 4L, 4L, 2L, 37L, 21L, 
                                      15L, 13L, 8L, 36L, 30L, 2L, 11L, 26L, 14L, 3L, 6L, 35L, 1L, 12L, 
                                      3L, 30L, 2L, 22L, 25L, 28L, 16L, 8L, 2L, 37L, 8L, 41L, 17L, 29L, 
                                      2L, 1L, 46L, 6L, 14L, 22L, 24L, 24L, 30L, 1L, 11L, 36L, 9L, 2L, 
                                      43L, 4L, 32L, 5L, 6L, 13L, 18L, 11L, 45L, 1L, 39L, 2L, 4L, 23L, 
                                      21L, 3L, 9L, 32L, 36L, 1L, 5L, 10L, 18L, 17L, 15L, 29L, 29L, 
                                      4L, 15L, 32L, 10L, 5L, 2L, 42L, 1L, 30L, 4L, 12L, 3L, 9L, 29L, 
                                      28L, 12L, 10L, 3L, 30L)), class = "data.frame", row.names = c(NA, 
                                                                                                    -98L))

# Plot raw data (probability of each rating at each grower)
observed_p <- exdata %>%
  group_by(Grower, Variety) %>%
  mutate(p = N_Plants/sum(N_Plants))

ggplot(observed_p, aes(x = Variety, group = interaction(Variety, Rating), color = Rating, y = p)) +
  geom_point(position = position_dodge(width = .5)) +
  theme_bw() +
  scale_color_discrete()

# Proportional odds model
fit_po <- brm(
  Rating ~ Variety + (1 + Variety | Grower),
  family = cumulative(link = 'logit', threshold = 'flexible'),
  data = exdata,
  seed = 130
)

# Model relaxing proportional odds assumption
fit_ppo <- brm(
  Rating ~ cs(Variety) + (1 + Variety | Grower),
  family = cumulative(link = 'logit', threshold = 'flexible'),
  data = exdata,
  seed = 130
)

conditional_effects(fit_po, categorical = TRUE) 
conditional_effects(fit_ppo, categorical = TRUE)

It’s also worth comparing with the R rmsb package blrm function which implements constrained and unconstrained partial proportional odds models. See RMSb . At present the predict and Predict functions do not work for unconstrained PPO models, and the constraint in PPO models allows only for one dimension of constraints. That will be generalized before too long.

1 Like

Thanks for the comment. So to make sure I am using the terminology correctly, the model in brms is a constrained partial proportional odds model, and I am looking for an unconstrained partial proportional odds model? Is that correct?

I’m not sure. I was more raising the issue of try to match the model in rmsb as a check on what you’re seeing. You can also check against MLEs using the R ordinal and VGAM packages.

In response to @harrelfe : Here is a comparison with a maximum likelihood estimate from the ordinal package. A partial proportional odds estimate is implemented in the clmm2() function. It doesn’t have the ability to fit random slopes or multiple levels of random intercept but otherwise is close to what I am trying to fit with brms. As you can see in the plot, the maximum likelihood estimates (triangles) are much better than the brms estimates (circles). Code is below.

image

code to produce plot

(follows example code in original post)

library(ordinal)

fit_ppo_clmm2 <- clmm2(Rating ~ Variety,
                       nominal = ~ Variety,
                       random = Grower,
                       weights = N_Plants,
                       link = 'logistic',
                       threshold = 'flexible',
                       data = exdata,
                       Hess = TRUE)

pred_grid <- with(exdata, expand.grid(Variety = unique(Variety), Rating = unique(Rating)))

# Predict means for each variety x rating
pred_clmm2 <- predict(fit_ppo_clmm2, newdata = pred_grid)

pred_brm <- predict(fit_ppo, newdata = data.frame(Variety = 1:3), re_formula = ~ 0)

pred_values <- cbind(pred_grid, clmm2 = pred_clmm2, brm = c(pred_brm[, 3:1])) %>%
  tidyr::pivot_longer(c(clmm2, brm), names_to = 'model', values_to = 'pred')

ggplot(observed_p, aes(x = Rating, color = Rating)) +
  geom_point(aes(y = p), alpha = 0.5) +
  geom_point(aes(y = pred, shape = model, group = model), data = pred_values, position = position_dodge(width = 0.5), size = 3) +
  facet_wrap(~ Variety) +
  theme_bw() + see::scale_color_okabeito()
1 Like

Here is yet another update. I have created a Stan model that allows completely flexible category-specific treatment effects, and has random intercepts and random slopes. This gives me the desired result. In the figure below, the circles with error bars are means and 95% confidence intervals from a reference frequentist implementation using SAS PROC NLMIXED, and the triangles with error bars are posterior medians and 95% equal tailed credible intervals from the Stan fit. I would still be interested in knowing whether this model is possible to implement in brms.

image

Stan code

data {
  int<lower=1> N;
  int<lower=1> N_growers;
  int<lower=1> N_varieties;
  array[N, 3] int<lower=0, upper=1> Rating; // 1=C 2=B 3=A. Must be 2d array because of how multinomial distribution is defined.
  array[N] int<lower=1> Variety;
  array[N] int<lower=1> Grower;
}

parameters {
  // Intercepts (associated with the reference level, rating C)
  real int_0;
  real int_1;
  
  // Variety effects for ratings A and B
  real tau_01;
  real tau_11;
  real tau_02;
  real tau_12;
  
  // Standard deviations of random effects
  real<lower=0> sd_g; // Called sdb in the original SAS code. (block = g for grower and treatment = v for variety)
  real<lower=0> sd_gv; // Called sdbt in the original SAS code.
  
  // Individual random effects
  vector[N_growers] g; // Called blk_j in the original SAS code. This is the random effect for each grower.
  array[N_varieties] vector[N_growers] gv; // Called  bt_1j, bt_2j, and bt_3j in the original SAS code. This is the random slope with respect to variety for each grower.
}

transformed parameters {
  // Linear predictors (there are two link functions because k=3 levels of Rating)
  vector[N] eta_0;
  vector[N] eta_1;
  // Parameters of the multinomial distribution (probabilities of each rating)
  vector[N] Pi_0;
  vector[N] Pi_1;
  vector[N] Pi_2;

  for (i in 1:N) {
    // Two linear predictors because there are three levels of rating.
    eta_0[i] = int_0 + tau_01*(Variety[i]==1) + tau_02*(Variety[i]==2) + g[Grower[i]] + gv[Variety[i]][Grower[i]];
    eta_1[i] = int_1 + tau_11*(Variety[i]==1) + tau_12*(Variety[i]==2) + g[Grower[i]] + gv[Variety[i]][Grower[i]];
    
    // Apply inverse link function to linear predictors. Ensure that Pi1 is not negative.
    Pi_0[i] = 1/(1+exp(-eta_0[i]));
    Pi_1[i] = max([0.0, 1/(1+exp(-eta_1[i])) - 1/(1+exp(-eta_0[i]))]);
    Pi_2[i] = 1-Pi_0[i]-Pi_1[i];
  }
}

model {
  // Priors. Weakly informative to enable model convergence (just constrain the values to a reasonable order of magnitude)
  sd_g ~ gamma(1, 1);
  sd_gv ~ gamma(1, 1);
  int_0 ~ normal(0, 10);
  int_1 ~ normal(0, 10);
  tau_01 ~ normal(0, 10);
  tau_11 ~ normal(0, 10);
  tau_02 ~ normal(0, 10);
  tau_12 ~ normal(0, 10);
  
  // Likelihood
  for (i in 1:N) {
    Rating[i] ~ multinomial([Pi_0[i], Pi_1[i], Pi_2[i]]');
  }
  g ~ normal(0, sd_g); // Because covariance of random intercept and random slopes is 0, we can do them as separate normals instead of a multivariate normal
  for (i in 1:N_varieties) {
    gv[i] ~ normal(0, sd_gv); 
  }
}

generated quantities {
  // For each posterior sample, generate the estimated marginal probability of each rating for each variety
  // This is not conditioned on random effects.
  real pAv1 = 1/(1+exp(-(int_0+tau_01)));
  real pBv1 = 1/(1+exp(-(int_1+tau_11)))-1/(1+exp(-(int_0+tau_01)));
  real pCv1 = 1-(1/(1+exp(-(int_1+tau_11))));
  real pAv2 = 1/(1+exp(-(int_0+tau_02))); 
  real pBv2 = 1/(1+exp(-(int_1+tau_12)))-1/(1+exp(-(int_0+tau_02)));
  real pCv2 = 1-(1/(1+exp(-(int_1+tau_12)))); 
  real pAv3 = 1/(1+exp(-int_0)); 
  real pBv3 = 1/(1+exp(-int_1))-1/(1+exp(-int_0));
  real pCv3 = 1-(1/(1+exp(-int_1)));
}

Apologies for wasting everyone’s time. I realized I made a mistake in the brms model, the model code is correct but I forgot to expand the data out to long form, one row per trial, before fitting the model. Once I fit the model to the correct data, it all worked!

1 Like