How to calculate mean difference from ordinal regression model

Hi there,

I am modelling the effect of age on the number of treatment doses over 3 months. Due to the shape of the distribution, I’m finding that the data are best fit by ordinal regression (see an example of the code below). A problem, however, is that I wish to report the results on a continuous scale, as one would with linear regression (e.g. a 1-year increase in age is associated with a mean increase of 2.5 doses).

brm(ndose ~ age, family = cumulative(), iter = 10000, chains = 4, cores = 4, data = ndose_met)

I cannot find a way to back-transform the results to a continuous scale. The conditional_effects() function from brms (with categorical = FALSE) seems to give me what I want, but the output is a plot rather than the mean difference plus a credible interval.

Does anyone know how to do this (either within R or Stan)? I have managed to calculate the mean difference in Stan if the predictor is categorical (e.g. sex), but not continuous (e.g. age).

Below is the code I have used to calculate the mean difference for the categorical variable indigenous_status. (I’ve taken the brms-generated Stan code and added more to the generated quantities section. Apologies for the messy workarounds I’ve done.)

// generated with brms 2.21.0
functions {
  /* cumulative-logit log-PDF for a single response
   * Args:
   *   y: response category
   *   mu: latent mean parameter
   *   disc: discrimination parameter
   *   thres: ordinal thresholds
   * Returns:
   *   a scalar to be added to the log posterior
   */
   real cumulative_logit_lpmf(int y, real mu, real disc, vector thres) {
     int nthres = num_elements(thres);
     if (y == 1) {
       return log_inv_logit(disc * (thres[1] - mu));
     } else if (y == nthres + 1) {
       return log1m_inv_logit(disc * (thres[nthres] - mu));
     } else {
       return log_inv_logit_diff(disc * (thres[y] - mu), disc * (thres[y - 1] - mu));
     }
   }
  /* cumulative-logit log-PDF for a single response and merged thresholds
   * Args:
   *   y: response category
   *   mu: latent mean parameter
   *   disc: discrimination parameter
   *   thres: vector of merged ordinal thresholds
   *   j: start and end index for the applied threshold within 'thres'
   * Returns:
   *   a scalar to be added to the log posterior
   */
   real cumulative_logit_merged_lpmf(int y, real mu, real disc, vector thres, array[] int j) {
     return cumulative_logit_lpmf(y | mu, disc, thres[j[1]:j[2]]);
   }
  /* ordered-logistic log-PDF for a single response and merged thresholds
   * Args:
   *   y: response category
   *   mu: latent mean parameter
   *   thres: vector of merged ordinal thresholds
   *   j: start and end index for the applied threshold within 'thres'
   * Returns:
   *   a scalar to be added to the log posterior
   */
   real ordered_logistic_merged_lpmf(int y, real mu, vector thres, array[] int j) {
     return ordered_logistic_lpmf(y | mu, thres[j[1]:j[2]]);
   }
}
data {
  int<lower=1> N;  // total number of observations
  array[N] int Y;  // response variable
  int<lower=2> nthres;  // number of thresholds
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  int<lower=1> Kc;  // number of population-level effects after centering
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  matrix[N, Kc] Xc;  // centered version of X
  vector[Kc] means_X;  // column means of X before centering
  for (i in 1:K) {
    means_X[i] = mean(X[, i]);
    Xc[, i] = X[, i] - means_X[i];
  }
  // count observations in each group (column 1 of X is the indigenous indicator)
  int n_indigenous = 0;
  for (i in 1:N) {
    if (X[i, 1] == 1) n_indigenous += 1;
  }
  int n_notindigenous = N - n_indigenous;
}
parameters {
  vector[Kc] b;  // regression coefficients
  ordered[nthres] Intercept;  // temporary thresholds for centered predictors
}
transformed parameters {
  real disc = 1;  // discrimination parameters
  real lprior = 0;  // prior contributions to the log posterior
  lprior += student_t_lpdf(Intercept | 3, 0, 2.5);
}
model {
  // likelihood including constants
  if (!prior_only) {
    target += ordered_logistic_glm_lpmf(Y | Xc, b, Intercept);
  }
  // priors including constants
  target += lprior;
}
generated quantities {
  // compute actual thresholds
  vector[nthres] b_Intercept = Intercept + dot_product(means_X, b);
  
  array[N] int yhat;  // posterior predictive draws

  real mu_ind = 0;
  real mu_notind = 0;

  // real-valued copies of the group sizes, for division
  real n_indigenous2 = n_indigenous;
  real n_notindigenous2 = n_notindigenous;
  for (i in 1:N) {
    yhat[i] = ordered_logistic_rng(Xc[i, ] * b, Intercept);
    // accumulate running means of the predictive draws within each group
    if (X[i, 1] == 1) mu_ind += yhat[i] / n_indigenous2;
    else mu_notind += yhat[i] / n_notindigenous2;
  }

  real RR = mu_ind / mu_notind;  // ratio of group means
  real MD = mu_ind - mu_notind;  // difference of group means
}


Howdy! Unfortunately, the number of treatment doses is not ‘ordinal’ data, unless it has been binned into some sort of ordered categories. Instead, it is count (integer) data. While integers are ordered, the integer space has additional structure, namely metric structure, that makes it much more rigid than a space with only an ordering. Ordered data has no metric structure, so there is no notion of distance between the ordered elements, and consequently no notion of a mean; that would require a metric. (Note that when people take a ‘mean’ by summing the counts of each ordered element and dividing by the total count, they are implicitly imposing a metric structure on the space! There is no ‘mean’ for ordered spaces without a metric.)

Modeling number of treatment doses (counts) would be something you might use a Poisson or negative-binomial model for. These distributions have defined means and are appropriate for count data. So you might try one of those.
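
For example, a minimal sketch using your model call from above (negbinomial() handles overdispersion that a Poisson cannot; ndose_met as in your post):

fit_nb <- brm(ndose ~ age, family = negbinomial(), data = ndose_met)
# coefficients are on the log scale, so exponentiating b_age gives the
# multiplicative change in expected doses per 1-year increase in age
exp(fixef(fit_nb)["age", c("Estimate", "Q2.5", "Q97.5")])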


Hi there jd_c! Thanks for the response, I really appreciate it.

From my understanding, there are scenarios where an ordinal model can be used for continuous outcomes (see Modeling continuous response variables using ordinal regression - PMC). Poisson, negative-binomial, and Gaussian distributions were all attempted before I begrudgingly accepted that ordinal might be the way to go, unfortunately.

I have heard that one of Frank Harrell’s packages (rmsb?) can calculate the mean difference, but in a frequentist framework.

@harrelfe is on this forum, so he could perhaps chime in and give the answer that you were looking for.

I prefer to model the data-generating process, and from your brief description this is counts, not ordered categories. Poor model fit would point to a possibly poor model, not necessarily to an incorrectly chosen distribution. There are a number of scenarios I can imagine that would make the model more complex, for example if doses are only given in certain multiples, or if there is a minimum or maximum number of doses. These types of things can be built into the model (i.e. the generative process); see the sketch below for one example. If it were me, I would focus on building the generative process into a model, rather than trying various distributions in a regression.
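
As a rough sketch of what I mean, assuming a known maximum number of doses (84 here purely for illustration):

# hypothetical: negative-binomial model truncated at a known maximum dose count
fit_trunc <- brm(ndose | trunc(ub = 84) ~ age,
                 family = negbinomial(), data = ndose_met)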


I’ll try to address your question in a way that works regardless of whether the outcome is modelled as counts or as ordered categories more generally.

The simulation-based approach to integrating (as you do in the example above) should work just fine. However, I think there are a few problems with the code you included. First, since you’re using N random draws, the code is effectively a posterior predictive check of the group means, that is, it includes sampling variability on top of the model estimates. What you really want is a number of draws large enough that sampling variability becomes negligible and you get a good estimate of the mean itself.

Second, I don’t think you need to sample at all in this specific case (though you might with a continuous predictor). Let’s say you have 4 different levels of the outcome. Rather than drawing a random value from the ordered distribution, you can calculate the probability of each level and weight its contribution accordingly. This is equivalent to sampling an infinite number of times.
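
Here’s a toy illustration of the difference in R:

# with category probabilities p, the exact mean is sum(p * 1:4); a single set
# of predictive draws only approximates it, with noise that shrinks as N grows
p <- c(0.1, 0.3, 0.4, 0.2)
sum(p * 1:4)                                             # exact: 2.7
mean(sample(1:4, size = 100, replace = TRUE, prob = p))  # noisy at N = 100
mean(sample(1:4, size = 1e6, replace = TRUE, prob = p))  # close with many draws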

Finally, I suspect that you want to ignore the if-else statement and instead manipulate the first column (that aligns with indigenous/notindigenous). By splitting them out this way, you are replicating all of the differences between those two groups on the remaining predictors. For example, imagine the second predictor (X[,2]) was 1 point higher for the indigenous group than for the non-indigenous. The if-else statement would then pass along that difference to the difference in means.

I believe you actually want to assume that the two groups have the same distribution of the other predictors. Here’s a rough example I put together. There are plenty of ways to make the code cleaner and more efficient, and you’d have to write the ordered_probabilities function, but hopefully it illustrates the idea.

real mu_ind;
real mu_notind;
matrix[N, 4] yprobs_ind;     // per-observation category probabilities
matrix[N, 4] yprobs_notind;
vector[4] outcome_values;

for (i in 1:4) {
  outcome_values[i] = i;
}

for (n in 1:N) {
  row_vector[Kc] Xc_n = Xc[n, ];
  Xc_n[1] = ..indigenous value..; // pre-calculate the centered value for indigenous
  yprobs_ind[n, ] = to_row_vector(ordered_probabilities(Xc_n * b, Intercept));
  Xc_n[1] = ..non-indigenous value..; // pre-calculate the centered value for non-indigenous
  yprobs_notind[n, ] = to_row_vector(ordered_probabilities(Xc_n * b, Intercept));
}

// average the probabilities over observations, then weight by the outcome values
mu_ind = rep_row_vector(1, N) * yprobs_ind / sum(yprobs_ind) * outcome_values;
mu_notind = rep_row_vector(1, N) * yprobs_notind / sum(yprobs_notind) * outcome_values;

The easiest way to extend this to a continuous predictor would be to compare two specific values, e.g. one standard deviation above and below the mean. You could extend it further by examining the effect of a 1 SD increase at various points along the predictor’s range.
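
Outside of Stan, here is a rough sketch of that ±1 SD comparison in R using posterior_epred(), which returns the per-category probabilities for ordinal models. fit_age is a hypothetical name for your age-only model, and I’m assuming the categories correspond to dose values 1..ncat:

lo <- mean(ndose_met$age) - sd(ndose_met$age)
hi <- mean(ndose_met$age) + sd(ndose_met$age)
p_lo <- posterior_epred(fit_age, newdata = data.frame(age = lo))  # draws x 1 x ncat
p_hi <- posterior_epred(fit_age, newdata = data.frame(age = hi))
ncat <- dim(p_lo)[3]
mu_lo <- as.vector(p_lo[, 1, ] %*% seq_len(ncat))  # E[Y | age = lo] per draw
mu_hi <- as.vector(p_hi[, 1, ] %*% seq_len(ncat))
quantile(mu_hi - mu_lo, probs = c(0.025, 0.5, 0.975))  # mean difference + 95% CrI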


Thank you for the help, Simon! I have definitely gotten closer after using your code (see the end of this post). I also went back to R and managed to find a pretty simple way to calculate it for categorical covariates, making use of the conditional_effects() function in brms. From what I can tell conditional_effects() only outputs a plot, but I can take the data within the plot to estimate the differences:

# brms model
ndose_in <- brm(ndose ~ indigenous, family = cumulative(), data = ndose_met)

# from the brm model of indigenous, get conditional effects on the linear scale
cond_effs <- conditional_effects(ndose_in, categorical = FALSE)
# then take the estimates and SEs of the conditional effects
ind0 <- rnorm(n = 10000, mean = 30, sd = 1.5) # changed mean and sd for anonymity
ind1 <- rnorm(n = 10000, mean = 40, sd = 2.5) # changed mean and sd for anonymity
diff <- ind1 - ind0
mean(diff)
quantile(diff, probs = c(0.025, 0.975))
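
As an aside, the object returned by conditional_effects() is a named list of data frames, so the numbers behind the plot can be pulled out directly (column names as in current brms; worth double-checking with str(cond_effs)):

ce_df <- cond_effs[["indigenous"]]
ce_df[, c("indigenous", "estimate__", "se__", "lower__", "upper__")]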

These results seem to align pretty closely with the results I’ve obtained from the model with a Gaussian distribution.

This does, however, rely on something happening inside conditional_effects() that I can’t discern from the source code, so if it ever changes this method might stop working. Weirdly, the method doesn’t work for continuous variables (even when taking the slope into account):

# age
# this doesn't appear to work for the credible interval, only the mean
cond_effs_age <- conditional_effects(ndose_age, categorical = FALSE)
# get the slope, since conditional_effects doesn't use 1-unit increases;
# ymin/ymax and xmin/xmax are the estimates at the min/max age
slope <- (ymax - ymin) / (xmax - xmin)
age0 <- rnorm(n = 10000, mean = ymin, sd = se_age)         # value at the first age
age1 <- rnorm(n = 10000, mean = ymin + slope, sd = se_age) # value at the first age + 1 year
diff_age <- age1 - age0
quantile(diff_age, probs = c(0.025, 0.975))

I think I’m closer to solving this issue at least. For posterity’s sake I will try and post a solution when I find it.

// generated with brms 2.21.0
functions {
  /* cumulative-logit log-PDF for a single response
   * Args:
   *   y: response category
   *   mu: latent mean parameter
   *   disc: discrimination parameter
   *   thres: ordinal thresholds
   * Returns:
   *   a scalar to be added to the log posterior
   */
   real cumulative_logit_lpmf(int y, real mu, real disc, vector thres) {
     int nthres = num_elements(thres);
     if (y == 1) {
       return log_inv_logit(disc * (thres[1] - mu));
     } else if (y == nthres + 1) {
       return log1m_inv_logit(disc * (thres[nthres] - mu));
     } else {
       return log_inv_logit_diff(disc * (thres[y] - mu), disc * (thres[y - 1] - mu));
     }
   }
  /* cumulative-logit log-PDF for a single response and merged thresholds
   * Args:
   *   y: response category
   *   mu: latent mean parameter
   *   disc: discrimination parameter
   *   thres: vector of merged ordinal thresholds
   *   j: start and end index for the applied threshold within 'thres'
   * Returns:
   *   a scalar to be added to the log posterior
   */
   real cumulative_logit_merged_lpmf(int y, real mu, real disc, vector thres, array[] int j) {
     return cumulative_logit_lpmf(y | mu, disc, thres[j[1]:j[2]]);
   }
  /* ordered-logistic log-PDF for a single response and merged thresholds
   * Args:
   *   y: response category
   *   mu: latent mean parameter
   *   thres: vector of merged ordinal thresholds
   *   j: start and end index for the applied threshold within 'thres'
   * Returns:
   *   a scalar to be added to the log posterior
   */
   real ordered_logistic_merged_lpmf(int y, real mu, vector thres, array[] int j) {
     return ordered_logistic_lpmf(y | mu, thres[j[1]:j[2]]);
   }
   
   /* category probabilities for a cumulative-logit model
    * Args:
    *   eta: linear predictor (latent mean)
    *   thres: ordinal thresholds
    * Returns:
    *   a vector of category probabilities that sums to 1
    */
   vector ordered_probabilities(real eta, vector thres) {
     int ncat = num_elements(thres) + 1;  // total ordered categories
     vector[ncat] probs;
     probs[1] = inv_logit(thres[1] - eta);  // P(Y = 1)
     for (k in 2:(ncat - 1)) {
       probs[k] = inv_logit(thres[k] - eta) - inv_logit(thres[k - 1] - eta);
     }
     probs[ncat] = 1 - inv_logit(thres[ncat - 1] - eta);  // P(Y = ncat)
     return probs;
   }

}
data {
  int<lower=1> N;  // total number of observations
  array[N] int Y;  // response variable
  int<lower=2> nthres;  // number of thresholds
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  int<lower=1> Kc;  // number of population-level effects after centering
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  matrix[N, Kc] Xc;  // centered version of X
  vector[Kc] means_X;  // column means of X before centering
  for (i in 1:K) {
    means_X[i] = mean(X[, i]);
    Xc[, i] = X[, i] - means_X[i];
  }
  
  // centered values of the indicator column: Xc[, 1] = X[, 1] - mean(X[, 1]),
  // so indigenous (X = 1) maps to 1 - mean(X[, 1]) and
  // non-indigenous (X = 0) maps to -mean(X[, 1])
  real mean_ind = 1 - mean(X[, 1]);
  real mean_notind = -mean(X[, 1]);
}
parameters {
  vector[Kc] b;  // regression coefficients
  ordered[nthres] Intercept;  // temporary thresholds for centered predictors
}
transformed parameters {
  real disc = 1;  // discrimination parameters
  real lprior = 0;  // prior contributions to the log posterior
  lprior += student_t_lpdf(Intercept | 3, 0, 2.5);
}
model {
  // likelihood including constants
  if (!prior_only) {
    target += ordered_logistic_glm_lpmf(Y | Xc, b, Intercept);
  }
  // priors including constants
  target += lprior;
}
generated quantities {
  // compute actual thresholds
  vector[nthres] b_Intercept = Intercept + dot_product(means_X, b);
  
  real mu_ind;
  real mu_notind;
  // nthres + 1 outcome categories (84 in this dataset)
  matrix[N, nthres + 1] yprobs_ind;
  matrix[N, nthres + 1] yprobs_notind;
  vector[nthres + 1] outcome_values;

  for (i in 1:(nthres + 1)) {
    outcome_values[i] = i;
  }

  for (n in 1:N) {
    row_vector[Kc] Xc_n = Xc[n, ];
    Xc_n[1] = mean_ind;     // centered value for indigenous
    yprobs_ind[n, ] = to_row_vector(ordered_probabilities(Xc_n * b, Intercept));
    Xc_n[1] = mean_notind;  // centered value for non-indigenous
    yprobs_notind[n, ] = to_row_vector(ordered_probabilities(Xc_n * b, Intercept));
  }

  // average probabilities over observations, then weight categories by outcome value
  mu_ind = rep_row_vector(1, N) * yprobs_ind / sum(yprobs_ind) * outcome_values;
  mu_notind = rep_row_vector(1, N) * yprobs_notind / sum(yprobs_notind) * outcome_values;

  real sum_yprobind = sum(yprobs_ind);  // sanity check: should equal N

  real MD = mu_ind - mu_notind;
}



as_draws_df() might be helpful here, as it generates a data frame of parameter draws from the model. You can then manipulate it in R to generate the particular effect you’re interested in. I’m attaching an R script to illustrate this idea. My code gets VERY hairy very quickly, so I apologize if anything is unclear. Hopefully it’s helpful.

ordered_effects.R (8.9 KB)

I generate a sample with a binary indicator for indigenous status, a correlated continuous covariate, and an ordered number of doses. I set the parameters so that indigenous status and the covariate both have large effects but they are correlated such that the observed difference between indigenous and non-indigenous samples is negligible.

I estimate the model, then use the draws to compute different quantities of interest. For example, we could calculate the mean number of doses for the unmodified/observed sample, or see how that mean changes if we add 1 to the covariate, etc. We can also compare these predicted means to the OLS estimates, both including and excluding the covariate in the model.

  • Observed Sample: mean number of doses using the unmodified/observed sample
  • Covariate + 1: # of doses after adding 1 to the covariate for all cases in the sample
  • Not indigenous: assume everyone in the sample is not indigenous while keeping the covariate at its observed level
  • Indigenous: same thing but assume everyone is indigenous
  • Covariate = 0: Assume everyone has a value of 0 on the covariate while keeping their indigenous status at the observed level
  • Covariate = 1: same thing but set the covariate to 1 for all cases

Finally, we can calculate the conditional effect sizes by comparing these means. For example, the effect of adding 1 to the covariate would be the difference between the adjusted mean and the predicted mean for the observed sample.
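
If you’d rather sidestep manipulating raw draws, posterior_epred() gets you most of the way, since it returns the per-category probabilities for ordinal models. A compressed sketch of the indigenous/not-indigenous comparison (names are illustrative; if indigenous is a factor, set the factor level rather than 0/1):

# counterfactual datasets: everyone indigenous vs. everyone not
p1 <- posterior_epred(fit, newdata = transform(dat, indigenous = 1))  # draws x N x ncat
p0 <- posterior_epred(fit, newdata = transform(dat, indigenous = 0))
ncat <- dim(p1)[3]
# expected dose per draw, averaged over the sample, weighting categories by dose value
mu1 <- apply(p1, 1, function(m) mean(m %*% seq_len(ncat)))
mu0 <- apply(p0, 1, function(m) mean(m %*% seq_len(ncat)))
quantile(mu1 - mu0, probs = c(0.025, 0.5, 0.975))  # mean difference + 95% CrI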


This is really great, Simon, thank you so much! The additional plots are very useful.

A colleague sent me a suggested solution as well, which I will include in case anyone is curious. (This assumes a dataset called dat with ndose (1-84), a continuous variable age, and a binary variable sex (0/1).)

pacman::p_load(brms, dplyr)
#fit the simulated outcome
fit <- brm(ndose ~ age + sex, family = cumulative(), data = dat)

#using all the posterior samples
sims <- bind_rows(fit[["fit"]]@sim[["samples"]])
intercepts_samps <- as.matrix(sims[grep("b_Intercept", names(sims))])


# extract the posterior samples for the beta coefficient of sex
beta_sex_samps <- sims[["b_sex1"]]
# compute expected outcomes for the two sexes (other predictors held at 0);
# in brms, P(Y <= k) = plogis(thres_k - eta), so E[Y] = 1 + sum_k (1 - plogis(thres_k - eta))
E_Y_given_X0 <- 1 + rowSums(1 - plogis(intercepts_samps))
E_Y_given_X1 <- 1 + rowSums(1 - plogis(intercepts_samps - beta_sex_samps))
mean_differences <- E_Y_given_X1 - E_Y_given_X0
hist(mean_differences) # distribution of the differences
mean(mean_differences) # estimated difference
quantile(mean_differences, probs = c(0.025, 0.975)) # 95% credible interval

# extract the posterior samples for the beta coefficient of age
beta_age_samps <- sims[["b_age"]]
# compute expected outcomes for a 1-year difference in age (other predictors held at 0)
E_Y_given_X0 <- 1 + rowSums(1 - plogis(intercepts_samps))
E_Y_given_X1 <- 1 + rowSums(1 - plogis(intercepts_samps - beta_age_samps))
mean_differences <- E_Y_given_X1 - E_Y_given_X0
hist(mean_differences) # distribution of the differences
mean(mean_differences) # estimated difference
quantile(mean_differences, probs = c(0.025, 0.975)) # 95% credible interval

