Understanding brms model code: Ordered Probit with Ordered Predictor (monotonic effects)

Hi all,

I need some help interpreting some of the model code from a brms model. I have an ordered outcome y with 12 categories (11 thresholds) and an ordered predictor x with 4 ordinal categories. At the most basic, the model can be coded as follows:

dat <- list(x = x, y = y)

mod <- brm(y ~ (mo(x), data = dat, family = cumulative("probit"))

The model from brms mod$model is output as follows:

functions {
  /* cumulative-probit log-PDF for a single response
   * Args:
   *   y: response category
   *   mu: latent mean parameter
   *   disc: discrimination parameter
   *   thres: ordinal thresholds
   * Returns:
   *   a scalar to be added to the log posterior
   */
   real cumulative_probit_lpmf(int y, real mu, real disc, vector thres) {
     int nthres = num_elements(thres);
     real p;
     if (y == 1) {
       p = Phi(disc * (thres[1] - mu));
     } else if (y == nthres + 1) {
       p = 1 - Phi(disc * (thres[nthres] - mu));
     } else {
       p = Phi(disc * (thres[y] - mu)) -
           Phi(disc * (thres[y - 1] - mu));
     }
     return log(p);
   }
  /* cumulative-probit log-PDF for a single response and merged thresholds
   * Args:
   *   y: response category
   *   mu: latent mean parameter
   *   disc: discrimination parameter
   *   thres: vector of merged ordinal thresholds
   *   j: start and end index for the applid threshold within 'thres'
   * Returns:
   *   a scalar to be added to the log posterior
   */
   real cumulative_probit_merged_lpmf(int y, real mu, real disc, vector thres, int[] j) {
     return cumulative_probit_lpmf(y | mu, disc, thres[j[1]:j[2]]);
   }
  /* compute monotonic effects
   * Args:
   *   scale: a simplex parameter
   *   i: index to sum over the simplex
   * Returns:
   *   a scalar between 0 and rows(scale)
   */
  real mo(vector scale, int i) {
    if (i == 0) {
      return 0;
    } else {
      return rows(scale) * sum(scale[1:i]);
    }
  }
}
data {
  int<lower=1> N;  // total number of observations
  int Y[N];  // response variable
  int<lower=2> nthres;  // number of thresholds
  int<lower=1> Ksp;  // number of special effects terms
  int<lower=1> Imo;  // number of monotonic variables
  int<lower=1> Jmo[Imo];  // length of simplexes
  int Xmo_1[N];  // monotonic variable
  vector[Jmo[1]] con_simo_1;  // prior concentration of monotonic simplex
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
}
parameters {
  ordered[nthres] Intercept;  // temporary thresholds for centered predictors
  simplex[Jmo[1]] simo_1;  // monotonic simplex
  vector[Ksp] bsp;  // special effects coefficients
}
transformed parameters {
  real disc = 1;  // discrimination parameters
  real lprior = 0;  // prior contributions to the log posterior
  lprior += student_t_lpdf(Intercept | 3, 0, 2.5);
  lprior += dirichlet_lpdf(simo_1 | con_simo_1);
}
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] mu = rep_vector(0.0, N);
    for (n in 1:N) {
      // add more terms to the linear predictor
      mu[n] += (bsp[1]) * mo(simo_1, Xmo_1[n]);
    }
    for (n in 1:N) {
      target += cumulative_probit_lpmf(Y[n] | mu[n], disc, Intercept);
    }
  }
  // priors including constants
  target += lprior;
}
generated quantities {
  // compute actual thresholds
  vector[nthres] b_Intercept = Intercept;
}

I am interested in the model code because I am using this in a more bespoke modeling scenario in cmdstanr. However, I am struggling trying to understand the code. I have questions about the following:

  • In the data block, I am confused by variables that were not initially ā€œinputā€ by myself. I know Y, N, nthresh, and Xmo all come from my data declarations. What about ā€˜kspā€™, Imo, Jmo, and con_simo_1? I think Jmo is the number of ordered categories of X (4), but I am unsureā€¦ Is Ksp then 1 in my case because I only have 1 predictor? But, Imo is also 1?

Admittedly this is my first attempt at working with simplexes outside of the brms framework, so I want to make sure I am understanding what is going on in the stan code.

Thank you all!

1 Like

You can use the function standata(mod) at it will print the data list that was passed to Stan. This way you can figure out what data is being used in the actual Stan model behind brms.

2 Likes