Translating (ordered-probit) model to statistical notation - where to look?

Hi all,

I’m wondering if anyone can point me towards some resources for translating models into accurate statistical notation.

I’m working on translating an ordered-probit model (discussed in this thread) into statistical notation, but I’m not at all confident that what I have is correct.

What I have so far:

Y_{sm} = \gamma_{sm} \Leftrightarrow \theta_{m,\gamma_{sm}-1} < \widetilde Y_{sm} \leq \theta_{m,\gamma_{sm}}, \quad \gamma_{sm} \in \{1, \dots, K_{m}\}

where \gamma_{sm} is a category out of K_{m} ordered categories and \theta_{m} is a vector of threshold parameters for outcome m.
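For the boundary categories I am assuming the usual convention of fixing the outermost thresholds:

\theta_{m,0} \equiv -\infty, \qquad \theta_{m,K_{m}} \equiv +\infty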

\begin{align*}
p(y_{m} = k_{m} \mid \mu_{m}, \sigma_{m}, \theta_{m}) &= \Phi\!\left(\frac{\theta_{m,k_{m}} - \mu_{m}}{\sigma_{s} + \sigma_{m} + \sigma_{t}}\right) - \Phi\!\left(\frac{\theta_{m,k_{m}-1} - \mu_{m}}{\sigma_{s} + \sigma_{m} + \sigma_{t}}\right) \\
\mu_{smt} &= \beta_{0} + \beta_{\text{Intensity}} \cdot \text{Intensity}_{smt} \\
&\quad + \beta_{\text{Condition}} \cdot \text{Condition}_{smt} \\
&\quad + \beta_{\text{Category}} \cdot \text{Category}_{smt} \\
&\quad + \beta_{\text{Condition.Category}} \cdot \text{Condition.Category}_{smt} \\
\beta_{0} &= 0 \\
\beta_{\text{Intensity}} &\sim \operatorname{Student-}t(6, 1, 1.5) \\
\beta_{\text{Condition}} &\sim \operatorname{Student-}t(6, 1, 1.5) \\
\beta_{\text{Category}} &\sim \operatorname{Student-}t(6, 1, 1.5) \\
\beta_{\text{Condition.Category}} &\sim \operatorname{Student-}t(6, 1, 1.5) \\
\sigma_{s} &\sim \operatorname{Exponential}(3) \\
\sigma_{t} &\sim \operatorname{Exponential}(2) \\
\sigma_{m} &= 1 / \mathrm{disc} \\
\log(\mathrm{disc}) &= \gamma_{\text{Category}} \\
\gamma_{\text{Category}=\text{Unpleasant}} &= 0 \\
\gamma_{\text{Category}=\text{Neutral}} &\sim \operatorname{Normal}(0, 2) \\
\gamma_{\text{Category}=\text{Pleasant}} &\sim \operatorname{Normal}(0, 2) \\
\theta_{m,1}, \dots, \theta_{m,K_{m}-1} &\sim \operatorname{Normal}(0, 6) \\
\mathrm{Corr} &\sim \operatorname{LKJCholesky}(1)
\end{align*}

My brms model is:

Mod_fmla <- bf(mvbind(Valence, Arousal) | resp_thres(8, gr = Category) ~ Category +
                                            Condition +
                                            Category:Condition +
                                            Intensity.c + 
                                            (1|o|Token) +
                                            (1|p|Condition) +
                                            (1|q|Condition:Subject)) +
        lf(disc ~ 0 + Category, resp = "Valence", cmc = FALSE) +
        lf(disc ~ 0 + Category, resp = "Arousal", cmc = FALSE)
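As a sanity check, the prior slots brms defines for this formula can be listed with get_prior() before setting anything - a minimal sketch, assuming the data frame is called dat:

## List the prior structure brms builds for this formula
## (dat is a placeholder name for the actual data frame)
get_prior(Mod_fmla, data = dat, family = cumulative("probit"))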

## priors
priors <- c(
        set_prior("normal(0,6)",  
                  class = "Intercept",
                  coef = "",
                  group = "Neutral", 
                  resp = ""),
        
        set_prior("normal(0,6)",  
                  class = "Intercept",
                  coef = "",
                  group = "Pleasant",
                  resp = ""),
        
        set_prior("student_t(6,1,1.5)",
                  class = "b",
                  coef = "",
                  resp = ""),
       
        set_prior("normal(0,2)",
                  class = "b",
                  coef = "CategoryNeutral",
                  resp = "",
                  dpar = "disc"),
        
        set_prior("normal(0,2)",
                  class = "b",
                  coef = "CategoryPleasant",
                  resp = "",
                  dpar = "disc"),
        
        set_prior("exponential(2)",
                  class = "sd",
                  coef = "Intercept",
                  group = "Condition",
                  resp = ""),
        set_prior("exponential(3)",
                  class = "sd",
                  coef = "Intercept",
                  group = "Condition:Subject",
                  resp = ""),
        set_prior("exponential(2)",
                  class = "sd",
                  coef = "Intercept",
                  group = "Token",
                  resp = "")      
       )
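The Stan code below can be regenerated from the formula and priors with make_stancode() - a sketch, again with dat standing in for the data frame:

## Print the Stan program brms compiles for this model
make_stancode(Mod_fmla, data = dat, family = cumulative("probit"), prior = priors)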

The Stan code is:

parameters {
  vector[K_Valence] b_Valence;  // population-level effects
  ordered[nthres_Valence[1]] Intercept_Valence_1;  // temporary thresholds for centered predictors
  ordered[nthres_Valence[2]] Intercept_Valence_2;  // temporary thresholds for centered predictors
  ordered[nthres_Valence[3]] Intercept_Valence_3;  // temporary thresholds for centered predictors
  vector[K_disc_Valence] b_disc_Valence;  // population-level effects
  vector[K_Arousal] b_Arousal;  // population-level effects
  ordered[nthres_Arousal[1]] Intercept_Arousal_1;  // temporary thresholds for centered predictors
  ordered[nthres_Arousal[2]] Intercept_Arousal_2;  // temporary thresholds for centered predictors
  ordered[nthres_Arousal[3]] Intercept_Arousal_3;  // temporary thresholds for centered predictors
  vector[K_disc_Arousal] b_disc_Arousal;  // population-level effects
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  matrix[M_1, N_1] z_1;  // standardized group-level effects
  cholesky_factor_corr[M_1] L_1;  // cholesky factor of correlation matrix
  vector<lower=0>[M_2] sd_2;  // group-level standard deviations
  matrix[M_2, N_2] z_2;  // standardized group-level effects
  cholesky_factor_corr[M_2] L_2;  // cholesky factor of correlation matrix
  vector<lower=0>[M_3] sd_3;  // group-level standard deviations
  matrix[M_3, N_3] z_3;  // standardized group-level effects
  cholesky_factor_corr[M_3] L_3;  // cholesky factor of correlation matrix
}
transformed parameters {
  matrix[N_1, M_1] r_1;  // actual group-level effects
  // using vectors speeds up indexing in loops
  vector[N_1] r_1_Valence_1;
  vector[N_1] r_1_Arousal_2;
  matrix[N_2, M_2] r_2;  // actual group-level effects
  // using vectors speeds up indexing in loops
  vector[N_2] r_2_Valence_1;
  vector[N_2] r_2_Arousal_2;
  matrix[N_3, M_3] r_3;  // actual group-level effects
  // using vectors speeds up indexing in loops
  vector[N_3] r_3_Valence_1;
  vector[N_3] r_3_Arousal_2;
  // compute actual group-level effects
  r_1 = (diag_pre_multiply(sd_1, L_1) * z_1)';
  r_1_Valence_1 = r_1[, 1];
  r_1_Arousal_2 = r_1[, 2];
  // compute actual group-level effects
  r_2 = (diag_pre_multiply(sd_2, L_2) * z_2)';
  r_2_Valence_1 = r_2[, 1];
  r_2_Arousal_2 = r_2[, 2];
  // compute actual group-level effects
  r_3 = (diag_pre_multiply(sd_3, L_3) * z_3)';
  r_3_Valence_1 = r_3[, 1];
  r_3_Arousal_2 = r_3[, 2];
}
model {
  vector[nmthres_Valence] merged_Intercept_Valence;  // merged thresholds
  // initialize linear predictor term
  vector[N_Valence] mu_Valence = X_Valence * b_Valence;
  // initialize linear predictor term
  vector[N_Valence] disc_Valence = X_disc_Valence * b_disc_Valence;
  vector[nmthres_Arousal] merged_Intercept_Arousal;  // merged thresholds
  // initialize linear predictor term
  vector[N_Arousal] mu_Arousal = X_Arousal * b_Arousal;
  // initialize linear predictor term
  vector[N_Arousal] disc_Arousal = X_disc_Arousal * b_disc_Arousal;
  merged_Intercept_Valence[Kthres_start_Valence[1]:Kthres_end_Valence[1]] = Intercept_Valence_1;
  merged_Intercept_Valence[Kthres_start_Valence[2]:Kthres_end_Valence[2]] = Intercept_Valence_2;
  merged_Intercept_Valence[Kthres_start_Valence[3]:Kthres_end_Valence[3]] = Intercept_Valence_3;
  merged_Intercept_Arousal[Kthres_start_Arousal[1]:Kthres_end_Arousal[1]] = Intercept_Arousal_1;
  merged_Intercept_Arousal[Kthres_start_Arousal[2]:Kthres_end_Arousal[2]] = Intercept_Arousal_2;
  merged_Intercept_Arousal[Kthres_start_Arousal[3]:Kthres_end_Arousal[3]] = Intercept_Arousal_3;
  for (n in 1:N_Valence) {
    // add more terms to the linear predictor
    mu_Valence[n] += r_1_Valence_1[J_1_Valence[n]] * Z_1_Valence_1[n] + r_2_Valence_1[J_2_Valence[n]] * Z_2_Valence_1[n] + r_3_Valence_1[J_3_Valence[n]] * Z_3_Valence_1[n];
  }
  for (n in 1:N_Arousal) {
    // add more terms to the linear predictor
    mu_Arousal[n] += r_1_Arousal_2[J_1_Arousal[n]] * Z_1_Arousal_2[n] + r_2_Arousal_2[J_2_Arousal[n]] * Z_2_Arousal_2[n] + r_3_Arousal_2[J_3_Arousal[n]] * Z_3_Arousal_2[n];
  }
  for (n in 1:N_Valence) {
    // apply the inverse link function
    disc_Valence[n] = exp(disc_Valence[n]);
  }
  for (n in 1:N_Arousal) {
    // apply the inverse link function
    disc_Arousal[n] = exp(disc_Arousal[n]);
  }
  // priors including all constants
  target += student_t_lpdf(b_Valence | 6,1,1.5);
  target += normal_lpdf(Intercept_Valence_2 | 0,6);
  target += normal_lpdf(Intercept_Valence_3 | 0,6);
  target += normal_lpdf(b_disc_Valence[1] | 0,2);
  target += normal_lpdf(b_disc_Valence[2] | 0,2);
  target += student_t_lpdf(b_Arousal | 6,1,1.5);
  target += normal_lpdf(Intercept_Arousal_2 | 0,6);
  target += normal_lpdf(Intercept_Arousal_3 | 0,6);
  target += normal_lpdf(b_disc_Arousal[1] | 0,2);
  target += normal_lpdf(b_disc_Arousal[2] | 0,2);
  target += exponential_lpdf(sd_1[1] | 2);
  target += exponential_lpdf(sd_1[2] | 2);
  target += std_normal_lpdf(to_vector(z_1));
  target += lkj_corr_cholesky_lpdf(L_1 | 1);
  target += exponential_lpdf(sd_2[1] | 3);
  target += exponential_lpdf(sd_2[2] | 3);
  target += std_normal_lpdf(to_vector(z_2));
  target += lkj_corr_cholesky_lpdf(L_2 | 1);
  target += exponential_lpdf(sd_3[1] | 2);
  target += exponential_lpdf(sd_3[2] | 2);
  target += std_normal_lpdf(to_vector(z_3));
  target += lkj_corr_cholesky_lpdf(L_3 | 1);
  // likelihood including all constants
  if (!prior_only) {
    for (n in 1:N_Valence) {
      target += cumulative_probit_merged_lpmf(Y_Valence[n] | mu_Valence[n], disc_Valence[n], merged_Intercept_Valence, Jthres_Valence[n]);
    }
    for (n in 1:N_Arousal) {
      target += cumulative_probit_merged_lpmf(Y_Arousal[n] | mu_Arousal[n], disc_Arousal[n], merged_Intercept_Arousal, Jthres_Arousal[n]);
    }
  }
}


First, I would mention that a growing part of the academic readership would, IMHO, understand your model faster from the R formula than from the math notation, so - depending on your target audience - doing away with the math notation completely might be feasible, though a bit risky, as it is not (yet) commonly done.

Definitely also check out the threads Understanding the Mathematical Model and Mathematical description of a brms model, where a similar problem is discussed.

I see a few weird things:

  • I thought you were using the cumulative("logit") family, but the use of \Phi suggests you are using cumulative("probit").
  • Categorical variables would be encoded via dummy coding with a reference level, so you would have a coefficient for each level of Category, Condition, and Category:Condition, except for the reference level (see the sketch after this list).
  • The \sigma_s, \sigma_m, \sigma_t should probably be the standard deviations of the varying intercepts - those are, however, currently missing from the model, so you would need to write them out explicitly. An example of what that might look like is on page 8 of my recent publication.
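To illustrate the dummy-coding point, here is a minimal sketch with a made-up three-level Category factor (not your data):

## Hypothetical three-level factor; Unpleasant is the reference level
d <- data.frame(Category = factor(c("Unpleasant", "Neutral", "Pleasant"),
                                  levels = c("Unpleasant", "Neutral", "Pleasant")))
## The design matrix has an intercept plus K - 1 = 2 Category columns;
## the reference level gets no coefficient of its own
model.matrix(~ Category, data = d)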

And there are potentially other issues.

You might also want to simplify and just state the model in matrix notation (https://en.wikipedia.org/wiki/Mixed_model) and then rely on the brms formula for the details.
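For the latent-variable part, that would look something like (a generic sketch, not your exact model):

\widetilde{Y} = X\beta + Zu + \varepsilon, \qquad u \sim \operatorname{Normal}(0, \Sigma), \qquad \varepsilon \sim \operatorname{Normal}(0, 1)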

But if you need to have the full math notation, feel free to get back for additional clarification :-)

Thank you, Martin!

I am using cumulative("probit") to align with Paul’s tutorial and a recent paper by Liddell and Kruschke.

I attempted to capture the dummy coding and priors with the betas for Intensity, Category, Condition, and Condition.Category, which I’m trying to reflect in the linear part describing \mu, but I’m not sure how to incorporate the sigmas into that part of the model description. I have them summed inside the \Phi terms, but I don’t think that’s correct.

You are right, summing them inside \Phi is incorrect. I am not quite sure where you are stuck, but it seems some introductory material looking at hierarchical models from a more mathematical standpoint might help. I think the treatment in Statistical Rethinking is quite accessible while detailed enough to help you move forward, but I would expect other books to cover this well too.

If I understand your question, the sigmas refer to the sd parameters for Token etc. Those are the standard deviations of the varying intercepts. Mapping this to the example from my recent publication linked above, it is similar to how e.g. the (1 || phenotype) term is coded via the \beta^1_{p_i} and \sigma_1 coefficients (but you also have a covariance matrix to take care of). A sketch of what that looks like is below.
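As a rough sketch (simplified, with generic symbols, and ignoring the correlation structure), the varying intercepts enter the mean, and the sigmas are the scales of their distributions - not terms in the denominator of \Phi:

\begin{align*}
\mu_{smt} &= \beta x_{smt} + \alpha_{s} + \alpha_{t} \\
\alpha_{s} &\sim \operatorname{Normal}(0, \sigma_{s}) \\
\alpha_{t} &\sim \operatorname{Normal}(0, \sigma_{t}) \\
p(y = k \mid \mu_{smt}, \theta, \sigma_{m}) &= \Phi\!\left(\frac{\theta_{k} - \mu_{smt}}{\sigma_{m}}\right) - \Phi\!\left(\frac{\theta_{k-1} - \mu_{smt}}{\sigma_{m}}\right)
\end{align*}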