Difference in modeling and fitting ordinal data using brms

Hello!

I am building a Stan model (using rstan) to fit ordinal data (responses on a 1-8 scale) from an experiment with two testing sessions. The model estimates the effects of two independent variables and their interaction. Each of the four model parameters (two IVs, the interaction, and an intercept) has a hierarchical structure: a parameter for each individual, plus a group-level distribution (mean and SD) for the whole group.

parameters {

  // Group-level parameter means
  real mu_mean_lex;
  real mu_mean_noise;
  real mu_mean_inter;
  real mu_mean_cept;

  // Group-level parameter SDs
  real<lower=0> mu_sd_lex;
  real<lower=0> mu_sd_noise;
  real<lower=0> mu_sd_inter;
  real<lower=0> mu_sd_cept;

  // Individual-level parameters
  vector[N] mu_i_lex;
  vector[N] mu_i_noise;
  vector[N] mu_i_inter;
  vector[N] mu_i_cept;  

} 
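For reference, a hierarchy like this is typically completed in the model block with independent normal population distributions, one per parameter. The following is only a sketch of that structure; the hyperprior scales are illustrative assumptions, not taken from the actual (elided) code:

```stan
model {
  // Hyperpriors on the group-level means and SDs (illustrative scales)
  mu_mean_lex ~ normal(0, 5);    mu_sd_lex ~ normal(0, 2);
  mu_mean_noise ~ normal(0, 5);  mu_sd_noise ~ normal(0, 2);
  mu_mean_inter ~ normal(0, 5);  mu_sd_inter ~ normal(0, 2);
  mu_mean_cept ~ normal(0, 5);   mu_sd_cept ~ normal(0, 2);

  // Each individual's parameters are drawn independently
  // from their own group-level distribution
  mu_i_lex   ~ normal(mu_mean_lex,   mu_sd_lex);
  mu_i_noise ~ normal(mu_mean_noise, mu_sd_noise);
  mu_i_inter ~ normal(mu_mean_inter, mu_sd_inter);
  mu_i_cept  ~ normal(mu_mean_cept,  mu_sd_cept);
}
```

The key feature of this structure is that the four effects are given independent priors; nothing ties a subject's interaction coefficient to their other coefficients.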

The model is fit to the data X[i,n] of the two testing sessions separately, using the ordered_logistic() function:

model {
  ...

  // For each subject
  for (i in 1:N) {
    // For each trial
    for (n in 1:T_ctr) {
      row_vector[param_num] conds = [Cond[1,n], Cond[2,n], Cond[1,n] * Cond[2,n], 1];
      vector[param_num] params = [mu_i_lex[i], mu_i_noise[i], mu_i_inter[i], mu_i_cept[i]]';

      target += ordered_logistic_lpmf(X[i,n] | conds * params, c);
    }
  }
}

But the posterior means of mu_i_inter showed only a low correlation between the two testing sessions, so I tried modeling the experiment with brms instead:

model.s1 <- brm(response ~ condition*ar + (condition*ar|participant), data = df_data1, family = cumulative(), iter = 5000, chains = 1)

Unlike in my Stan model, the brms-estimated interaction effects show a much higher correlation across the two sessions.

I used stancode(model.s1) to inspect the generated Stan code and found two main differences from my model.

The first is the implementation of the group-level effects. Instead of a single per-participant parameter as in my hierarchical model, the brms model builds the linear predictor by adding participant-level deviations on top of the population-level effects:

model {
  ...

  vector[N] mu = rep_vector(0.0, N);
    mu += Xc * b;
    for (n in 1:N) {
      // add more terms to the linear predictor
      mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_1_2[J_1[n]] * Z_1_2[n] + r_1_3[J_1[n]] * Z_1_3[n] + r_1_4[J_1[n]] * Z_1_4[n];
    }
    for (n in 1:N) {
      target += ordered_logistic_lpmf(Y[n] | mu[n], Intercept);
    }
    
    ...

}

Xc is the (centered) design matrix of the conditions and b the population-level coefficients; Z_1_1[n] - Z_1_4[n] are the corresponding predictor values for observation n, and r_1_1 - r_1_4 are the participant-level effects, indexed by that observation's participant J_1[n].


Another difference is the lprior:

transformed parameters {
  real lprior = 0;  // prior contributions to the log posterior
  lprior += student_t_lpdf(Intercept | 3, 0, 2.5);
    lprior += student_t_lpdf(sd_1 | 3, 0, 2.5)
      - 4 * student_t_lccdf(0 | 3, 0, 2.5);
  lprior += lkj_corr_cholesky_lpdf(L_1 | 1);
}
model {
  if (!prior_only) {

    ...
    // model fitting using ordered_logistic()
   
  }
  target += lprior;
  target += std_normal_lpdf(to_vector(z_1));
}

So I am just wondering: what is the function of lprior, and could the two implementation differences above explain the difference in parameter estimates?

Welcome to the Stan forum.

In brms, a term like (condition*ar | participant) means that all of the participant-level coefficients share a correlation matrix (the one whose Cholesky factor gets the lkj_corr_cholesky_lpdf prior). As far as I can tell, your Stan code doesn't encode any correlation structure between them. That isn't necessarily wrong; it's just a different model. If I had to guess, this is the main source of the differences you're seeing.
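To mirror what brms does, the four independent hierarchical priors could be replaced with a correlated, non-centered parameterization. A sketch, assuming the same four effects as your model (names like mu_group, sigma_group, and z are my own; hyperpriors are omitted):

```stan
parameters {
  vector[4] mu_group;              // group-level means (lex, noise, inter, cept)
  vector<lower=0>[4] sigma_group;  // group-level SDs
  cholesky_factor_corr[4] L;       // Cholesky factor of the 4x4 correlation matrix
  matrix[4, N] z;                  // standardized per-subject deviations
}
transformed parameters {
  // Column i holds subject i's four correlated coefficients
  matrix[4, N] mu_i = rep_matrix(mu_group, N)
                      + diag_pre_multiply(sigma_group, L) * z;
}
model {
  L ~ lkj_corr_cholesky(1);     // the same LKJ(1) prior brms uses by default
  to_vector(z) ~ std_normal();
  // ... hyperpriors on mu_group and sigma_group, and the likelihood, as before
}
```

In your likelihood loop, vector[param_num] params for subject i would then simply be mu_i[, i]. With this structure, a subject's interaction coefficient can borrow strength from their other coefficients through the estimated correlations.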

It's just a decision about how to organize the computation, not about the modeling. brms collects the prior log-density contributions into a single variable, lprior, so that they can be accessed and saved for various purposes. All of those += prior statements in transformed parameters could instead appear individually in the model block, without being combined into lprior, and the statistical model would be unchanged.
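In other words, the two versions below (using one prior from your generated code as an example) define the same posterior; only where the prior term is accumulated differs:

```stan
// Version A: as brms writes it
transformed parameters {
  real lprior = 0;
  lprior += student_t_lpdf(Intercept | 3, 0, 2.5);
}
model {
  target += lprior;
}

// Version B: equivalent model without lprior
model {
  target += student_t_lpdf(Intercept | 3, 0, 2.5);
}
```

The only practical difference is that Version A exposes lprior as a transformed parameter, so its value is stored alongside the posterior draws.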

Thank you very much for answering my question!