Rewiring a multivariate bernoulli regression to include residual correlations

I would like to model residual correlations in a multivariate logistic regression (discussed here). brms offers the option to do this for gaussian models only, which leaves me with no option other than to crack open the black box and try to fiddle around with the wires.

It has been a long time since I coded manually in stan and my skills have atrophied. Also even when I was at my best the ninja stuff going on under the hood in brms is mostly above my head. So my approach has been to (1) use the brms stancode() and make_standata() functions to extract the stan code and data list from a gaussian brm model with setrescor(TRUE) and compare it to a gaussian model with setrescor(FALSE), try to see what the differences are, and attempt to transpose that to my logistic model. Far from ideal I know but needs must.

And of course it didn’t work. I know going in that there are things that simply do not translate from gaussian models to logit models, for example the sigma() parameter. But frankly I need the community’s help with this one.

I have real data for a very simple model, estimating the association between two binary outcomes, binary1 and binary2 and a single binary predictor `relstatus’. Here are the data

exDF.RData (9.6 KB)

And here is the model in brms

# create multivariate formula, regressing the two outcomes on the single predictor
bform_mvr <- bf(mvbind(binary1, binary2) ~ pred)

# run model in brms
mvrMod_bin <- brm(formula = bform_mvr,
              family = bernoulli,
              data = exDF,
              seed = 1234)

Now what I have done is extracted the stan code and data list

mvrDat_bin <- make_standata(mvrMod_bin)
mvrCode_bin <- stancode(mvrMod_bin)

Adding two data that are necessary to model residual correlations to the datalist

mvrDat_bin$nrescor <- 1
mvrDat_bin$nresp <- 2

And then made alterations to the stan code in mvrCode_bin based on what I observed was added to the code in the gaussian version when setrescor() was changed from FALSE to TRUE. Here is the code. Now I should point out again that I know this code is wrong - it is for a numeric model but don’t know how to tweak it for the purposes of a logit model. I’ve spread it out a bit for readability and where I think the extra code should go I have added notes saying as much (mostly, the models section is almost all new so there was little point.

mvrCode_corr <- "
data {
  // outcome 1
  int<lower=1> N;  // total number of observations
  int<lower=1> N_binary1;  // number of observations
  array[N_binary1] int Y_binary1;  // response variable
  int<lower=1> K_binary1;  // number of population-level effects
  matrix[N_binary1, K_binary1] X_binary1;  // population-level design matrix
  int<lower=1> Kc_binary1;  // number of population-level effects after centering
  
  // outcome 2
  int<lower=1> N_binary2;  // number of observations
  array[N_binary2] int Y_binary2;  // response variable
  int<lower=1> K_binary2;  // number of population-level effects
  matrix[N_binary2, K_binary2] X_binary2;  // population-level design matrix
  int<lower=1> Kc_binary2;  // number of population-level effects after centering
  
  // potential code for modelling resdiaul correlations
  int<lower=1> nresp;  // number of responses
  int nrescor;  // number of residual correlations
  
  int prior_only;  // should the likelihood be ignored?
}

transformed data {
  // outcome 1
  matrix[N_binary1, Kc_binary1] Xc_binary1;  // centered version of X_binary1 without an intercept
  vector[Kc_binary1] means_X_binary1;  // column means of X_binary1 before centering
  // outcome 2
  matrix[N_binary2, Kc_binary2] Xc_binary2;  // centered version of X_binary2 without an intercept
  vector[Kc_binary2] means_X_binary2;  // column means of X_binary2 before centering
  
  // potential code for residual correlations
  array[N] vector[nresp] Y;  // response array
  
  // outcome 1
  for (i in 2:K_binary1) {
    means_X_binary1[i - 1] = mean(X_binary1[, i]);
    Xc_binary1[, i - 1] = X_binary1[, i] - means_X_binary1[i - 1];
  }
  
  // outcome 2 
  for (i in 2:K_binary2) {
    means_X_binary2[i - 1] = mean(X_binary2[, i]);
    Xc_binary2[, i - 1] = X_binary2[, i] - means_X_binary2[i - 1];
  }
  
  // potential code for resid correlations
    for (n in 1:N) {
    Y[n] = transpose([Y_binary1[n], Y_binary2[n]]);
  }

}

parameters {
  vector[Kc_binary1] b_binary1;  // regression coefficients
  real Intercept_binary1;  // temporary intercept for centered predictors
  vector[Kc_binary2] b_binary2;  // regression coefficients
  real Intercept_binary2;  // temporary intercept for centered predictors
  // potential code for residual correlations
  cholesky_factor_corr[nresp] Lrescor;  // parameters for multivariate linear models
}

transformed parameters {
  // prior contributions to the log posterior
  real lprior = 0;
  // outcome 1
  lprior += student_t_lpdf(Intercept_binary1 | 3, 0, 2.5);
  // outcome 2
  lprior += student_t_lpdf(Intercept_binary2 | 3, 0, 2.5);
  // potential code for resid corr
  lprior += lkj_corr_cholesky_lpdf(Lrescor | 1);
}

model {
  // likelihood including constants
  if (!prior_only) {
    // initialise linear predictor terms, outcome 1
    vector[N_binary1] mu_binary1 = rep_vector(0.0, N_binary1);
    
    // initialise linear predictor terms, outcome 2
    vector[N_binary2] mu_binary2 = rep_vector(0.0, N_binary2);
    
    // multivariate predictor array
    array[N] vector[nresp] Mu;
    vector[nresp] sigma = transpose([sigma_binary1, sigma_binary2]);
    
    // Cholesky factor of residual 
    matrix[nresp, nresp] LSigma = diag_pre_multiply(sigma, Lrescor);
    mu_binary1 += Intercept_binary1 + Xc_binary1 * b_binary1;
    mu_binary2 += Intercept_binary2 + Xc_binary2 * b_binary2;
    
    // combine univariate parameters
    for (n in 1:N) {
      Mu[n] = transpose([mu_binary1[n], mu_binary2[n]]);
    }
    target += multi_normal_cholesky_lpdf(Y | Mu, LSigma);
    
  }
  // priors including constants
  target += lprior;
}

generated quantities {
  // actual population-level intercept
  real b_binary1_Intercept = Intercept_binary1 - dot_product(means_X_binary1, b_binary1);
  
  // actual population-level intercept
  real b_binary2_Intercept = Intercept_binary2 - dot_product(means_X_binary2, b_binary2);
  
  // residual correlations
  corr_matrix[nresp] Rescor = multiply_lower_tri_self_transpose(Lrescor);
  vector<lower=-1,upper=1>[nrescor] rescor;
  
  // extract upper diagonal of correlation matrix
  for (k in 1:nresp) {
    for (j in 1:(k - 1)) {
      rescor[choose(k - 1, 2) + j] = Rescor[j, k];
    }
  }
  
}
"

And unsurprisingly, when I went to run the model

mvrMod_stan_bin_corr <- stan(model_code = mvrCode_corr,
                             data = mvrDat_bin,
                             seed = 1234)

I got an error

Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  0
Semantic error in 'string', line 67, column 35 to column 42:
   -------------------------------------------------
    65:    lprior += student_t_lpdf(Intercept_binary2 | 3, 0, 2.5);
    66:    // potential code for resid corr
    67:    lprior += lkj_corr_cholesky_lpdf(Lrescor | 1);
                                            ^
    68:  }
    69:  model {
   -------------------------------------------------

Identifier 'Lrescor' not in scope. Did you mean 'nrescor'?

Can anyone help me tweak my model code to get those residual correlations estimates to work? More than happy to get schooled in any other elements along the way. I would love to understand what’s going on in this code, but I think that would take a whole semester of fulltime study of nothing else.