Non-centered categorical logit with random slopes and intercept

Hi,

I am trying to fit a categorical logit model with random slopes and intercepts. The model will be applied to a few different data sets with similar predictors X, number of observations N, and number of outcome categories K\in \{4, 7, 8\}. The data is nested with n observations split across individual donors i located within different industries j. The number of observations per individuals varies. Ideally, I would account for variation across both grouping variables, but am starting simple by only focusing on industries as my quantity of interest is the difference in effect size of a single factor variable across industries.

So far, I’ve got a baseline model with just random intercepts based off of @rtrangucci’s model here. The model works fine on simulated data and returns similar results as a brms, but is far more efficient and doesn’t degrade as I scale up the complexity. I want to extend it so that a subset s of the individual-level \beta's also have group-level random slopes \beta_J^s.

I am guessing I should utilize a multivariate parameterization and Cholesky factorization. Implementing this for a bernoulli_logit is clear enough from the user’s manual, but I am not sure how to scale it up to K outcomes. In particular, object choice/structure is stumping me when it comes to the (transformed) parameters.

Here’s what I have got so far:

data {
  int<lower=2> K; // number of DV categories
  int<lower=0> N; // number of observations
  int<lower=1> D; // number of individual predictors
  matrix[N, D] x; // individual predictors
  int<lower=1> J_ind; // number of industries
  int<lower=1> P; //number of group predictors
  int<lower=1, upper=J_ind> idx_ind[N]; // industry for individual
  matrix[J_ind, P] u; // group-level predictors
  int<lower=1> G; // number of grouping variables
  int<lower=1, upper=K> y[N]; // outcome
}
transformed data {
  row_vector[D] zeros_pop = rep_row_vector(0, D); // zeros to append to pop-level coefs
  row_vector[P] zeros_grp = rep_row_vector(0, P); // zeros to append to grp-level coefs
  row_vector[J_ind] zeros_ind = rep_row_vector(0, J_ind); // zeros for grp-level intercepts
  matrix[D, N] t_X = x'; // x transpose
  matrix[P, J_ind] = u'; // u transpose
}
parameters {
  matrix[K - 1, D] beta_raw; // betas for K - 1 outcomes
  vector[K - 1] alpha_raw; // intercept for K - 1 outcomes
  matrix[K - 1, J_ind] eta_ind; // matrix for (K-1) * J_ind intercept coefficients
  vector<lower=0>[K - 1] sigma_ind; // ind variances
  vector<lower=0>[G] sigma_inter_eqn; // inter-group (ind) variance
}
transformed parameters {
  matrix[K, D] beta;
  matrix[K, P] theta;
  matrix[K, J_ind] alpha_ind;
  vector[K] alpha;
  
  // fill in constrained parameters with one index/row set to zero
  for (k in 1:(K - 1)) {
    alpha[k + 1] = alpha_raw[k];
    alpha_ind[k + 1] = sigma_inter_eqn[1] * sigma_ind[k] * eta_ind[k];
  }
  beta = append_row(zeros_pop, beta_raw);
  alpha[1] = 0;
  alpha_ind[1] = zeros_ind;
}
model {
  // prior on locations
  to_vector(beta_raw) ~ normal(0, 1);
  to_vector(eta_ind) ~ normal(0, 5);
  alpha_raw ~ normal(0, 5);
  
  // prior on variance
  sigma_ind ~ normal(0, 5);
  sigma_inter_eqn ~ normal(0, 1);
  
  // likelihood
  {
    matrix[K, N] mu_logit;
    mu_logit = beta * t_X;
    for (n in 1:N)
      y[n] ~ categorical_logit(
        alpha + col(mu_logit, n) + col(alpha_ind, idx_ind[n])
        );
  }
}

I am probably over-complicating things. Any help that will set me in the right direction is greatly appreciated.

Sorry, don’t have time to delve into this now, but hopefully @mike-lawrence can spare a moment?

Sorry for taking long to respond. I am trying to understand what you are after, so just to be clear, what you want is to have a bunch of coefficients, let’s call them \gamma_{k,j} such that for each industry j, the coefficients are correlated across categories, i.e.:

\gamma_{k} \sim MVN(0, \mathbb{\Sigma_\gamma})

or do you want the coefficients for each category to be correlated across industries? Or something else?

I am not sure there is a good a priori reason to prefer one or the other, it really depends on what you are after.

Also is your problem only with coding the appropriate MVN or you have problem to apply it only to the subset s? (I don’t really understand what this subset represent, though). In other words, would you know how to write the model using multi_normal (without Cholesky)?

Martin,

My apologies for being vague.

I expect the intercept and slope coefficients to vary by industry j such that \gamma_j\sim MVN(\mu,\Sigma) .

I hadn’t thought about the coefficients for each industry being correlated across categories k. This is theoretically plausible given that the outcomes are different political parties some of which are part of the same coalition, so donations to them could be treated as complementary goods or substitutes. I’d be amenable to modeling this, but it’s not my main priority.

The main problem is coding the appropriate MVN or Cholesky Factorized version—I’d be happy with either, though I expect the latter to be more efficient—in a categorical setting.

Regarding s, what I mean is that among the common coefficients \gamma, I only want to allow a subset of those to vary by industry, the intercept and a single ‘treatment’ variable. Theoretically, I am only interested in how the treatment effect varies across industries. Plus, the model becomes increasingly intractable if I increase the dimensionality much more.

Sorry, I got slightly burned out on this and can’t get myself to compose a good reply (no good reason, just my unreliable motivation), I’ll ask @imadmali or @Guido_Biele if they have time to help you here. Otherwise, I’ll get to this next week.

No worries. I did not expect a fast answer (or one at all) and have plenty else to work on in the mean time. In a sense, it’s gratifying to know that the model is not as trivial to write as I expected.