Hierarchical Categorical Logit with large predictor matrix fails to initialise

This is the way I’d do random effects in Stan.

vector[N] mu = X * B; // fixed effects
for(I in 1:N){
  mu[l] = mu[l] + theta[rel_id[l]]; // add on random effects
  y[l] ~ N(mu[I],sigma);
}

And non-center the thetas:

for(j in 1:Nj){
  theta[j] = theta_z[j] * sigma_theta;
}

Where theta is defined at the top of the model block and theta_z is a parameter. This mostly works well.

I think you can definitely code the random effects with a matrix like that, but then you’d need to be careful to use sparse matrix vector products to make it efficient. I’m not familiar with standardizing the matrix like that.

I asked @jonah/@Bob_Carpenter about the multinomial vs. Poisson thing. They said @bgoodri at some point thought the Poisson trick would be good, but I don’t know if anyone has actually tried.

So no solid advice there. Best we can say is try both things and see what happens. If you get a chance to compare them both, we’d be curious to hear how it works out.