First, I’d think about rearranging x and beta so you don’t have to transpose in the density. It’d be even better to collect the x_beta[n] into an array alongside dowell and then vectorize the whole thing. But that’s just efficiency, not interpretability.
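As a sketch of that vectorization, assuming your predictors are collected into an N x K matrix x and your coefficients into a K x 4 matrix beta (I'm guessing at shapes from your post), recent versions of Stan let you write the whole likelihood in one line with categorical_logit_glm:

```stan
model {
  // vectorized likelihood: x is an N x K data matrix,
  // beta a K x 4 coefficient matrix, alpha a 4-vector of intercepts
  dowell ~ categorical_logit_glm(x, alpha, beta);
}
```

This avoids both the loop and the per-observation transpose.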
It sounds like you have discrete predictors, but you seem to be using something like a continuous encoding of them in x. The usual way to code your predictors for regression would be this way:
data {
  int<lower=0> N;  // observations
  // predictors
  array[N] int<lower=0, upper=1> male;  // assuming 1 if male, 0 if female
  array[N] int<lower=1, upper=5> books;
  vector[N] motivation;
  // outcome (this belongs in data, not parameters; Stan doesn't allow integer parameters)
  array[N] int<lower=1, upper=4> dowell;
}
and then the parameters look like this:
parameters {
  vector[4] alpha;
  vector[4] beta_sex;
  array[5] vector[4] beta_books;
  vector[4] beta_motivation;
}
and then the likelihood looks like this:
for (n in 1:N)
  dowell[n] ~ categorical_logit(alpha + beta_sex * male[n] + beta_books[books[n]] + beta_motivation * motivation[n]);
There are lots of choices here for parameterizations when you get into multiple outcomes. For more efficiency, you can pack the first three terms into a matrix indexed by sex and books.
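A sketch of that packing might look like this (eta is an illustrative name, not something from your model; sex is indexed as male[n] + 1 so it lands in 1:2):

```stan
transformed parameters {
  // eta[s, b] packs the intercept, sex effect, and books effect per cell
  array[2, 5] vector[4] eta;
  for (s in 1:2)
    for (b in 1:5)
      eta[s, b] = alpha + beta_sex * (s - 1) + beta_books[b];
}
model {
  for (n in 1:N)
    dowell[n] ~ categorical_logit(eta[male[n] + 1, books[n]]
                                  + beta_motivation * motivation[n]);
}
```

The per-cell sums get computed once rather than once per observation.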
Now alpha is an intercept, beta_sex the male effect (female is rolled into the intercept), beta_books[j] the effect of being in book category j, and so on.
The problem is that categorical_logit is only identified up to an additive constant. You can handle this by pinning one of the categories to 0 and having everything else be relative, or you can let the priors sort it out. Or you can set the coefficients for the random effects so that they sum to 0 and everything else gets pushed into the intercept.
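For example, pinning the first category to 0 looks like this (a sketch; alpha_free is an illustrative name):

```stan
parameters {
  vector[3] alpha_free;  // free effects for categories 2 through 4
}
transformed parameters {
  vector[4] alpha = append_row(0, alpha_free);  // category 1 pinned to 0
}
```

The same trick applies to each coefficient vector; the pinned category becomes the reference that everything else is interpreted against.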