# Converting Stan's Manual Multivariate Hierarchical Model

Hey guys,

I know that the Stan User's Guide section 1.13, Multivariate Priors for Hierarchical Models, uses this syntax (`brms` / `lme4` style): `y ~ 0 + (1 + x_1 + x_2 + ... | J)`. So we have a varying-slope, varying-intercept model without fixed (population-level) effects (slopes/intercepts). According to the manual, this translates to the following Stan model:

```stan
data {
  int<lower=0> N;              // num individuals
  int<lower=1> K;              // num individual predictors
  int<lower=1> J;              // num groups
  int<lower=1> L;              // num group predictors
  int<lower=1,upper=J> jj[N];  // group for individual
  matrix[N, K] x;              // individual predictors
  matrix[L, J] u;              // group predictors, transposed
  vector[N] y;                 // outcomes
}
parameters {
  matrix[K, J] z;                            // standardized group-level deviations
  cholesky_factor_corr[K] L_Omega;
  vector<lower=0,upper=pi()/2>[K] tau_unif;  // prior scale
  matrix[K, L] gamma;                        // group coeffs
  real<lower=0> sigma;                       // prediction error scale
}
transformed parameters {
  vector<lower=0>[K] tau = 2.5 * tan(tau_unif);  // implies tau ~ cauchy(0, 2.5)
  matrix[K, J] beta = gamma * u + diag_pre_multiply(tau, L_Omega) * z;
}
model {
  vector[N] mu;
  for (n in 1:N) {
    mu[n] = x[n, ] * beta[, jj[n]];
  }
  to_vector(z) ~ std_normal();
  L_Omega ~ lkj_corr_cholesky(2);
  to_vector(gamma) ~ normal(0, 5);
  y ~ normal(mu, sigma);
}
```

How would I code the same model in Stan, but including the population-level intercept and slopes (fixed effects) as well? That would be the following syntax (`brms` / `lme4` style): `y ~ 1 + x_1 + x_2 + x_3 + ... + (1 + x_1 + x_2 + ... | J)`.

I’ve been bashing my head against the wall trying to do this, so any help would be much appreciated. (The `brms`-generated code didn’t help; it creates `Z_1_1`, `Z_1_2`, etc.)

The code in the manual has the fixed effects already: they are the means of the per-group coefficients. To remove the fixed effects and keep only the random effects, those means would have to be constrained to zero. In the manual’s model, the `gamma` parameter encodes those means.
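To see why `gamma` plays the role of the fixed effects, here is a pure-Python sketch of the decomposition `beta = gamma * u + diag_pre_multiply(tau, L_Omega) * z` from the manual's model. All the numbers are made up purely for illustration, and plain lists of rows stand in for Stan matrices:

```python
# Sketch of beta = gamma * u + deviations, with K = 2 individual-level
# coefficients, L = 2 group-level predictors (intercept + one covariate),
# and J = 3 groups. Numbers are invented for illustration.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

gamma = [[10.0, 1.5],    # how the group covariate shifts each group's intercept
         [-2.0, 0.5]]    # ... and each group's slope
u = [[1.0, 1.0, 1.0],    # row 1: intercept contrast for every group
     [0.2, 0.6, 1.0]]    # row 2: the group-level covariate (transposed, L x J)
noise = [[0.3, -0.1, 0.5],   # stand-in for diag_pre_multiply(tau, L_Omega) * z
         [0.0, 0.2, -0.4]]

mean_part = matmul(gamma, u)     # K x J matrix of group-specific coefficient means
beta = [[mean_part[i][j] + noise[i][j] for j in range(3)] for i in range(2)]

# Column j of beta is group j's (intercept, slope): a mean determined entirely
# by gamma and u, plus a correlated group-level deviation.
```

The point is that `gamma` sets the mean around which every group's coefficients vary, which is exactly what "fixed effects" means here.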

So `u` would be supplied as a matrix of `1`s? And the diagonal of `beta` would be the fixed effects?

Not quite. The SUG example is unfortunately a little over-complex for pedagogical purposes, so maybe check out the version here, which reduces the complexity a bit (it omits predictors that vary among the “groups”, in the SUG example’s nomenclature), has better-named variables and better commenting, plus a speed-up trick. There are also lectures I did a while back covering these models, starting here.

Edit: oh, I lie; the case-study-preview models do include the “group-level” predictors (“between” predictors in their nomenclature). But the lectures cover the within-only case.

Thanks I will take a look! 👍🏻

Unfortunately, your example is more convoluted for me than the manual’s (the `get_contrast_matrix` function did not help). I mostly understand the SUG example; I just want to know how to supply `u`.

From the SUG, `u` is a transposed `matrix[num_predictors, num_groups]` of group-level predictors (but from my point of view that would NOT be supplied by me; it would be estimated by Stan, so I don’t know what it is doing in the `data` block).

Shouldn’t it be in the `parameters` block?

Also, if I do have to supply this matrix to Stan: since my first guess was a matrix full of `1`s (which you said was not quite right), my second guess is a matrix whose first row is all `1`s (intercepts) and whose other rows are the reciprocal means `1/mean(x)` of the predictor variables by group (since `u` represents the means of the group-level predictors, so that `gamma * u` is added to the product of the `tau` scales, `L_Omega`, and the standard-normal `z`)?

This is what I am trying to supply to Stan as a data `list`:

```r
X <- unname(model.matrix(~ 1 + displ + year, mpg))
u <- mpg %>%
  group_by(class) %>%
  select(displ, year) %>%
  summarise_all(mean) %>%
  select(-class) %>%
  bind_cols(1, .) %>%
  mutate_all(function(x) 1 / x) %>%
  t() %>%
  as.matrix() %>%
  unname()

dat <- list(
  N = nrow(X),
  J = length(unique(mpg$class)),
  K = ncol(X),
  jj = as.numeric(as.factor(mpg$class)),
  X = X,
  L = ncol(X),
  u = u,
  y = mpg$hwy
)
```

It totally disagrees with `brms`:

```r
brms_fit <- brm(
  bf(hwy ~ 1 + displ + year + (1 + displ + year | class)),
  data = mpg,
  prior = c(
    set_prior("lkj(2)", class = "cor"),
    set_prior("cauchy(0,2)", class = "sd")
  )
)
```

No, `u` is the group-level contrast matrix and is definitely data. It is the means by which the values for any predictors that vary between the groups are passed to the model. The parameter matrix `gamma` contains the coefficients that specify the influence of the predictors. If the individual-level contrast matrix `x` is structured such that the first column is all 1 (i.e. an intercept contrast), then the first row of `gamma` encodes the main effect of each group-level predictor, and subsequent rows of `gamma` encode interactions between the group-level and individual-level predictors.
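To make the main-effect/interaction reading of `gamma` concrete, here is a pure-Python check with made-up numbers. `x_n` stands for one row of the individual-level contrast matrix and `u_j` for one column of the group-level contrast matrix, both with a leading intercept entry, as described above:

```python
# Illustration (invented numbers) of how gamma encodes main effects and
# interactions when both contrast matrices carry an intercept column.
x_val, w_val = 1.7, 0.4          # one individual predictor, one group predictor
x_n = [1.0, x_val]               # row of x: intercept contrast, then predictor
u_j = [1.0, w_val]               # column of u: intercept contrast, then predictor

gamma = [[2.0, 0.5],             # row 1: intercept, main effect of w
         [-1.0, 0.25]]           # row 2: main effect of x, x-by-w interaction

# Mean of mu for this observation, ignoring the group-level deviation term:
beta_j = [sum(gamma[i][l] * u_j[l] for l in range(2)) for i in range(2)]
mu_mean = sum(x_n[i] * beta_j[i] for i in range(2))

# Expanding x_n . (gamma u_j) term by term gives the familiar regression form:
expanded = (gamma[0][0]                     # intercept
            + gamma[0][1] * w_val           # main effect of the group predictor
            + gamma[1][0] * x_val           # main effect of the individual predictor
            + gamma[1][1] * x_val * w_val)  # interaction
assert abs(mu_mean - expanded) < 1e-12
```

So the first row of `gamma` is the "intercept equation" and each later row is the "slope equation" for one individual-level predictor, exactly as stated above.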

I now see from your `brms` code using the `mpg` dataset that there aren’t any group-level predictors, in which case `u` should be a matrix with a single row of `1`s and as many columns as there are levels of `class` (so that `gamma * u` conforms). You could also re-write the model to be more efficient for this case, but maybe let me know if the above helps things click before I delve into how to do that part.
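Since the SUG declares `u` as `matrix[L, J]`, the no-group-predictor case has `L = 1` and `u` is a row of ones, so `gamma * u` simply copies the population-level coefficients into every group's column. A quick pure-Python sanity check with made-up `gamma` values:

```python
# With no group-level predictors, u is a single row of ones (L = 1), so
# gamma * u replicates the population-level coefficients across all J groups.
J = 4
gamma = [[3.0], [0.7], [-1.2]]       # K x 1: intercept and two slopes
u = [[1.0] * J]                      # 1 x J row of ones

gamma_u = [[sum(gamma[i][l] * u[l][j] for l in range(len(u)))
            for j in range(J)]
           for i in range(len(gamma))]

# Every column equals gamma's single column: the fixed effects, repeated.
for j in range(J):
    assert [row[j] for row in gamma_u] == [3.0, 0.7, -1.2]
```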

@mike-lawrence thank you so much, my confusion was about what “group-level predictors” meant and how to represent them with the matrix `u`. 👍🏻


Great! Now, for slightly more efficient code/sampling, check out the code here (specifically, use the “v3” variant, which seems most performant and should become even more so with future improvements to `rows_dot_product`).


We really need to update the hierarchical section to first show the no-group-level-predictors case, then the no-individual-level-predictors case, and finally the predictors-at-both-levels case; I think that would help folks grok everything better.

Yes, it would indeed. The code was clear once you read all the math; the only thing that seemed to parachute in from nowhere was the `u` group-predictor matrix in the `data` block.