Converting Stan's Manual Multivariate Hierarchical Model

Hey guys,

I know that the Stan User's Guide Section 1.13, Multivariate Priors for Hierarchical Models, follows this syntax (brms / lme4 style): y ~ 0 + (1 + x_1 + x_2 + ... | J). So we have a varying-slope-and-intercept model without fixed (population-level) effects (slopes/intercepts). According to the manual this translates to the following Stan model:

data {
  int<lower=0> N;              // num individuals
  int<lower=1> K;              // num ind predictors
  int<lower=1> J;              // num groups
  int<lower=1> L;              // num group predictors
  int<lower=1,upper=J> jj[N];  // group for individual
  matrix[N, K] x;              // individual predictors
  matrix[L, J] u;              // group predictors transposed
  vector[N] y;                 // outcomes
}
parameters {
  matrix[K, J] z;
  cholesky_factor_corr[K] L_Omega;
  vector<lower=0,upper=pi()/2>[K] tau_unif;  // prior scale
  matrix[K, L] gamma;                        // group coeffs
  real<lower=0> sigma;                       // prediction error scale
}
transformed parameters {
  vector<lower=0>[K] tau = 2.5 * tan(tau_unif);
  matrix[K, J] beta = gamma * u + diag_pre_multiply(tau, L_Omega) * z;
}
model {
  vector[N] mu;
  for(n in 1:N) {
    mu[n] = x[n, ] * beta[, jj[n]];
  }
  to_vector(z) ~ std_normal();
  L_Omega ~ lkj_corr_cholesky(2);
  to_vector(gamma) ~ normal(0, 5);
  y ~ normal(mu, sigma);
}

How would I code the same model in Stan, but including a population-level intercept and slopes (fixed effects) as well? That would be the following syntax (brms / lme4 style): y ~ 1 + x_1 + x_2 + x_3 + ... + (1 + x_1 + x_2 + ... | J).

I’ve been bashing my head against the wall trying to do this, so any help would be most appreciated. (The brms-generated code didn’t help; it makes a Z_1_1, Z_1_2, etc.)

The code in the manual already has the fixed effects: they are the means of the coefficients. To remove the fixed effects and keep only random effects, those means would have to be constrained to zero. In the manual’s model, the gamma parameter encodes the means (via gamma * u).
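A minimal sketch of what that means in practice (my rewrite, not code from the User's Guide; it assumes the same data block as above and no group-level predictors): drop gamma * u entirely and carry the fixed effects in an explicit coefficient vector b, leaving beta to hold only the group-level deviations.

```stan
parameters {
  vector[K] b;                               // population-level (fixed) effects
  matrix[K, J] z;
  cholesky_factor_corr[K] L_Omega;
  vector<lower=0, upper=pi()/2>[K] tau_unif; // prior scale
  real<lower=0> sigma;                       // prediction error scale
}
transformed parameters {
  vector<lower=0>[K] tau = 2.5 * tan(tau_unif);
  matrix[K, J] beta = diag_pre_multiply(tau, L_Omega) * z;  // deviations only
}
model {
  vector[N] mu;
  for (n in 1:N) {
    mu[n] = x[n, ] * (b + beta[, jj[n]]);    // fixed effects + group deviation
  }
  b ~ normal(0, 5);
  to_vector(z) ~ std_normal();
  L_Omega ~ lkj_corr_cholesky(2);
  y ~ normal(mu, sigma);
}
```

This is algebraically the same as the manual's version with an intercept-only u, since then gamma * u adds the same mean vector to every column of beta.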

So u would be supplied as a matrix of 1s? And the diagonal of beta would be the fixed effects?

Not quite. The SUG example is unfortunately a little over-complex for pedagogical purposes, so maybe check out the version here, which reduces the complexity a bit (it omits predictors that vary among the “groups”, in the SUG example’s nomenclature), has better-named variables and better commenting, plus a speed-up trick as well. There are also lectures I did a while back covering these models, starting here.

Edit: oh, I lie, the case-study-preview models do include the “group-level” predictors (“between” predictors in its nomenclature). But the lectures cover the within-only case.

Thanks I will take a look! 👍🏻

Unfortunately, your example is more convoluted than the manual’s (the get_constrast_matrix function did not help). I mostly understand the SUG example; I just want to know how to supply u.

From the SUG, u is a transposed matrix[num_group_predictors, num_groups] of group-level predictors. But from my point of view that would NOT be supplied by me and would instead be estimated by Stan, so I don’t know what it is doing in the data block.

Shouldn’t it be in the parameters block?

Also, if I have to supply Stan with this matrix: since my first guess was a matrix full of 1s (which you said is not quite right), my second guess is a matrix with the first row as 1s (intercepts) and the other rows as the reciprocal means 1/mean(x) of the predictor variables by group (since u represents the means of the group-level predictors, so that it is added to the product of the tau SDs and the std_normal zs).

This is what I am trying to supply to Stan as a data list:

X <- unname(model.matrix(~ 1 + displ + year, mpg))
u <- mpg %>%
    group_by(class) %>%
    select(displ, year) %>%
    summarise_all(mean) %>%
    select(-class) %>%
    bind_cols(1, .) %>%
    mutate_all(function(x) 1 / x) %>%
    t() %>%
    as.matrix() %>%
    unname()

dat <- list(
    N = nrow(X),
    J = length(unique(mpg$class)),
    K = ncol(X),
    jj = as.numeric(as.factor(mpg$class)),
    x = X,  # named x to match the Stan program's data block
    L = ncol(X),
    u = u,
    y = mpg$hwy
)

It totally disagrees with brms:

brms_fit <- brm(
    bf(hwy ~ 1 + displ + year + (1 + displ + year | class)),
    data = mpg,
    prior = c(
        set_prior("lkj(2)", class = "cor"),
        set_prior("cauchy(0,2)", class = "sd")
    )
)

No, u is the group-level contrast matrix and is definitely data. It is how the values of any predictors that vary between the groups are passed to the model. The parameter matrix gamma contains the coefficients that quantify the influence of those predictors. If the individual-level contrast matrix x is structured so that its first column is all 1s (i.e. an intercept contrast), then the first row of gamma encodes the main effect of each group-level predictor, and subsequent rows of gamma encode interactions between the group-level and individual-level predictors.
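To make the gamma layout concrete, here is how one column of beta expands under those assumptions (my illustration, not from the Guide; it supposes K = 3 individual-level contrasts and L = 2 group-level contrasts):

```stan
// For group j, with the first column of x all 1s and the first row of u all 1s:
//   beta[1, j] = gamma[1, 1] + gamma[1, 2] * u[2, j] + noise  // intercept: main effect of the group predictor
//   beta[2, j] = gamma[2, 1] + gamma[2, 2] * u[2, j] + noise  // slope of x_1: group-by-individual interaction
//   beta[3, j] = gamma[3, 1] + gamma[3, 2] * u[2, j] + noise  // slope of x_2: group-by-individual interaction
// where "noise" is the corresponding entry of diag_pre_multiply(tau, L_Omega) * z.
```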

I now see from your brms code using the mpg dataset that there aren’t any group-level predictors, in which case u should be a matrix with a single row of 1s and as many columns as there are levels of class (i.e. a 1 × J matrix of ones, matching the matrix[L, J] declaration with L = 1). You could also re-write the model to be more efficient for this case, but maybe let me know whether the above helps things click before I delve into how to do that part.
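In other words (my sketch, assuming L = 1 and the same declarations as the SUG program): with no group-level predictors, u is just a row of ones, and you could even build it inside the program rather than passing it in (in that case, drop the u declaration from the data block):

```stan
transformed data {
  // Intercept-only group-level "design": one row of 1s, one column per group.
  matrix[1, J] u = rep_matrix(1, 1, J);
}
```

Equivalently, keep u in the data block and pass `u = matrix(1, nrow = 1, ncol = J)` with `L = 1` from R.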

@mike-lawrence thank you so much; my confusion was about what “group-level predictors” meant and how to represent them with the matrix u. 👍🏻


Great! Now, for slightly more efficient code/sampling, check out the code here (specifically, use the “v3” variant, which seems most performant and should become even more so with future improvements to rows_dot_product).
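For readers who don't follow the link, the speed-up likely looks something like this (my guess at the trick, not code copied from the linked "v3" variant): gather each observation's coefficient row, then replace the for-loop over N with one vectorized rows_dot_product call.

```stan
model {
  matrix[J, K] beta_t = beta';                  // one row of coefficients per group
  matrix[N, K] beta_obs = beta_t[jj];           // row n holds the coefficients for group jj[n]
  vector[N] mu = rows_dot_product(x, beta_obs); // mu[n] = x[n, ] * beta[, jj[n]]
  to_vector(z) ~ std_normal();
  L_Omega ~ lkj_corr_cholesky(2);
  to_vector(gamma) ~ normal(0, 5);
  y ~ normal(mu, sigma);
}
```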


We really need to update the hierarchical section to first show the no-group-level-predictors case, then the no-individual-level-predictors case, and finally the predictors-at-both-levels case; I think that would help folks grok everything better.

Yes, that would indeed help. The code was clear once I had read all the math; the only thing that dropped in as if parachuted from nowhere was the u group-predictor matrix in the data block.