How to speed up GLMM fitting when drawing predictions for a hold out data set?

Below is a GLMM, with intercepts and beta parameters that differ by group (no pooling), and a sigma parameter in the call to normal() that also differs by group. Fighting with the syntax for the X * B vector multiplication when the Bs differ by group led me to just stick everything in loops, but this model is hopelessly slow (takes around an hour to train on only 5000 rows of data). I compound this by also making predictions for a test set in the generated quantities section, which again is done with a loop. This section I would really like to try and vectorize, but am unsure how to achieve this given that I’m making predictions using normal_rng() and I have that complexity in producing mu_test from those terms that differ by group, and the sigma here differs by group.

Any help folks can offer would be greatly appreciated. I’m familiar with vectorizing simple models where intercepts and beta parameters are fixed, but not with mixed-models.

data {
    int<lower=0> N; // number of samples
    int<lower=0> K; // number of predictors
    int<lower=1> J; // number of groups

    vector[N] y; // response variable
    matrix[N, K] X; // predictor matrix
    int x_group[N]; // the group ids for each sample
    vector[N] y_test; // response variable testing set
    matrix[N, K] X_test; // predictor matrix testing set
    int x_group_test[N]; // the group ids for each sample testing set

parameters {
    vector[J] alpha; // vector of group level intercepts
    vector[K] beta[J]; // coefficients for predictors, one for each J groups
    vector<lower=0>[J] sigma;
transformed parameters {
    vector[N] mu;
    for (i in 1:N){
    mu[i] = alpha[x_group[i]] + X[i] * (beta[x_group[i]]);
model {
    // priors
    for (j in 1:J){
    sigma[j] ~ exponential(1);
    for (i in 1:N){
    y[i] ~ normal(mu[i], sigma[x_group[i]]);
generated quantities {
    vector[N] log_lik;
    vector[N] y_hat;
    vector[N] resid;
    vector[N] mu_test;
    vector[N] y_pred;
    vector[N] squared_error;
    real MSE;
    vector[N] abs_per_error;
    vector[2] MAPE_denom;
    real MAPE;
    for (i in 1:N) {
        log_lik[i] = normal_lpdf(y[i] | mu[i], sigma[x_group[i]]);
        y_hat[i] = normal_rng(mu[i], sigma[x_group[i]]);
        resid[i] = y[i] - y_hat[i];
        // make prediction
        mu_test[i] = alpha[x_group_test[i]] + X_test[i] * beta[x_group_test[i]];
        y_pred[i] = normal_rng(mu_test[i], sigma[x_group_test[i]]);
        // calc squared error
        squared_error[i] = square(y_test[i] - y_pred[i]);
        // calc Absolute Percentage Error
        MAPE_denom[1] = 0.00001;  // need to set to account for 0s
        MAPE_denom[2] = abs(y_test[i]);
        abs_per_error[i] = abs(y_test[i] - y_pred[i]) / max(MAPE_denom);
    MSE = mean(squared_error);
    MAPE = mean(abs_per_error);


The likelihood can simplify to just:

y ~ normal(mu, sigma[x_group]);

It’s not immediately obvious looking at this what is slow though. It might be the improper priors on alpha and beta or correlations in X that are causing problems.

As this model is written, the groups are influencing each other anyway (presumably you’ll want this eventually, but as it is written now I think they’re independent). You could start investigating this with just the data from one group. So add priors and see if that helps, and check out 1.2 The QR reparameterization | Stan User’s Guide if your covariates are correlated much.

There are some new profiling tools to see what calculations are taking a long time too Profiling Stan programs with CmdStanR • cmdstanr (so you can check how much time is spent building mu vs evaluating the likelihood).

1 Like

Thanks for taking the time to write a response, Ben. I think you also helped me on my last post, and I think the hard work you do on this forum is invaluable.

I had a crack at implementing the QR reparameterization and I think it helped quite a bit (although I confess I will need to read up a little about WHY it helped!). I was able to vectorize the likelihood as you suggested too, and with the QR code on the Stan Guide also avoid the loop for constructing mu.

The last piece in the puzzle for me is the loop in the generated quantities, to produce log_lik and those others. I have taken the MSE and MAPE quantities out. I now only sample the posterior of y_pred in stan and calculate the other quantities (squared error, absolute percentage error, MSE and MAPE) in a vectorized manner with numpy. I do still need the loop for the calls to normal_lpdf() and normal_rng() though. Is there a way to vectorize these?

Thanks again!

1 Like