# How to speed up GLMM fitting when drawing predictions for a hold-out data set?

Below is a GLMM with intercepts and beta parameters that differ by group (no pooling), and a sigma parameter in the call to normal() that also differs by group. Fighting with the syntax for the X * B multiplication when the betas differ by group led me to just put everything in loops, but the model is hopelessly slow: it takes around an hour to train on only 5000 rows of data. I compound this by also making predictions for a test set in the generated quantities block, again with a loop. I would really like to vectorize that section, but I'm unsure how to achieve this, given that the predictions come from normal_rng() and that both mu_test and sigma are built from terms that differ by group.

Any help folks can offer would be greatly appreciated. I'm familiar with vectorizing simple models where intercepts and beta parameters are fixed, but not with mixed models.

```stan
data {
  int<lower=0> N; // number of samples
  int<lower=0> K; // number of predictors
  int<lower=1> J; // number of groups

  vector[N] y;     // response variable
  matrix[N, K] X;  // predictor matrix
  int x_group[N];  // group id for each sample

  vector[N] y_test;     // response variable, test set
  matrix[N, K] X_test;  // predictor matrix, test set
  int x_group_test[N];  // group id for each sample, test set
}
parameters {
  vector[J] alpha;           // group-level intercepts
  vector[K] beta[J];         // predictor coefficients, one vector per group
  vector<lower=0>[J] sigma;  // group-level residual scales
}
transformed parameters {
  vector[N] mu;

  for (i in 1:N) {
    mu[i] = alpha[x_group[i]] + X[i] * beta[x_group[i]];
  }
}
model {
  // priors (vectorized; no need to loop over groups)
  sigma ~ exponential(1);

  for (i in 1:N) {
    y[i] ~ normal(mu[i], sigma[x_group[i]]);
  }
}
generated quantities {
  vector[N] log_lik;
  vector[N] y_hat;
  vector[N] resid;

  vector[N] mu_test;
  vector[N] y_pred;
  vector[N] squared_error;
  real MSE;

  vector[N] abs_per_error;
  real MAPE_denom; // scalar, recomputed for each observation
  real MAPE;

  for (i in 1:N) {
    log_lik[i] = normal_lpdf(y[i] | mu[i], sigma[x_group[i]]);
    y_hat[i] = normal_rng(mu[i], sigma[x_group[i]]);
    resid[i] = y[i] - y_hat[i];

    // make prediction
    mu_test[i] = alpha[x_group_test[i]] + X_test[i] * beta[x_group_test[i]];
    y_pred[i] = normal_rng(mu_test[i], sigma[x_group_test[i]]);

    // calc squared error
    squared_error[i] = square(y_test[i] - y_pred[i]);

    // calc absolute percentage error; floor the denominator to account for 0s
    MAPE_denom = fmax(abs(y_test[i]), 0.00001);
    abs_per_error[i] = abs(y_test[i] - y_pred[i]) / MAPE_denom;
  }
  MSE = mean(squared_error);
  MAPE = mean(abs_per_error);
}
```

The likelihood can simplify to just:

```stan
y ~ normal(mu, sigma[x_group]);
```
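In case the multi-indexing is unfamiliar, `sigma[x_group]` expands the J group-level scales into one scale per observation, much like NumPy fancy indexing. A toy illustration (not part of the model; note Stan indexes from 1, while Python is 0-based):

```python
import numpy as np

# toy setup: 3 groups, 6 observations
sigma = np.array([0.5, 1.0, 2.0])       # one scale per group
x_group = np.array([0, 0, 1, 2, 2, 1])  # group id per observation (0-based here)

# fancy indexing expands the group-level scales to observation level,
# which is what sigma[x_group] does inside the vectorized normal()
sigma_obs = sigma[x_group]
print(sigma_obs)  # per-observation scales: 0.5, 0.5, 1.0, 2.0, 2.0, 1.0
```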

It's not immediately obvious from this what is slow, though. It might be the improper priors on `alpha` and `beta`, or correlations in `X`, that are causing problems.

As this model is written, the groups aren't influencing each other (presumably you'll want partial pooling eventually, but as written now I think they're independent), so you could start investigating with just the data from one group. Add priors and see if that helps, and check out section 1.2, "The QR reparameterization", in the Stan User's Guide if your covariates are much correlated.
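To see why the QR trick can help, here is a small numpy sketch (illustrative only; the `Q_ast`/`R_ast` names and scaling follow the Stan User's Guide convention). Correlated columns of `X` make the posterior geometry hard to sample, whereas the columns of `Q_ast` are orthogonal:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 1000, 3

# build a design matrix with strongly correlated columns
z = rng.normal(size=N)
X = np.column_stack([z + 0.05 * rng.normal(size=N) for _ in range(K)])

# thin QR decomposition: X = Q @ R, with orthogonal columns in Q.
# The Stan User's Guide rescales so Q_ast has roughly unit-scale entries:
Q, R = np.linalg.qr(X)
Q_ast = Q * np.sqrt(N - 1.0)
R_ast = R / np.sqrt(N - 1.0)

# raw columns are nearly collinear...
print(np.corrcoef(X, rowvar=False)[0, 1])  # close to 1
# ...while Q_ast's columns are orthogonal, so the sampler explores
# theta = R_ast @ beta over a much better-conditioned geometry
print(np.allclose(Q_ast.T @ Q_ast / (N - 1.0), np.eye(K)))  # True
print(np.allclose(Q_ast @ R_ast, X))                        # True
```

In the Stan program you then use `Q_ast * theta` in place of `X * beta` and recover `beta` from `theta` with the inverse of `R_ast` in generated quantities.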

There are also some new profiling tools for seeing which calculations are taking a long time (see "Profiling Stan programs with CmdStanR" in the cmdstanr documentation), so you can check how much time is spent building `mu` versus evaluating the likelihood.


Thanks for taking the time to write a response, Ben. I think you also helped me on my last post; the hard work you do on this forum is invaluable.

I had a crack at implementing the QR reparameterization and I think it helped quite a bit (although I confess I will need to read up a little on WHY it helped!). I was able to vectorize the likelihood as you suggested too, and, following the QR code in the Stan User's Guide, also avoided the loop for constructing mu.

The last piece of the puzzle for me is the loop in the generated quantities block that produces log_lik and the other quantities. I have taken the MSE and MAPE quantities out: I now only sample the posterior of y_pred in Stan and compute the other quantities (squared error, absolute percentage error, MSE, and MAPE) in a vectorized manner with numpy. I do still need the loop for the calls to normal_lpdf() and normal_rng(), though. Is there a way to vectorize these?
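For reference, the vectorized numpy post-processing described above might look something like this (a sketch with made-up draws; `y_pred_draws` stands in for the posterior samples extracted from the fit):

```python
import numpy as np

rng = np.random.default_rng(1)

# stand-ins for the real quantities: S posterior draws of y_pred, plus y_test
S, N = 4000, 100
y_test = rng.normal(size=N)
y_pred_draws = y_test + rng.normal(scale=0.5, size=(S, N))  # fake draws

# squared error and MSE, one MSE value per posterior draw, all vectorized
squared_error = (y_test - y_pred_draws) ** 2  # shape (S, N) via broadcasting
mse = squared_error.mean(axis=1)              # shape (S,)

# absolute percentage error, flooring the denominator to guard against zeros
denom = np.maximum(np.abs(y_test), 1e-5)
abs_per_error = np.abs(y_test - y_pred_draws) / denom  # shape (S, N)
mape = abs_per_error.mean(axis=1)                      # shape (S,)

print(mse.shape, mape.shape)  # (4000,) (4000,)
```

This yields a full posterior over MSE and MAPE rather than point estimates, one value per draw.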

Thanks again!
