Speeding up Besag + QR decomposition model

Dear community,
I’m still learning Stan and I have to implement a modified version of the Besag model with a QR decomposition as well. I did my best to write this code, but it is painfully slow (~60 minutes for 20K iterations). Are there any performance improvements I can make to speed it up?

N: 2177
p: 16
Q: (2177, 161)

data {
  int<lower = 1> N;
  int<lower = 1> K;
  int<lower = 1> p;
  matrix[N, K] x;         // covariates
  int<lower = 0> y[N];    // observed counts
  matrix[N, p] Q;         // spatial basis (used as Q * phi below)
  vector[p] lambdas;      // scale factors, raised to the power alpha below
}
parameters {
  real beta0;            // intercept
  vector[K] betas;      // coefficients
  vector[p] phi;         // spatial effects
  real<lower = 0, upper=100> phi2_reciprocal;  // reciprocal overdispersion
  real<lower = 0, upper=50> alpha;             // exponent applied to lambdas
  real<lower=0> sigma;                         // scale of the spatial effects
}

transformed parameters {
  // Var(X) = mu + mu^2 / phi
  // phi -> infty no dispersion, phi -> 0 is full dispersion
  // this is 1 / phi, so that phi_reciprocal = 0 is no dispersion
  real phi2;
  vector[p] lambdas2;
  real norm_factor = 0.;

  // pow is not a vector operation
  for (i in 1:p) {
    lambdas2[i] = pow(lambdas[i], alpha);
    // check overflow
    if (lambdas2[i] > 10000000)
        lambdas2[i] = 10000000.;
  }

  norm_factor = sum(lambdas)/sum(lambdas2);
  lambdas2 = norm_factor * lambdas2;

  phi2 = 1. / (phi2_reciprocal + 0.001);
}


model {
  beta0 ~ normal(0.0, 1.0);
  betas ~ normal(0.0, 1.0);
  phi2_reciprocal ~ cauchy(0., 5.);
  alpha ~ cauchy(0., 1.);
  sigma ~ normal(0.0, 1.0);
  phi ~ normal(0, sigma * lambdas2);
  y ~ neg_binomial_2_log(beta0 + x * betas + Q*phi, phi2);
  // soft sum-to-zero constraint on phi
  sum(phi) ~ normal(0, 0.01 * N);  // equivalent to mean(phi) ~ normal(0, 0.01)
}

generated quantities {
    vector[N] log_lik;
    int<lower=0> y_pred[N];
    vector[N] varF = x * betas;
    vector[N] varR = Q*phi;
    real varI = beta0;
    vector[N] eta = varI + varF + varR;
    vector[N] mu = exp(eta) + 0.00001;
    vector[N] mu2 = exp(varI + varF) + 0.00001;

    for (i in 1:N) {
        // in the early iterations mu can be extreme, so guard against it
        if(mu[i] > 10000 || mu[i] < 0) {
            y_pred[i] = 99999;
            log_lik[i] = neg_binomial_2_lpmf(y[i] | 99999, phi2);
        } else {
            y_pred[i] =  neg_binomial_2_rng(mu2[i], phi2);
            log_lik[i] = neg_binomial_2_lpmf(y[i] | mu[i], phi2);
        }

    }
}

thxx


It would probably help to have a more informative prior on alpha, e.g. alpha ~ gamma(2,2).
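In the model block that would be, for example:

alpha ~ gamma(2., 2.);   // instead of alpha ~ cauchy(0., 1.);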

You could use the softmax function to vectorize that pow. That also avoids the overflow check (the hard cap makes lambdas2 non-smooth in alpha, which may cause problems for the sampler).

transformed data {
  vector[p] log_lambdas = log(lambdas);
  real sum_lambdas = sum(lambdas);
}
transformed parameters {
  vector[p] lambdas2 = sum_lambdas*softmax(alpha*log_lambdas);
  ...
}
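With that change the pow loop, the overflow cap, and the explicit norm_factor all disappear; the transformed parameters block reduces to roughly:

transformed parameters {
  vector[p] lambdas2 = sum_lambdas * softmax(alpha * log_lambdas);
  real phi2 = 1. / (phi2_reciprocal + 0.001);
}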

Hope that helps.


More on the brute-force side, you could also use map_rect to parallelise your model, which can make it drastically faster, or you could enable the GPU version of the QR decomposition function if it’s not enabled by default.
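To sketch what map_rect could look like here (treat this as illustrative rather than a drop-in solution: the packing scheme and shard count are arbitrary, and it assumes N divides evenly into the shards), the negative binomial likelihood gets split into shards, with the parameters passed as one shared vector and the rows of x and Q flattened into per-shard data arrays:

functions {
  // Per-shard log likelihood. Packing convention (illustrative):
  //   shared = [beta0, betas (K), phi (p), phi2]
  //   x_i    = {K, p, y values for this shard}
  //   x_r    = rows of x and Q for this shard, flattened as {x[i,], Q[i,], ...}
  vector shard_loglik(vector shared, vector theta,
                      real[] x_r, int[] x_i) {
    int K = x_i[1];
    int p = x_i[2];
    int n = size(x_i) - 2;
    real beta0 = shared[1];
    vector[K] betas = shared[2:(K + 1)];
    vector[p] phi = shared[(K + 2):(K + p + 1)];
    real phi2 = shared[K + p + 2];
    vector[n] eta;
    for (i in 1:n) {
      int s = (i - 1) * (K + p);
      eta[i] = beta0
               + to_row_vector(x_r[(s + 1):(s + K)]) * betas
               + to_row_vector(x_r[(s + K + 1):(s + K + p)]) * phi;
    }
    return [neg_binomial_2_log_lpmf(x_i[3:(n + 2)] | eta, phi2)]';
  }
}
transformed data {
  // (this would sit alongside the log_lambdas / sum_lambdas lines above)
  int n_shards = 4;               // pick to match the available cores
  int shard_size = N / n_shards;  // assumes N is divisible by n_shards
  int y_shards[n_shards, shard_size + 2];
  real x_r_shards[n_shards, shard_size * (K + p)];
  vector[0] theta_dummy[n_shards];  // no shard-specific parameters
  for (s in 1:n_shards) {
    y_shards[s, 1] = K;
    y_shards[s, 2] = p;
    for (i in 1:shard_size) {
      int r = (s - 1) * shard_size + i;
      int b = (i - 1) * (K + p);
      y_shards[s, i + 2] = y[r];
      x_r_shards[s, (b + 1):(b + K)] = to_array_1d(x[r]);
      x_r_shards[s, (b + K + 1):(b + K + p)] = to_array_1d(Q[r]);
    }
  }
}
model {
  // priors and the soft sum-to-zero constraint stay as before;
  // the y ~ neg_binomial_2_log(...) line is replaced by:
  target += sum(map_rect(shard_loglik,
                         append_row(append_row([beta0]', betas),
                                    append_row(phi, [phi2]')),
                         theta_dummy, x_r_shards, y_shards));
}

This only pays off if the model is compiled with threading (STAN_THREADS) or MPI enabled and the number of threads/processes set accordingly.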

Wow! I did not think of that!!! thx
The runtime improved by ~30% :))

Unfortunately I’m on a machine without a GPU :( I’ll have a look at map_rect. Thanks!

It depends on the model, but it should be possible to get “embarrassingly parallel” performance that scales roughly linearly with the number of CPUs, up to a point. If you’ve only got ~30% so far, there’s likely more to be had.