# Speeding up besag + QR decomposition model

Dear community,
I’m still learning Stan, and I have to implement a modified version of the Besag model with a QR decomposition as well. I did my best to write this code, but it is painfully slow (~60 min for 20K iterations). Are there any performance improvements I could make to speed it up?

N: 2177
p: 16
Q: (2177, 161)

``````stan
data {
  int<lower=1> N;
  int<lower=1> K;
  int<lower=1> p;
  matrix[N, K] x;
  int<lower=0> y[N];
  matrix[N, p] Q;
  vector[p] lambdas;
}

parameters {
  real beta0;            // intercept
  vector[K] betas;       // coefficients
  vector[p] phi;         // spatial effects
  real<lower=0, upper=100> phi2_reciprocal;
  real<lower=0, upper=50> alpha;
  real<lower=0> sigma;
}

transformed parameters {
  // Var(X) = mu + mu^2 / phi
  // phi -> infty: no dispersion; phi -> 0: full dispersion
  // this is 1 / phi, so phi2_reciprocal = 0 means no dispersion
  real phi2;
  vector[p] lambdas2;
  real norm_factor = 0.;

  // pow is not a vectorized operation
  for (i in 1:p) {
    lambdas2[i] = pow(lambdas[i], alpha);
    // guard against overflow
    if (lambdas2[i] > 10000000)
      lambdas2[i] = 10000000.;
  }

  norm_factor = sum(lambdas) / sum(lambdas2);
  lambdas2 = norm_factor * lambdas2;

  phi2 = 1. / (phi2_reciprocal + 0.001);
}

model {
  beta0 ~ normal(0.0, 1.0);
  betas ~ normal(0.0, 1.0);
  phi2_reciprocal ~ cauchy(0., 5.);
  alpha ~ cauchy(0., 1.);
  sigma ~ normal(0.0, 1.0);
  phi ~ normal(0, sigma * lambdas2);
  y ~ neg_binomial_2_log(beta0 + x * betas + Q * phi, phi2);
  // soft sum-to-zero constraint on phi
  sum(phi) ~ normal(0, 0.01 * N);  // equivalent to mean(phi) ~ normal(0, 0.01)
}

generated quantities {
  vector[N] log_lik;
  int<lower=0> y_pred[N];
  vector[N] varF = x * betas;
  vector[N] varR = Q * phi;
  real varI = beta0;
  vector[N] eta = varI + varF + varR;
  vector[N] mu = exp(eta) + 0.00001;
  vector[N] mu2 = exp(varI + varF) + 0.00001;

  for (i in 1:N) {
    // in the first iterations mu can blow up, so check it
    if (mu[i] > 10000 || mu[i] < 0) {
      y_pred[i] = 99999;
      log_lik[i] = neg_binomial_2_lpmf(y[i] | 99999, phi2);
    } else {
      y_pred[i] = neg_binomial_2_rng(mu2[i], phi2);
      log_lik[i] = neg_binomial_2_lpmf(y[i] | mu[i], phi2);
    }
  }
}
``````

Thanks!


It would probably help to have a more informative prior on `alpha`, e.g. `alpha ~ gamma(2,2)`.
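For intuition on why `gamma(2,2)` is a reasonable choice here: it has mean shape/rate = 1 and sd sqrt(shape)/rate ≈ 0.71, and it puts essentially no mass at 0 or far out in the tail, unlike the heavy-tailed half-Cauchy. A quick standard-library check (sample size and seed are arbitrary):

```python
import random
import statistics

random.seed(42)

# Gamma(shape=2, rate=2): Python's gammavariate takes (shape, scale),
# where scale = 1 / rate
draws = [random.gammavariate(2.0, 0.5) for _ in range(200_000)]

assert abs(statistics.fmean(draws) - 1.0) < 0.05      # mean = shape / rate = 1
assert abs(statistics.pstdev(draws) - 0.7071) < 0.05  # sd = sqrt(shape) / rate
assert sum(d > 5 for d in draws) / len(draws) < 0.01  # almost no mass in the far tail
```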

You could use the softmax function to vectorize that `pow` loop. That also lets you drop the overflow check, which may itself cause problems for the sampler: the hard clamp makes the gradient flat in that region.

``````stan
transformed data {
  vector[p] log_lambdas = log(lambdas);
  real sum_lambdas = sum(lambdas);
}
``````
``````stan
transformed parameters {
  vector[p] lambdas2 = sum_lambdas * softmax(alpha * log_lambdas);
  ...
}
``````
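To see why this is the same computation: `pow(lambda, alpha) = exp(alpha * log(lambda))`, so rescaling the powered values to sum to `sum(lambdas)` is exactly `sum_lambdas * softmax(alpha * log_lambdas)`. A quick numerical check of that identity in Python (the lambda values here are made up):

```python
import math

# made-up positive values, standing in for the model's `lambdas`
lambdas = [0.3, 1.2, 2.5, 4.0]
alpha = 1.7

# original loop: elementwise pow, then rescale to sum(lambdas)
powed = [lam ** alpha for lam in lambdas]
norm_factor = sum(lambdas) / sum(powed)
loop_version = [norm_factor * v for v in powed]

# softmax version: sum(lambdas) * softmax(alpha * log(lambdas))
logits = [alpha * math.log(lam) for lam in lambdas]
m = max(logits)                            # subtract the max for stability,
exps = [math.exp(z - m) for z in logits]   # which is what softmax does internally
softmax_version = [sum(lambdas) * e / sum(exps) for e in exps]

assert all(abs(a - b) < 1e-9 for a, b in zip(loop_version, softmax_version))
```

The max-subtraction inside softmax is also why the overflow guard becomes unnecessary: every exponential is bounded by 1.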

Hope that helps.


More on the brute force side, you could also use map_rect to parallelise your model which will make it drastically faster, or you can enable the GPU version of the QR decomposition function if it’s not enabled by default.
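The idea behind `map_rect` is that the total log-likelihood is a sum over observations, so you can split the data into shards, evaluate each shard's contribution independently, and add the results back up. A rough sketch of that structure in Python (toy Poisson likelihood and made-up data, not the actual Stan mechanism; Python threads won't speed up CPU-bound work, whereas Stan runs the shards on separate cores):

```python
import math
from concurrent.futures import ThreadPoolExecutor

def shard_loglik(shard):
    # log-likelihood contribution of one shard (toy Poisson model,
    # standing in for the negative binomial in the real model)
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y, mu in shard)

def total_loglik(pairs, n_shards=4):
    # split observations into shards and evaluate them concurrently;
    # map_rect does the analogous split across workers
    size = math.ceil(len(pairs) / n_shards)
    shards = [pairs[i:i + size] for i in range(0, len(pairs), size)]
    with ThreadPoolExecutor(max_workers=n_shards) as ex:
        return sum(ex.map(shard_loglik, shards))

# made-up data: (y, mu) pairs
pairs = [(3, 2.5), (0, 1.0), (7, 6.2), (2, 2.0), (5, 4.4), (1, 0.8)]
assert abs(shard_loglik(pairs) - total_loglik(pairs, n_shards=3)) < 1e-9
```

The sharded and serial totals agree exactly because addition is order-independent here; the payoff in Stan is that each shard's gradient is also computed in parallel.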

Wow, I had not thought of that, thanks!
The runtime improved by ~30% :))

Unfortunately I’m on a machine without GPU :( I’ll have a look at map_rect. Thanks!

It depends on the model, but it should be possible to get "embarrassingly parallel" performance that scales roughly linearly with the number of CPUs, up to a point. If you've only got ~30% so far, there's likely more to be had.