Increasing Stan efficiency by vectorizing for loop

Hi all,

I have the Stan model below and I’m wondering if there’s a way to speed up its for loop. I’d like to find a vectorized solution, as computational efficiency is currently a major bottleneck and I know for loops can be slow in Stan. Alternatively, if there are other things I can do to increase efficiency, I’d love to hear about them!

// Input data: alpha estimates, their standard errors, and group assignments
data {
  int<lower=0> N;
  vector[N] alpha_hat_jdy;
  int g[N]; // group assignments
  int<lower=0> K; // number of groups
  vector<lower = 0>[N] sigma_alpha;
}

parameters {
  vector<lower = 0>[N] alpha_jdy;
  vector<lower = 0>[K] k_my;
  vector<lower = 0>[K] theta_my;
}

model {
  for (n in 1:N) {
   alpha_jdy[n] ~ gamma(k_my[g[n]], theta_my[g[n]]);
   alpha_hat_jdy[n] ~ normal(-alpha_jdy[n], sigma_alpha[n]);
  }
}

Aside: does it make a difference (result-wise, not computationally) if the current for loop is separated into two sequential loops like this?

for (n in 1:N) {
   alpha_jdy[n] ~ gamma(k_my[g[n]], theta_my[g[n]]);
}
for (n in 1:N) {
   alpha_hat_jdy[n] ~ normal(-alpha_jdy[n], sigma_alpha[n]);
}

You should be able to write

model {
   alpha_jdy ~ gamma(k_my[g], theta_my[g]);
   alpha_hat_jdy ~ normal(-alpha_jdy, sigma_alpha); 
}

for a fully vectorized implementation. (As for your aside: no, splitting the loop into two sequential loops gives identical results; each sampling statement just adds a term to the same log density, so how the statements are grouped or ordered doesn’t matter.) Note: it’s a good idea to declare g with a lower bound of 1 and an upper bound of K, so that an invalid group assignment is rejected when the data are read in rather than causing an indexing failure during sampling.
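For example, with K declared before g so it can be used in the bound:

data {
  int<lower=0> N;
  int<lower=0> K; // number of groups; must come before g
  int<lower=1, upper=K> g[N]; // bad indices are now rejected when the data are read
  // (remaining declarations unchanged)
}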

Got it, thank you! What if I wanted to vectorize the following?

for (n in 1:N) {
   alpha_jdy[n] ~ normal(mu_my[g[n]], sigma_mu[g[n]]) T[, 0];
}

I’ve run into issues with specifying it as

alpha_jdy ~ normal(mu_my, sigma_mu) T[, 0];

because multivariate distributions can’t be truncated.

I believe it is not possible to use vectorized truncations (see Vectorized truncation · Issue #788 · stan-dev/stanc3 · GitHub).

Hi Brian,

If I’m not mistaken, the issue you linked suggests that although vectorized truncation isn’t implemented, it’s still possible to write an equivalent version that’s much faster than a for loop. Is my understanding correct?

That’s right, you’ll just need to specify the truncation statement manually, rather than using the T[] syntax.

The truncation syntax simply adds an adjustment to the log density using the distribution’s LCDF/LCCDF.
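Concretely, truncating a distribution above at a point U rescales its density by the probability mass below the bound, so on the log scale

$$\log p(x \mid x \le U) = \log p(x) - \log F(U),$$

where F is the CDF. For normal(mu, sigma) T[, U] the adjustment is therefore -normal_lcdf(U | mu, sigma); a lower truncation T[L, ] would subtract the LCCDF, normal_lccdf(L | mu, sigma), instead.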

So the looped sampling statement:

for (n in 1:N) {
   alpha_jdy[n] ~ normal(mu_my[g[n]], sigma_mu[g[n]]) T[, 0];
}

Is equivalent to:

for (n in 1:N) {
  target += normal_lpdf(alpha_jdy[n] | mu_my[g[n]], sigma_mu[g[n]]);
  if (alpha_jdy[n] > 0) {
    // value is outside the truncation bound: zero density
    target += negative_infinity();
  } else {
    // normalise by the probability mass below the bound
    target += -normal_lcdf(0 | mu_my[g[n]], sigma_mu[g[n]]);
  }
}

To vectorise this, you’ll need an array of indexes for the values that already satisfy the truncation bound, as well as counts of the observations that violate the bound and those that don’t. It’s easiest to create these externally, in something like R:

g_not_trunc <- which(alpha_hat_jdy <= 0) # indices satisfying the bound
N_not_trunc <- length(g_not_trunc)
N_trunc <- length(alpha_hat_jdy) - N_not_trunc # count violating the bound

With your data block then looking like:

data {
  int<lower=0> N;
  int<lower=0> N_trunc;
  int<lower=0> N_not_trunc;
  vector[N] alpha_hat_jdy;
  int g[N]; // group assignments
  int g_not_trunc[N_not_trunc]; // indices of values satisfying the bound
  int<lower=0> K; // number of groups
  vector<lower = 0>[N] sigma_alpha;
}

transformed data {
  // Create in transformed data so that it is only constructed once.
  // The conditional avoids 0 * -infinity = NaN when no values violate the bound.
  real trunc_adj = N_trunc > 0 ? negative_infinity() : 0.0;
}

And then you can vectorise the truncated likelihood as:

target += normal_lpdf(alpha_jdy | mu_my[g], sigma_mu[g]);
target += trunc_adj; // negative infinity if any value violates the bound
target += -normal_lcdf(0 | mu_my[g[g_not_trunc]], sigma_mu[g[g_not_trunc]]);

When I work with the non-vectorized implementations, I’m able to obtain results but when I implement the vectorized version, I get

SAMPLING FOR MODEL 'normal_normal' NOW (CHAIN 1).
Chain 1: Rejecting initial value:
Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
Chain 1:   Stan can't start sampling from this initial value.
[1] "Error in sampler$call_sampler(args_list[[i]]) : Initialization failed."
error occurred during calling the sampler; sampling not done

I’m not sure why this error is occurring, given that the same process worked when I used the modified for loop below with target +=.

for (n in 1:N) {
  target += normal_lpdf(alpha_jdy[n] | mu_my[g[n]], sigma_mu[g[n]]);
  if (alpha_jdy[n] > 0) {
    target += negative_infinity();
  } else {
    target += -normal_lcdf(0 | mu_my[g[n]], sigma_mu[g[n]]);
  }
}

I figured out the problem: whenever N_trunc > 0, trunc_adj adds negative infinity, so the log probability is negative infinity at every initial value and sampling can never start. Since alpha_jdy is a parameter, I can instead enforce the truncation with an <upper = 0> constraint on its declaration, so every draw satisfies the bound by construction (meaning N_trunc is 0 and g_not_trunc covers all N observations). Here is my final code. It’s pretty similar to the solution you originally provided. Thanks for all your help!

// Input data: alpha estimates, their standard errors, module-year group parameters, and group assignments
data {
  int<lower=0> N;
  vector[N] alpha_hat_jdy;
  vector<lower = 0>[N] sigma_alpha;

  int<lower=0> K; // number of module-year groups
  vector[K] mu_my;
  vector<lower = 0>[K] sigma_mu;
  int g[N]; // group assignments
  
  int<lower=0> N_trunc;
  int<lower=0> N_not_trunc;
  int g_not_trunc[N_not_trunc];
}

transformed data {
  // Create in transformed data so that it is only constructed once.
  // The conditional avoids 0 * -infinity = NaN, since N_trunc is 0 here.
  real trunc_adj = N_trunc > 0 ? negative_infinity() : 0.0;
}

parameters {
  vector<upper = 0>[N] alpha_jdy;
}

model {
  // alpha_jdy is declared with <upper = 0>, so every draw satisfies the
  // truncation: N_trunc is 0 (making trunc_adj 0) and g_not_trunc indexes all N
  target += normal_lpdf(alpha_jdy | mu_my[g], sigma_mu[g]);
  target += trunc_adj;
  target += -normal_lcdf(0 | mu_my[g[g_not_trunc]], sigma_mu[g[g_not_trunc]]);
}