Vectorizing with if else

Hi,

I am relatively new to Stan. I have written a Stan program to be called from R. The program works and does what is expected. However, it is very slow, and I am looking for ways to improve it. An obvious one is to vectorize two loops, which I believe are the main bottleneck.

The main part of the program fits a bell-shaped curve (response) and calculates a quantity known as loglam, which is then used to calculate the probability of observing a species or not. There are two sets of parameters, one for the response (demographic model) and one for the logistic regression (observation model). I need to allow the bell-shaped curve to be asymmetric, which I achieve by introducing two parameters (sigl and sigr) that control the shape of the curve to the left and to the right of 0. To allow this, however, I had to introduce an if/else statement to pick the correct sigl or sigr for each data point (see // HOW TO HANDLE THIS? in the Stan code below).

data {
  int N; // observations, i.e. number of locations
  int M; // length of time series
  array[N] int<lower=0,upper=1> occ; // observed = 1 or not-observed = 0
  array[N, M] real ts; // time series
}

parameters {
  // demographic model
  real mu;
  real<lower=0> sigl;
  real<lower=0> sigr;
  
  // observation model
  real c;
  real<lower=0,upper=1> pd;
}

model{
  // auxiliary variables
  vector[M] response;
  vector[N] loglam;
  real u; 
  real v; 

  // priors - all weakly informative
  mu ~ normal(0, 10);
  sigl ~ exponential(1);
  sigr ~ exponential(1);
  c ~ normal(0, 10);
  pd ~ uniform(0, 1);

  // TWO INEFFICIENT NESTED LOOPS
  // likelihood
  for(i in 1:N){                      // loop over locations
    for(j in 1:M){                    // loop over time
      u = ts[i, j] - mu;
      if (u < 0) {                    // HOW TO HANDLE THIS?
        v = ( u / sigl )^2; 
      } else { 
        v = ( u / sigr )^2; 
      }
      response[j] = -0.5 * v;
    }
    loglam[i] = mean(response);
  }
  occ ~ bernoulli(pd * inv_logit(loglam - c)); 
}

// I want to use `loo()` for model selection
generated quantities{
  vector[N] log_lik;
  {
    // auxiliary variables
    vector[M] response;
    vector[N] loglam;
    real u; 
    real v;
    
    // likelihood
    for(i in 1:N){                      // loop over locations
      for(j in 1:M){                    // loop over time
        u = ts[i, j] - mu;
        if (u < 0) { 
          v = ( u / sigl )^2; 
        } else { 
          v = ( u / sigr )^2; 
        }
        response[j] = -0.5 * v;
      }
      loglam[i] = mean(response);
      log_lik[i] = bernoulli_lpmf(occ[i] | pd * inv_logit(loglam[i] - c)); 
    }
  }
}

I need some help understanding how to vectorize the above code. In particular, how can I vectorize it when there is an if/else in the innermost loop? Note that I also generate log_lik (in generated quantities) because I want to perform model selection with loo(), which means the inefficient loops appear twice. As far as I understand, this is how it has to be done; is there a faster, better way?

There’s a bigger problem with the if/else—it leads to a discontinuity in the density as u crosses the 0 threshold. Depending on how strong this is, it may prevent Hamiltonian Monte Carlo from crossing the energy gap. If you fit this, does u mix across 0?

The main thing to think about with Stan isn’t vectorization per se, but eliminating redundant operations and reducing the size of the autodiff expression graph (each node in the graph leads to a virtual function call in the C++ and thus pointer chasing in memory).

The biggest one here is (u / sigl)^2 and (u / sigr)^2. Those should be computed once and reused. Then declare things as matrices and vectors to allow vectorization.

Then the code can look like this (you really want to follow conventions and have matrices be M x N wherever possible, but I’ll follow your convention to avoid confusion):

data {
  ...
  matrix[N, M] ts;
  ...
}

model {
  vector[N] loglam;
  matrix[N, M] u = ts - mu;  // more compact autodiff as a matrix op
  matrix[N, M] neg_response = -0.5 * square(u / sigl);      // precompute these two
  matrix[N, M] non_neg_response = -0.5 * square(u / sigr);
  for (n in 1:N) {
    real total = 0;  // keep a running total rather than storage
    for (m in 1:M) {
      total += u[n, m] < 0 ? neg_response[n, m] : non_neg_response[n, m];
    }
    loglam[n] = total / M;
  }
  ...
}

For the likelihood, it's unusual to see a (0, 1) constrained parameter used to scale a logistic regression, as it is not sensitive to location or scaling. It's more usual to see something like pd included as another additive effect, so that'd be something like

occ ~ bernoulli_logit(pd + loglam - c);

where the _logit means it takes its argument on the log odds scale. The difference is that the effect of pd on the probability now varies as a fraction depending on loglam - c. If you really want pd to only reduce the odds, then it needs to be declared with <upper=0>.
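
To put the pieces together, here is a rough, untested sketch that also avoids repeating the loops in generated quantities (the concern in the original question): move loglam into transformed parameters so the model and generated quantities can both reuse it. This assumes ts is declared as a matrix, as in the sketch above, and it does mean loglam gets saved with every draw.

parameters {
  ...
  real<upper=0> pd;  // additive effect on the log odds; can only reduce the odds
}

transformed parameters {
  vector[N] loglam;
  {
    matrix[N, M] u = ts - mu;
    matrix[N, M] neg_response = -0.5 * square(u / sigl);
    matrix[N, M] non_neg_response = -0.5 * square(u / sigr);
    for (n in 1:N) {
      real total = 0;
      for (m in 1:M) {
        total += u[n, m] < 0 ? neg_response[n, m] : non_neg_response[n, m];
      }
      loglam[n] = total / M;
    }
  }
}

model {
  ...  // priors as before, except pd now needs a prior on (-inf, 0), e.g., pd ~ normal(0, 2);
  occ ~ bernoulli_logit(pd + loglam - c);
}

generated quantities {
  vector[N] log_lik;
  for (n in 1:N) {
    log_lik[n] = bernoulli_logit_lpmf(occ[n] | pd + loglam[n] - c);
  }
}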

Hi Bob,

Thank you very much for all the suggestions. I will implement them and report back.

Just to be sure, how can I assess this (i.e., whether u mixes across 0)? Should I generate the u quantity and plot it?

Yes!
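
For example, a minimal, untested sketch (the name u_neg is just for illustration) that could be added to the existing generated quantities block: save a per-draw indicator of whether each u[n, m] is negative, then average each cell over the draws in R; cells with an average strictly between 0 and 1 are the ones where u does cross 0.

generated quantities {
  // 1 if u = ts[n, m] - mu is negative in this draw, 0 otherwise;
  // averaging each cell over draws estimates Pr[u < 0], and values
  // strictly between 0 and 1 indicate cells whose u mixes across the threshold
  array[N, M] int u_neg;
  for (n in 1:N) {
    for (m in 1:M) {
      u_neg[n, m] = (ts[n, m] - mu) < 0;
    }
  }
}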
