Optimizing Zero-Inflated Poisson GLM for vectorization/GPU

I am fitting a rather large zero-inflated Poisson (ZIP) GLM indexed over time (19 years) and location (approx. 5000), and am wondering whether the likelihood can be rewritten to take advantage of vectorization and/or speed-ups with OpenCL GPU support.

My model is similar to the example from the docs (below is a simplification of my code, which also includes spatial effects):

data {
  int<lower=0> N; // Number of locations
  int<lower=0> T; // Number of years
  int<lower=0> y[T,N]; // Outcome
  real x[N]; // Covariate
}
parameters {
  real alpha_theta; // Intercept, bernoulli
  real alpha_lambda; // Intercept, poisson
  real beta_theta; // Coefficient, bernoulli
  real beta_lambda; // Coefficient, poisson
  vector[N] sigma_theta[T]; // Error term, bernoulli
  vector[N] sigma_lambda[T]; // Error term, poisson
}
transformed parameters {
  vector[N] theta[T]; // Bernoulli param (logit scale)
  vector[N] lambda[T]; // Poisson param (log scale)
  for (t in 1:T) {
    // GLM eqns
    theta[t] = alpha_theta + beta_theta * x[t] + sigma_theta[t];
    lambda[t] = alpha_lambda + beta_lambda * x[t] + sigma_lambda[t];
  }
}
model {
  alpha_theta ~ std_normal();
  alpha_lambda ~ std_normal();
  beta_theta ~ std_normal(); 
  beta_lambda ~ std_normal(); 
  for (t in 1:T) {
    sigma_theta[t] ~ std_normal();
    sigma_lambda[t] ~ std_normal();
    for (n in 1:N) {
      if (y[t,n] == 0)
        target += log_sum_exp(bernoulli_logit_lpmf(1 | theta[t,n]),
                            bernoulli_logit_lpmf(0 | theta[t,n])
                              + poisson_log_lpmf(0 | lambda[t,n]));
      else
        target += bernoulli_logit_lpmf(0 | theta[t,n])
                  + poisson_log_lpmf(y[t,n] | lambda[t,n]);
    }
  }
}

The documentation gives a neat example of pre-processing all of the bernoulli parts for the 0-count observations, to the tune of:

   target
     += N_zero
          * log_sum_exp(bernoulli_lpmf(1 | theta),
                        bernoulli_lpmf(0 | theta)
                          + poisson_lpmf(0 | lambda));
   target += N_nonzero * bernoulli_lpmf(0 | theta);
   target += poisson_lpmf(y_nonzero | lambda);

How could my double-indexed example be extended to take advantage of the vectorization in the second snippet? I guess the hard part is getting the right indices of theta, lambda, and y to match up. It might also be ideal if I could use the bernoulli_logit_glm_lpmf() family of functions. Would appreciate any help here… thanks!

Well, I now see that the User's Guide warns against doing exactly this, in section 5.4 (Vectorizing mixtures).

Before considering OpenCL I’d optimize the model.

if (y[t,n] == 0)

can be replaced by arrays of integer indices computed once in transformed data. Then we can get rid of the branching loop too.
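In one dimension that bookkeeping looks roughly like the sketch below (not tested; it assumes a one-dimensional outcome y[N] for illustration, and extending it to your [T,N] case just adds a year index):

transformed data {
  int zero_idx[N];    // positions of zero counts
  int nonzero_idx[N]; // positions of nonzero counts
  int N_zero = 0;
  int N_nonzero = 0;
  for (n in 1:N) {
    if (y[n] == 0) {
      N_zero += 1;
      zero_idx[N_zero] = n;
    } else {
      N_nonzero += 1;
      nonzero_idx[N_nonzero] = n;
    }
  }
}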

For the nonzero counts, the Bernoulli and Poisson contributions are independent of each other. Thus you can write them with
poisson_log_glm and bernoulli_logit_glm statements, for which
OpenCL variants exist.
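That could look roughly like the sketch below (not tested; X_nonzero, y_nonzero, zeros, sigma_theta_nonzero, and sigma_lambda_nonzero are hypothetical names for the subsetted data, and I'm assuming the GLM signatures that accept a vector-valued intercept):

// Sketch: nonzero-count contributions as GLM statements. The covariate is
// packed into a one-column predictor matrix X_nonzero, and the per-observation
// error terms are folded into vector-valued intercepts.
target += bernoulli_logit_glm_lpmf(zeros | X_nonzero,
                                   alpha_theta + sigma_theta_nonzero,
                                   [beta_theta]');
target += poisson_log_glm_lpmf(y_nonzero | X_nonzero,
                               alpha_lambda + sigma_lambda_nonzero,
                               [beta_lambda]');

Here zeros would be an integer array of 0s (the Bernoulli outcomes for the nonzero counts) and y_nonzero the corresponding counts.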

The zero-count statement could be vectorized by expanding the distribution functions in place of
the lpmf calls and, if needed, applying the log-sum-exp max trick; see LogSumExp - Wikipedia
for details. Depending on the number of zero observations in your data, which is usually
smaller than the rest, this may not be worth the optimization.
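Concretely, for a zero count the mixture term on the log scale is

$$\log p(y=0) = \operatorname{LSE}\big(\log\theta,\ \log(1-\theta) - \lambda\big), \qquad \operatorname{LSE}(a,b) = m + \log\big(e^{a-m} + e^{b-m}\big),\ m = \max(a,b),$$

where $\theta$ is the zero-inflation probability and $\lambda$ the Poisson rate; subtracting the max $m$ before exponentiating keeps both exponentials in range.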

Unfortunately we have many zeros in the dataset, so both parts will need to be optimized. Here is what I came up with:

data {
  int<lower=0> N; // Number of locations
  int<lower=0> T; // Number of years
  int<lower=0> y[T,N]; // Outcome
  real x[N]; // Covariate
}
transformed data {
  // indices of zero counts
  int zero_idx[T,N];
  // Max number of zero counts
  int zero_max[T];
  // indices of nonzero counts
  int nonzero_idx[T,N];
  // max number of nonzero counts
  int nonzero_max[T];
  
  // Loop through the y data and tally zeros and nonzeros appropriately
  for (t in 1:T) {
    zero_max[t] = 0;
    nonzero_max[t] = 0;
    for (n in 1:N){
      if (y[t,n] == 0) {
        zero_max[t] += 1;
        zero_idx[t, zero_max[t]] = n;
      }
      else {
        nonzero_max[t] += 1;
        nonzero_idx[t, nonzero_max[t]] = n;
      }
    }
  }
}
parameters {
  real alpha_theta; // Intercept, bernoulli
  real alpha_lambda; // Intercept, poisson
  real beta_theta; // Coefficient, bernoulli
  real beta_lambda; // Coefficient, poisson
  vector[N] sigma_theta[T]; // Error term, bernoulli
  vector[N] sigma_lambda[T]; // Error term, poisson
}
transformed parameters {
  vector[N] theta[T]; // Bernoulli param (logit scale)
  vector[N] lambda[T]; // Poisson param (log scale)
  for (t in 1:T) {
    // GLM eqns
    theta[t] = alpha_theta + beta_theta * x[t] + sigma_theta[t];
    lambda[t] = alpha_lambda + beta_lambda * x[t] + sigma_lambda[t];
  }
}
model {
  vector[N] theta_inv_logit; 
  vector[N] lambda_exp;

  alpha_theta ~ std_normal();
  alpha_lambda ~ std_normal();
  beta_theta ~ std_normal(); 
  beta_lambda ~ std_normal(); 
  for (t in 1:T) {
    sigma_theta[t] ~ std_normal();
    sigma_lambda[t] ~ std_normal();

    theta_inv_logit = inv_logit(theta[t]);
    lambda_exp = exp(lambda[t]);
    
    // Zeros
    target += sum(log(
           theta_inv_logit[zero_idx[t, 1:zero_max[t]]] + 
        (1-theta_inv_logit[zero_idx[t, 1:zero_max[t]]]) .* 
        exp(-lambda_exp[zero_idx[t, 1:zero_max[t]]])
      ));
    
    // Nonzeros
    target += bernoulli_lpmf(
                rep_array(0, nonzero_max[t]) | 
                theta_inv_logit[nonzero_idx[t,1:nonzero_max[t]]]
          ) + poisson_lpmf(
                y[t, nonzero_idx[t, 1:nonzero_max[t]]] | 
                lambda_exp[nonzero_idx[t, 1:nonzero_max[t]]]
          );
  }
}
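One thing I'm unsure about is the numerical stability of the zero part: taking log() of a sum of probabilities can underflow when both mixture components are tiny. A log-scale version of the same term (a sketch, reusing theta, lambda, zero_idx, and zero_max from above, placed inside the loop over t) would be:

    // Sketch: zero-count term computed directly on the log scale.
    // theta[t,n] is the logit-scale Bernoulli predictor and lambda[t,n] the
    // log-scale Poisson predictor, so log Poisson(0) = -exp(lambda[t,n]).
    for (i in 1:zero_max[t]) {
      int n = zero_idx[t, i];
      target += log_sum_exp(log_inv_logit(theta[t, n]),
                            log1m_inv_logit(theta[t, n]) - exp(lambda[t, n]));
    }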

Before doing further optimizations, some parts need some rework.

However, the indexing doesn't match: x is declared with size N, but the loops use x[t], with t = 1, …, T.

Also, sigma_theta and sigma_lambda assign every observation in [T,N] its own parameter; observation-level noise is usually already captured by the distribution function itself.

Yes, you are right on both counts. I did a sloppy job of simplifying the code from my actual script, where these issues aren't present. I'll correct the example above in case anyone comes across it in the future.
