Optimizing Zero-Inflated Poisson GLM for vectorization/GPU

I am fitting a rather large zero-inflated Poisson (ZIP) GLM indexed over time (19 years) and location (approx. 5000), and am wondering whether the likelihood can be rewritten to take advantage of vectorization and/or speed-ups with OpenCL GPU support.

My model is similar to the example from the docs (below is a simplification of my code, which also includes spatial effects):

data {
  int<lower=0> N; // Number of locations
  int<lower=0> T; // Number of years
  int<lower=0> y[T,N]; // Outcome
  real x[N]; // Covariate
}
parameters {
  real alpha_theta; // Intercept, bernoulli
  real alpha_lambda; // Intercept, poisson
  real beta_theta; // Coefficient, bernoulli
  real beta_lambda; // Coefficient, poisson
  vector[N] sigma_theta[T]; // Error term, bernoulli
  vector[N] sigma_lambda[T]; // Error term, poisson
}
transformed parameters {
  vector[N] theta[T]; // Bernoulli param (logit scale)
  vector[N] lambda[T]; // Poisson param (log scale)
  for (t in 1:T) {
    // GLM eqns
    theta[t] = alpha_theta + beta_theta * x[t] + sigma_theta[t];
    lambda[t] = alpha_lambda + beta_lambda * x[t] + sigma_lambda[t];
  }
}
model {
  alpha_theta ~ std_normal();
  alpha_lambda ~ std_normal();
  beta_theta ~ std_normal(); 
  beta_lambda ~ std_normal(); 
  for (t in 1:T) {
    sigma_theta[t] ~ std_normal();
    sigma_lambda[t] ~ std_normal();
    for (n in 1:N) {
      if (y[t,n] == 0)
        target += log_sum_exp(bernoulli_logit_lpmf(1 | theta[t,n]),
                            bernoulli_logit_lpmf(0 | theta[t,n])
                              + poisson_log_lpmf(0 | lambda[t,n]));
      else
        target += bernoulli_logit_lpmf(0 | theta[t,n])
                  + poisson_log_lpmf(y[t,n] | lambda[t,n]);
    }
  }
}

The documentation gives a neat example of pre-processing all of the bernoulli parts for the 0-count observations, to the tune of:

   target
     += N_zero
          * log_sum_exp(bernoulli_lpmf(1 | theta),
                        bernoulli_lpmf(0 | theta)
                          + poisson_lpmf(0 | lambda));
   target += N_nonzero * bernoulli_lpmf(0 | theta);
   target += poisson_lpmf(y_nonzero | lambda);

How could my double-indexed example be extended to take advantage of the vectorization in the second snippet? I guess the hard part is getting the right indices of theta, lambda, and y to match up. It might also be ideal if I could use the bernoulli_logit_glm_lpmf() family of functions. Would appreciate any help here… thanks!

Well, I now see that the User's Guide warns against doing exactly this, in section 5.4 (Vectorizing mixtures).

Before considering OpenCL I’d optimize the model.

if (y[t,n] == 0)

can be replaced by arrays of integer indices computed once in transformed data. Then we can get rid of the branching loop too.
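In one dimension that bookkeeping looks roughly like the sketch below (not tested; it assumes a one-dimensional outcome y[N] for illustration, and extending it to your [T,N] case just adds a year index):

transformed data {
  int zero_idx[N];    // positions of zero counts
  int nonzero_idx[N]; // positions of nonzero counts
  int N_zero = 0;
  int N_nonzero = 0;
  for (n in 1:N) {
    if (y[n] == 0) {
      N_zero += 1;
      zero_idx[N_zero] = n;
    } else {
      N_nonzero += 1;
      nonzero_idx[N_nonzero] = n;
    }
  }
}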

For the nonzero counts, the Bernoulli and Poisson contributions are independent of each other. Thus you can write them with
poisson_log_glm and bernoulli_logit_glm statements, for which
OpenCL variants exist.
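That could look roughly like the sketch below (not tested; X_nonzero, y_nonzero, zeros, sigma_theta_nonzero, and sigma_lambda_nonzero are hypothetical names for the subsetted data, and I'm assuming the GLM signatures that accept a vector-valued intercept):

// Sketch: nonzero-count contributions as GLM statements. The covariate is
// packed into a one-column predictor matrix X_nonzero, and the per-observation
// error terms are folded into vector-valued intercepts.
target += bernoulli_logit_glm_lpmf(zeros | X_nonzero,
                                   alpha_theta + sigma_theta_nonzero,
                                   [beta_theta]');
target += poisson_log_glm_lpmf(y_nonzero | X_nonzero,
                               alpha_lambda + sigma_lambda_nonzero,
                               [beta_lambda]');

Here zeros would be an integer array of 0s (the Bernoulli outcomes for the nonzero counts) and y_nonzero the corresponding counts.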

The zero-count statement could be vectorized by expanding the distribution functions in place of
the lpmf calls and, if needed, applying the log-sum-exp max trick; see LogSumExp - Wikipedia
for details. Depending on the number of zero observations in your data, which is usually
smaller than the rest, this may not be worth the optimization.
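Concretely, for a zero count the mixture term on the log scale is

$$\log p(y=0) = \operatorname{LSE}\big(\log\theta,\ \log(1-\theta) - \lambda\big), \qquad \operatorname{LSE}(a,b) = m + \log\big(e^{a-m} + e^{b-m}\big),\ m = \max(a,b),$$

where $\theta$ is the zero-inflation probability and $\lambda$ the Poisson rate; subtracting the max $m$ before exponentiating keeps both exponentials in range.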

Unfortunately we have many zeros in the dataset, so both parts will need to be optimized. Here is what I came up with:

data {
  int<lower=0> N; // Number of locations
  int<lower=0> T; // Number of years
  int<lower=0> y[T,N]; // Outcome
  real x[N]; // Covariate
}
transformed data {
  // indices of zero counts
  int zero_idx[T,N];
  // Max number of zero counts
  int zero_max[T];
  // indices of nonzero counts
  int nonzero_idx[T,N];
  // max number of nonzero counts
  int nonzero_max[T];
  
  // Loop through the y data and tally zeros and nonzeros appropriately
  for (t in 1:T) {
    zero_max[t] = 0;
    nonzero_max[t] = 0;
    for (n in 1:N){
      if (y[t,n] == 0) {
        zero_max[t] += 1;
        zero_idx[t, zero_max[t]] = n;
      }
      else {
        nonzero_max[t] += 1;
        nonzero_idx[t, nonzero_max[t]] = n;
      }
    }
  }
}
parameters {
  real alpha_theta; // Intercept, bernoulli
  real alpha_lambda; // Intercept, poisson
  real beta_theta; // Coefficient, bernoulli
  real beta_lambda; // Coefficient, poisson
  vector[N] sigma_theta[T]; // Error term, bernoulli
  vector[N] sigma_lambda[T]; // Error term, poisson
}
transformed parameters {
  vector[N] theta[T]; // Bernoulli param (logit scale)
  vector[N] lambda[T]; // Poisson param (log scale)
  for (t in 1:T) {
    // GLM eqns
    theta[t] = alpha_theta + beta_theta * x[t] + sigma_theta[t];
    lambda[t] = alpha_lambda + beta_lambda * x[t] + sigma_lambda[t];
  }
}
model {
  vector[N] theta_inv_logit; 
  vector[N] lambda_exp;

  alpha_theta ~ std_normal();
  alpha_lambda ~ std_normal();
  beta_theta ~ std_normal(); 
  beta_lambda ~ std_normal(); 
  for (t in 1:T) {
    sigma_theta[t] ~ std_normal();
    sigma_lambda[t] ~ std_normal();

    theta_inv_logit = inv_logit(theta[t]);
    lambda_exp = exp(lambda[t]);
    
    // Zeros
    target += sum(log(
           theta_inv_logit[zero_idx[t, 1:zero_max[t]]] + 
        (1-theta_inv_logit[zero_idx[t, 1:zero_max[t]]]) .* 
        exp(-lambda_exp[zero_idx[t, 1:zero_max[t]]])
      ));
    
    // Nonzeros
    target += bernoulli_lpmf(
                rep_array(0, nonzero_max[t]) | 
                theta_inv_logit[nonzero_idx[t,1:nonzero_max[t]]]
          ) + poisson_lpmf(
                y[t, nonzero_idx[t, 1:nonzero_max[t]]] | 
                lambda_exp[nonzero_idx[t, 1:nonzero_max[t]]]
          );
  }
}
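One thing I'm unsure about is the numerical stability of the zero part: taking log() of a sum of probabilities can underflow when both mixture components are tiny. A log-scale version of the same term (a sketch, reusing theta, lambda, zero_idx, and zero_max from above, placed inside the loop over t) would be:

    // Sketch: zero-count term computed directly on the log scale.
    // theta[t,n] is the logit-scale Bernoulli predictor and lambda[t,n] the
    // log-scale Poisson predictor, so log Poisson(0) = -exp(lambda[t,n]).
    for (i in 1:zero_max[t]) {
      int n = zero_idx[t, i];
      target += log_sum_exp(log_inv_logit(theta[t, n]),
                            log1m_inv_logit(theta[t, n]) - exp(lambda[t, n]));
    }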

Before doing further optimizations, some parts need some rework.

However, the indexing doesn't match: x is declared with size N, but the loops use x[t], with t = 1, …, T.

Also, sigma_theta and sigma_lambda assign every observation in [T,N] its own parameter; observation-level noise is usually already captured by the distribution function itself.

Yes, you are right on both counts. I did a sloppy job of simplifying the code from my actual script, where these issues aren't present. I'll correct the example above in case anyone comes across it in the future.
