# Vectorized use of neg_binomial_2_lpmf

I’m having some difficulty creating a media mix model that uses a negative binomial distribution to distribute the effect of marketing spend on a given day across subsequent days. The following code, with a double loop in the function `adstock_nbinom`, works as expected and produces good estimates, but it is very slow: for each day's spend it applies `neg_binomial_2_lpmf` to every following day up to `adstock_steps` days ahead in the time series.

``````stan
functions {
  vector adstock_nbinom(vector X, int n_timesteps, array[] int timesteps,
                        real mean_shift, real concentration) {
    vector[n_timesteps] adstocked_X = rep_vector(0, n_timesteps);
    int adstock_steps = 12; // This should capture >=98% of spend effect for mean_shift<=5
    for (t_spend in 1 : n_timesteps) {
      for (t_effect in t_spend : t_spend + adstock_steps) {
        if (t_effect > n_timesteps) break;
        adstocked_X[t_effect] += X[t_spend]
          * exp(neg_binomial_2_lpmf(t_effect - t_spend | mean_shift, concentration));
      }
    }
    return adstocked_X;
  }

  vector saturation_hill(vector X, int n_timesteps, real beta_hill, real kappa_hill) {
    vector[n_timesteps] saturation;
    for (t in 1 : n_timesteps) {
      saturation[t] = beta_hill * kappa_hill / (X[t] + kappa_hill);
    }
    return saturation;
  }
}

data {
  int n_timesteps;
  int n_channels;
  int n_covariates;
  matrix[n_timesteps, n_channels] X;
  matrix[n_timesteps, n_covariates] covariates;
  vector[n_timesteps] y;
  vector[n_channels] betas_hill_prior;
}

transformed data {
  array[n_timesteps] int timesteps = rep_array(0, n_timesteps);

  for (t in 2 : n_timesteps) {
    timesteps[t] = timesteps[t - 1] + 1;
  }
}

parameters {
  real<lower=0> intercept;
  real trend;
  real<lower=0> sigma;
  vector[n_covariates] betas_cov;
  vector<lower=0>[n_channels] mean_shift;
  vector<lower=0>[n_channels] concentration;
  vector<lower=0>[n_channels] betas_hill;
  vector<lower=0>[n_channels] kappas_hill;
}

transformed parameters {
  vector[n_timesteps] mu;
  vector[n_timesteps] comp_trend = rep_vector(0, n_timesteps);
  vector[n_timesteps] comp_intercept;
  vector[n_timesteps] comp_covariates = rep_vector(0, n_timesteps);
  vector[n_timesteps] comp_satstocked_X = rep_vector(0, n_timesteps);
  matrix[n_timesteps, n_channels] adstocked_X;
  matrix[n_timesteps, n_channels] saturation;

  // Intercept component
  comp_intercept = rep_vector(intercept, n_timesteps);

  // Trend and covariates components
  for (t in 1 : n_timesteps) {
    if (t == 1) {
      comp_trend[t] = trend;
    } else {
      comp_trend[t] = comp_trend[t - 1] + trend;
    }
    for (cov in 1 : n_covariates) {
      comp_covariates[t] += betas_cov[cov] * covariates[t, cov];
    }
  }

  for (c in 1 : n_channels) {
    // Adstock & saturation do not commute (order matters).
    // Apply the adstock transformation first when the media X in each time
    // period is relatively small compared to the cumulative X across multiple periods.
    adstocked_X[, c] = adstock_nbinom(X[, c], n_timesteps, timesteps,
                                      mean_shift[c], concentration[c]);
    saturation[, c] = saturation_hill(adstocked_X[, c], n_timesteps, betas_hill[c], kappas_hill[c]);
    comp_satstocked_X += saturation[, c] .* adstocked_X[, c];
  }

  // Combine all components
  mu = comp_intercept + comp_trend + comp_covariates + comp_satstocked_X;
}

model {
  // Priors
  intercept ~ normal(0, 1);
  trend ~ normal(0, 0.001); // Small spread since this accumulates over timesteps
  betas_cov ~ normal(0, 1); // Covariates contribution
  mean_shift ~ normal(1.5, 1); // Delay of adstocked spend
  concentration ~ normal(10, 5); // Concentration of adstocked spend
  betas_hill ~ normal(betas_hill_prior, 1); // Base-level efficiency of channel
  kappas_hill ~ normal(0, 1); // Half-saturation point (where channel efficiency starts to decline)
  sigma ~ exponential(1);

  // Likelihood
  y ~ normal(mu, sigma);
}

generated quantities {
  // Posterior predictive observations
  vector[n_timesteps] yhat;
  for (t in 1 : n_timesteps) {
    yhat[t] = normal_rng(mu[t], sigma);
  }
}
``````
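As a sanity check outside Stan, the double-loop transform can be transcribed into Python with scipy. Here `nbinom(n=phi, p=phi/(phi+mu))` is my assumed mapping for Stan's `neg_binomial_2(mu, phi)`, and the function name mirrors the Stan one:

```python
import numpy as np
from scipy.stats import nbinom

def adstock_nbinom(X, mean_shift, concentration, adstock_steps=12):
    """Transcription of the Stan double loop: each day's spend is spread
    over the next adstock_steps days, weighted by the NB PMF of the lag."""
    n_timesteps = len(X)
    # Assumed mapping: Stan neg_binomial_2(mu, phi) ~ scipy nbinom(n=phi, p=phi/(phi+mu))
    p = concentration / (concentration + mean_shift)
    adstocked = np.zeros(n_timesteps)
    for t_spend in range(n_timesteps):
        # Lags 0..adstock_steps, truncated at the end of the series
        for t_effect in range(t_spend, min(t_spend + adstock_steps + 1, n_timesteps)):
            adstocked[t_effect] += X[t_spend] * nbinom.pmf(t_effect - t_spend, concentration, p)
    return adstocked
```

Running this on the synthetic spend series and comparing against the Stan `transformed parameters` output is a quick way to confirm the transform behaves as intended.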

I’ve tried to take advantage of the vectorized nature of `neg_binomial_2_lpmf` by replacing the `adstock_nbinom` with the following function, but it displays some strange behavior, which tells me I’m probably doing something wrong. Specifically, the estimated parameters are quite different from the parameters used to generate the synthetic data I’m testing. Additionally, if I set `adstock_steps` > 3, the model fails to converge. `adstock_steps` controls the length of the integer array input to the `n` parameter for `neg_binomial_2_lpmf`.

``````stan
  vector adstock_nbinom(vector X, int n_timesteps, array[] int timesteps, real mean_shift) { //, real concentration) {
    vector[n_timesteps] adstocked_X = rep_vector(0, n_timesteps);
    int adstock_steps = 4; // why does this behave strangely at 4 but not 3?
    for (t_spend in 1 : n_timesteps) {
      if (t_spend <= n_timesteps - (adstock_steps - 1)) {
        adstocked_X[t_spend : t_spend + adstock_steps - 1] += X[t_spend]
          * exp(neg_binomial_2_lpmf(timesteps[ : adstock_steps] | mean_shift, 10));
      } else {
        adstocked_X[t_spend : n_timesteps] += X[t_spend]
          * exp(neg_binomial_2_lpmf(timesteps[ : n_timesteps - t_spend + 1] | mean_shift, 10));
      }
    }
    return adstocked_X;
  }
``````
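To pin down the symptom, here is a numpy transcription of what this vectorized call computes for a single unit of spend, using scipy's `nbinom` as a stand-in (I'm assuming Stan's `neg_binomial_2(mu, phi)` corresponds to `nbinom(n=phi, p=phi/(phi+mu))`):

```python
import numpy as np
from scipy.stats import nbinom

mu, phi = 1.5, 10.0   # mean_shift and the hard-coded concentration
p = phi / (phi + mu)  # assumed scipy parameterization of Stan's neg_binomial_2
lags = np.arange(4)   # timesteps[:adstock_steps] with adstock_steps = 4

# The vectorized lpmf call collapses the array into ONE real, so the
# same weight lands on every day of the slice:
weight = np.exp(nbinom.logpmf(lags, phi, p).sum())
vectorized_slice = 1.0 * weight * np.ones(4)

# The double-loop version assigns a distinct PMF weight to each lag:
loop_slice = 1.0 * nbinom.pmf(lags, phi, p)

print(vectorized_slice)  # four identical (and very small) values
print(loop_slice)        # four different per-lag weights
```

The two slices clearly disagree, which matches the parameter estimates drifting away from the values used to generate the synthetic data.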

I’ve attached the synthetic data used to train this model. It’s JSON, saved with a .txt extension since JSON is not a supported file type on this website.

Any help would be greatly appreciated. I’m fairly new to probabilistic programming so advice on the general approach here would also be helpful.

After more digging, it looks like `neg_binomial_2_lpmf` returns a single real regardless of the size of the `ints y` argument: when given an array, it returns the sum of the pointwise log probabilities rather than a vector of them. That makes sense for accumulating a log likelihood, but it means the `exp(...)` in my vectorized attempt was producing the product of the PMFs, one scalar smeared across the whole slice, instead of one weight per lag. The Stan docs are unfortunately not very explicit about this! The same applies to `poisson_lpmf`, which I tested as a substitute.
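Since the weights depend only on the lag, they can be computed once and reused for every spend day. A numpy sketch of that idea follows (the function name and the scipy `nbinom(n=phi, p=phi/(phi+mu))` mapping for `neg_binomial_2(mu, phi)` are my assumptions):

```python
import numpy as np
from scipy.stats import nbinom

def adstock_nbinom_fast(X, mean_shift, concentration, adstock_steps=12):
    """Equivalent to the nested loop: build the per-lag PMF weight vector
    once, then spread each day's spend with a single slice update."""
    n = len(X)
    p = concentration / (concentration + mean_shift)  # assumed scipy parameterization
    weights = nbinom.pmf(np.arange(adstock_steps + 1), concentration, p)
    adstocked = np.zeros(n)
    for t in range(n):
        k = min(adstock_steps + 1, n - t)  # truncate at the end of the series
        adstocked[t:t + k] += X[t] * weights[:k]
    return adstocked
```

The same idea carries over to Stan: exponentiate `neg_binomial_2_lpmf` once per lag into a `vector[adstock_steps + 1]` of weights, then reuse that vector inside a single loop over spend days.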

Anyway, here is a much faster version of the working function, which avoids the nested loop:

``````stan
functions {
  vector adstock_nbinom(vector X, int n_timesteps, real mean_shift, real concentration) {
    int adstock_steps = 12; // This should capture >=98% of spend effect for mean_shift<=5

    // Create the negative binomial PMF to represent the adstocked effect of spend