I am currently implementing the Pareto/GGG Buy Till You Die mixture model described in Platzer and Reutterer (2016). One of the parameters \tau represents the customer lifetime and follows a tricky distribution conditional on whether or not the customer is “alive”, so I opted to marginalize it out of the likelihood described in Equation 8. Integrating out \tau by splitting the likelihood into a t_x < \tau < T case and a T < \tau < \infty case gives the following marginal likelihood function. Note that K contains all of the terms in the original likelihood function that are independent of \tau, S_{\Gamma} is the Gamma CCDF, T and t_x are provided data values at the individual customer level, and k, \lambda, and \mu are parameters contained for each customer.
K * [\int_{t_x}^{T} S_{\Gamma}(\tau - t_x | k, k\lambda) \mu e^{-\mu\tau}d\tau + e^{-\mu T}S_{\Gamma}(T - t_x | k, k\lambda)]
The integral \int_{t_x}^{T} S_{\Gamma}(\tau - t_x | k, k\lambda) \mu e^{-\mu\tau}d\tau in the above expression is not fitting via the integrate_1d
function. The corresponding error is “Chain 1: Exception: integrate: error estimate of integral 2.08696e-07 exceeds the given relative tolerance times norm of integral (in ‘model9d5a44fe393d_ParetoGGG_cleaned_up_for_help’ at line 106)”.
I have tried the following to resolve this:
-
Changing the tolerance. This inevitably results in another error estimate error for a different value.
-
Plotting the integrand and exploring how it changes with different values of k, \lambda, and \mu to understand potential discontinuities. There does not seem to be a predictable way to anticipate them.
-
Playing around with the
xc
argument for the integrand. I kept getting errors about values very close to 0. I suspect I may not be understanding how to use this correctly. -
Changing hyperpriors to be more or less strict.
-
Bypassing
integrate_1d
by using Simpson’s rule to integrate, which is what the BTYDplus package authors did in their Gibbs sampling approach. However, that results in “Chain 1: Rejecting initial value:
Chain 1: Gradient evaluated at the initial value is not finite.
Chain 1: Stan can’t start sampling from this initial value.”
Below is my Stan model, along with a self-contained R script I used to fit it:
// ParetoGGG-cleaned-up-for-help.stan
// Modified from Pareto/NBD in these Collab notebooks
//
// https://colab.research.google.com/drive/1yxUkWZSBJWIoxBAi7MIxvkWKNgr0NXDH
// https://www.briancallander.com/posts/customer_lifetime_value/pareto-nbd.html
functions {
real pggg_unscaled_palive_base_integrand(real x,
real tx,
real k,
real lambda,
real mu) {
real unscaled_log_integrand = gamma_lccdf(x - tx | k, k*lambda) - mu*x;
return exp(unscaled_log_integrand);
}
real pggg_palive_base_integrand(real x,
real tx,
real k,
real lambda,
real mu) {
return mu * pggg_unscaled_palive_base_integrand(x, tx, k, lambda, mu);
}
// Function to integrate MUST have this format.
// https://mc-stan.org/docs/2_21/stan-users-guide/integrate-1d.html
//
// ...however it still results in a syntax error, so I'm using the solution here
// https://discourse.mc-stan.org/t/calling-integrate-1d-reports-parsing-error-parser-expected/19611a
real pggg_palive_integrand(real x, // independent variable to integrate over
real xc, // high-precision version of distance
real[] theta, // parameters (k, lambda, mu)
real[] x_r, // data (real) (tx)
int[] x_i) { // data (integer)
// Integrate this function to get a numerator term in Equation 7
// in Platzer and Reutterer (2016)
real unscaled_sol;
real lower_threshold;
real upper_threshold;
real k = theta[1];
real lambda = theta[2];
real mu = theta[3];
real tx = x_r[1];
real Tcal = x_r[2];
unscaled_sol = pggg_unscaled_palive_base_integrand(x, tx, k, lambda, mu);
return mu * unscaled_sol;
}
}
data {
int<lower=1> N; // number of customers
vector<lower=0>[N] x; // number of recurring events
vector<lower=0>[N] tx; // time of most recent transaction
// Called T in the paper, but Tcal in BTYDplus.
// (We avoid using T because T is also the Stan keyword for a truncated distribution)
vector<lower=0>[N] Tcal; // total time observed for each customer.
vector[N] litt; // Sum of logarithmic transaction times
}
parameters {
// each customer gets THEIR OWN k, lambda, and mu
vector<lower=0>[N] k; // regularity
vector<lower=0>[N] lambda; // transaction rate
vector<lower=0>[N] mu; // dropout rate
// Parameters for distributions of rates k, lambda, and mu
// k ~ Gamma(t, gamma)
real<lower=0> t; // // NOT to be confused with tx or Tcal
real<lower=0> gamma;
// lambda ~ Gamma(r, alpha)
real<lower=0> r;
real<lower=0> alpha;
// mu ~ Gamma(s, beta)
real<lower=0> s;
real<lower=0> beta;
}
model {
// Hyperpriors for t, gamma, r, alpha, s, beta
t ~ normal(2, 0.1);
gamma ~ normal(0.5, 1);
r ~ normal(0.5, 0.1);
alpha ~ normal(10, 1);
s ~ normal(0.5, 0.1);
beta ~ normal(10, 1);
// Draw k, lambda, and mu
k ~ gamma(t, gamma);
lambda ~ gamma(r, alpha);
mu ~ gamma(s, beta);
// Log-Likelihood of Pareto/GGG
target += (k .* x .* log(k .* lambda)) - (x .* lgamma(k)) - (k .* lambda .* tx) + ((k - 1) .* litt);
for (i in 1:N) {
int x_i[0]; // placeholder to be passed into integrate_1d
real left_term;
left_term = integrate_1d(
pggg_palive_integrand,
tx[i],
Tcal[i],
{k[i], lambda[i], mu[i]},
{tx[i], Tcal[i]},
x_i,
1e-8
);
target += log_sum_exp(
log(left_term),
gamma_lccdf(Tcal[i] - tx[i] | k[i], k[i]*lambda[i]) - mu[i] * Tcal[i]
);
}
}
library(dplyr)
library(rstan)
library(BTYDplus)
data(groceryElog)
groceryCBS <- elog2cbs(groceryElog, T.cal = "2006-12-31")
N <- nrow(groceryCBS)
x <- groceryCBS$x
tx <- groceryCBS$t.x
Tcal <- groceryCBS$T.cal
litt <- groceryCBS$litt
# Get initial values by fitting a Pareto/NBD model
pnbd_params <- BTYD::pnbd.EstimateParameters(groceryCBS[, c("x", "t.x", "T.cal")])
names(pnbd_params) <- c("r", "alpha", "s", "beta")
generate_init_fun <- function(pnbd_params, cbs) {
function(chain_id) {
l <- list(t = 1,
gamma = 1,
r = unname(pnbd_params["r"]),
alpha = unname(pnbd_params["alpha"]),
s = unname(pnbd_params["s"]),
beta = unname(pnbd_params["beta"]))
l$k <- array(rep(1, nrow(cbs)))
l$lambda <- array(rgamma(nrow(cbs), l$r, l$alpha))
l$mu <- array(rgamma(nrow(cbs), l$s, l$beta))
l
}
}
init_fun <- generate_init_fun(pnbd_params, groceryCBS)
model <- stan(file = "ParetoGGG-cleaned-up-for-help.stan",
chains = 1,
cores = 1,
iter = 1,
seed = 234234243,
init = init_fun,
data = list(
N = N,
x = x,
tx = tx,
Tcal = Tcal,
litt = litt
))