Error evaluating a 1D integral for Pareto/GGG Buy Till You Die marginal likelihood

Is tau ~ exponential with truncation at tx equivalent to tau ~ untruncated exponential + tx? I think the latter is a shifted exponential, which is not the same as a truncated exponential.

I'm not sure if that's the case. I just refit the model as follows:

  • Get rid of tau_raw and declare tau via the following, which resulted in the error "gamma_lccdf: Random variable is -50.0683, but must be nonnegative!". It looks like this is because the lack of a customer-specific lower bound on tau leads to scenarios where tau < tx, which is precisely what should not happen, since tau > tx should always hold (see the sketch after this list).
  for (i in 1:N) {
    tau[i] ~ exponential(mu[i]) T[tx[i], ];
  }
  • Keep tau_raw and vector[N] tau = tx + tau_raw with the same for loop as above. This leads to the same diagnostic issues as when I did tau_raw ~ exponential(mu).

  • The above for loop, but with tau_raw[i] instead of tau[i]. I had to adjust the initial value in generate_init_fun to avoid a "Log probability evaluates to log(0), i.e. negative infinity." error. After doing so, the resulting model gives "Warning: 2000 of 2000 (100.0%) transitions ended with a divergence" and shows the same diagnostic issues as the original model.
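A minimal sketch of the first bullet's attempt with a per-customer lower bound on tau, using the same vectorized-bound construct the models further down use for the death time (this is an assumption about the intended fix, not code from the thread):

data {
  int<lower=1> N;
  vector<lower=0>[N] tx;      // recency per customer
}
parameters {
  vector<lower=0>[N] mu;      // death rate per customer
  vector<lower=tx>[N] tau;    // death time, bounded below by each customer's recency
}
model {
  for (i in 1:N) {
    // left-truncated exponential; the truncation point now matches the declared lower bound
    tau[i] ~ exponential(mu[i]) T[tx[i], ];
  }
}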

I do not understand what you wrote there. As long as tau > tx, you should not get a negative value. You should share your full code.

Alternatively and preferably, just integrate out tau.
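For concreteness, a minimal sketch of what integrating out tau could look like in the model block, assuming the standard Pareto/NBD individual likelihood

L(lambda, mu | x, t_x, T) = lambda^x * mu/(lambda+mu) * exp(-(lambda+mu)*t_x) + lambda^(x+1)/(lambda+mu) * exp(-(lambda+mu)*T)

and the same data and parameter names as the models below:

  for (i in 1:N) {
    real lam_mu = lambda[i] + mu[i];
    // log of the marginal likelihood above, using log_sum_exp for numerical stability
    target += x[i] * log(lambda[i]) - log(lam_mu)
              + log_sum_exp(log(mu[i]) - lam_mu * t_x[i],
                            log(lambda[i]) - lam_mu * T[i]);
  }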

I went back and created another two versions of the Pareto/NBD model in Stan that sample the time of death explicitly.

Version 1 keeps all the Gamma heterogeneity:

data{
  
  int<lower=1> N; //  number of customers
  vector<lower=0>[N] x; // number of purchases per customer
  vector<lower=0>[N] t_x; // recency per customer 
  vector<lower=0>[N] T; // observation period per customer
  int T_star;
}

parameters{
  

  real<lower=0> r;
  real<lower=0> r_over_a;
  real<lower=0> s;
  real<lower=0> s_over_b;
  
  vector<lower=t_x>[N] death; //unobserved death time  
  vector<lower=0> [N] lambda;
  vector<lower=0> [N] mu;     

}

transformed parameters{
  
  real alpha = r/r_over_a;
  real beta = s/s_over_b;
  
}

model{
  
  death ~ exponential(mu) ;
 
  lambda ~ gamma(r,alpha);
  mu ~ gamma(s,beta);

  r ~ std_normal();
  r_over_a ~ std_normal();
  s ~ std_normal();
  s_over_b ~ std_normal();

  // Purchase-process log-likelihood conditional on the sampled death time:
  // customer i buys at rate lambda[i] until min(death[i], T[i]);
  // terms that depend only on the data are dropped.
  for (i in 1:N) {
    if (death[i] > T[i]) {
      target += x[i] * log(lambda[i]) - lambda[i]*T[i] ;
    }
    else {
      target += x[i] * log(lambda[i]) - lambda[i]*death[i] ;
    }
  }

}


generated quantities {
   
   vector[N] p_alive;
   vector[N] cond_exp_T2;
   vector[N] T2;
   vector[N] p_alive_sim;
   
   T2=T_star+T;
   
   // P(alive at T) given lambda, mu and the customer's history, and the
   // conditional expected number of transactions in (T, T2]
   p_alive = inv( 1 + mu ./ (lambda + mu) .* (exp( (lambda+mu) .* (T-t_x) ) - 1) ) ;
   cond_exp_T2 = lambda ./ mu .* ( 1 - exp(-mu .* (T2-T)) ) .* p_alive; 
   
   for (i in 1 : N) {
     p_alive_sim[i] = (death[i]>T[i]);
   }
   
}

Version 2 uses lognormal heterogeneity:

 data{
  
  int<lower=1> N; //  number of customers
  vector<lower=0>[N] x; // number of purchases per customer
  vector<lower=0>[N] t_x; // recency per customer 
  vector<lower=0>[N] T; // observation period per customer
  int T_star;
}

parameters{
  
  // Individual-level correlated coefficients

  matrix[2, N] eta; // normal errors
  cholesky_factor_corr[2] L_Omega; // cholesky corr.
  row_vector[2] chi; // mean
  vector<lower=0,upper=pi()/2>[2] tau_unif; // uniform variates mapped to half-Cauchy scales via tan()
  vector<lower=t_x>[N] death; //unobserved death time  

}

transformed parameters{
  
  vector[N] lambda; // purchase rate per customer (correlated lognormal)
  vector[N] mu; // death rate per customer (correlated lognormal)
  
  vector<lower=0>[2] tau; // scale of the individual-level correlated coefficients
  matrix[N,2] zeta; // individual-level correlated coefficients
 
  for (r in 1:2)
    tau[r] = 2.5 * tan(tau_unif[r]);
   
  zeta = rep_matrix(chi,N) + (diag_pre_multiply(tau,L_Omega) * eta)';  

  lambda=exp(zeta[,1]);
  mu=exp(zeta[,2]);

}


model{
  
//  vector[N] lam_mu=lambda+mu;     
  
  death ~ exponential(mu) ;

//  for (i in 1:N) {
//    death[i] ~ exponential((mu[i]))T[t_x[i],];
//  }

  to_vector(eta) ~ std_normal();
  L_Omega ~ lkj_corr_cholesky(2);
  to_vector(chi) ~ normal(0,3);

  for (i in 1:N) {
    if (death[i] > T[i]) {
      target += x[i] * log(lambda[i]) - lambda[i]*T[i] ;
    }
    else {
      target += x[i] * log(lambda[i]) - lambda[i]*death[i] ;
    }
  }

}


generated quantities {
  
  matrix[2,2] Omega;
  matrix[2,2] Sigma;
  vector[N] p_alive;
  vector[N] cond_exp_T2;
  vector[N] T2;
  
  T2=T_star+T;
  
  Omega = multiply_lower_tri_self_transpose(L_Omega);
  Sigma = quad_form_diag(Omega, tau);
  
  p_alive = inv( 1 + mu ./ (lambda + mu) .* (exp( (lambda+mu) .* (T-t_x) ) - 1) ) ;
  cond_exp_T2 = lambda ./ mu .* ( 1 - exp(-mu .* (T2-T)) ) .* p_alive; 
}

In terms of getting correct point estimates for the CDNOW data, both versions achieve it. The issue is with Bayesian convergence: the Rhat convergence statistics are pretty bad when using Gamma heterogeneity, and better but still not great when using lognormal heterogeneity. This mirrors my earlier experience when I integrated out the time of death.

You can try these on your data and report back what happens.

Thank you! I will play around with this code and try to understand what it's doing. Two quick questions and a more complicated one about Version 2:

  1. Where is the 2.5 * tan(tau_unif[r]) transformation coming from?
  2. What is T_star and how does it relate to T2? I'm rereading Abe (2009) and looked at the online appendix but don't see where this is coming from.
  3. Since I'm interested in explicitly modeling regularity for each customer (the k parameter in Pareto/GGG), how would you recommend incorporating such a parameter into the lognormal formulation? I did some exploration of formulating the lognormal via similar shape and scale parameters, but haven't found anything obvious yet.

See here: 1.13 Multivariate priors for hierarchical models | Stan User's Guide
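In case it is helpful to see it spelled out: the lower=0, upper=pi()/2 bounds give tau_unif an implicit uniform(0, pi()/2) prior, and pushing a uniform(0, pi()/2) variate through 2.5 * tan(.) is the inverse-CDF construction of a half-Cauchy(0, 2.5) prior. So, as a sketch, the transformed-parameter trick in Version 2 is equivalent to declaring the scales directly:

parameters {
  vector<lower=0>[2] tau;   // scale of the correlated coefficients
}
model {
  tau ~ cauchy(0, 2.5);     // half-Cauchy, because of the lower=0 constraint
}

The tan() form is the reparameterization used in that chapter of the User's Guide.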

T2 = T_star + T

I am not changing the exponential process on the interpurchase time or the death time. I am simply changing the Gamma heterogeneity on their rates. So similarly for Pareto/GGG, the purchase process would still be Gamma with k and the death process would still be exponential. You would just use lognormal instead of gamma for the heterogeneity of these processes' parameters. In particular, the k parameter's heterogeneity could follow a lognormal. For Pareto/NBD, gamma heterogeneity is used for conjugacy and the closed-form likelihood. Once you use Stan, there is no need for that.
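If it helps, here is a minimal sketch of one way that could look (my guess at an implementation, not code from this thread): extend Version 2's correlated block from two to three dimensions, using the same data block, so that a per-customer regularity parameter k also gets lognormal heterogeneity. The purchase-process likelihood would then have to change to the Pareto/GGG's Gamma interpurchase timing with shape k, which is not shown here.

parameters {
  matrix[3, N] eta;                          // normal errors
  cholesky_factor_corr[3] L_Omega;           // cholesky corr.
  row_vector[3] chi;                         // means
  vector<lower=0,upper=pi()/2>[3] tau_unif;  // uniform variates for half-Cauchy scales
  vector<lower=t_x>[N] death;                // unobserved death time
}
transformed parameters {
  vector<lower=0>[3] tau;
  matrix[N,3] zeta;
  vector[N] lambda;  // purchase rate
  vector[N] mu;      // death rate
  vector[N] k;       // per-customer regularity (Pareto/GGG)
  for (r in 1:3)
    tau[r] = 2.5 * tan(tau_unif[r]);
  zeta = rep_matrix(chi,N) + (diag_pre_multiply(tau,L_Omega) * eta)';
  lambda = exp(zeta[,1]);
  mu = exp(zeta[,2]);
  k = exp(zeta[,3]);
}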


Your second formulation works on the groceryCBS data in BTYDplus. Thanks again!

I am not changing the exponential process on the interpurchase time or the death time. I am simply changing the Gamma heterogeneity on their rates.

Ahh, got it, thanks. So we're just changing the probability distributions to be lognormal instead of Gamma, but the likelihood is still the same.

Also, in the block below, where are the expressions for incrementing target coming from? I looked through Abe (2009) again and I thought that in the death[i] > T[i] case, we'd be incrementing by x[i]*log(lambda[i]) + (x[i]-1)*log(t_x[i]) - lgamma(x[i]) - (lambda[i] + mu[i])*T[i], based on the z=1 case in Section 3.1.

    if (death[i] > T[i]) {
      target += x[i] * log(lambda[i]) - lambda[i]*T[i] ;
    }
    else {
      target += x[i] * log(lambda[i]) - lambda[i]*death[i] ;
    }

Can you show how it is working on the groceryCBS data? What exactly are you checking to arrive at that conclusion?

Those expressions conditional on the death time are from Section 3.1 of the note that I shared before.
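To spell out the connection (my reading, not a quote from that note): conditional on the sampled death time, only the purchase-process part of the likelihood involves the parameters,

log L(lambda[i] | x[i], t_x[i], death[i], T[i]) = x[i]*log(lambda[i]) + (x[i]-1)*log(t_x[i]) - lgamma(x[i]) - lambda[i]*min(death[i], T[i]),

and the (x[i]-1)*log(t_x[i]) - lgamma(x[i]) part depends only on the data, so it is an additive constant that can be dropped from target. The -mu[i]*T[i] piece of Abe's z=1 expression corresponds to Pr(death > T), which is already handled by the explicit death ~ exponential(mu) statement once the death time is sampled rather than integrated out.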