Error evaluating a 1D integral for Pareto/GGG Buy Till You Die marginal likelihood

Is tau ~ exponential with truncation at tx equivalent to tau ~ untruncated exponential + tx? I think the latter is a shifted exponential, which is not the same as a truncated exponential.

I'm not sure if that's the case. I just refit the model as follows:

  • Get rid of tau_raw and declare tau via the following, which resulted in the error "gamma_lccdf: Random variable is -50.0683, but must be nonnegative!". It looks like this is because the lack of a customer-specific lower bound on tau leads to scenarios where tau < tx, which is precisely what should not happen, since tau > tx should always hold (see the sketch after this list).
  for (i in 1:N) {
    tau[i] ~ exponential(mu[i]) T[tx[i], ];
  }
  • Keep tau_raw and vector[N] tau = tx + tau_raw with the same for loop as above. This leads to the same diagnostic issues as when I did tau_raw ~ exponential(mu).

  • The above for loop, but with tau_raw[i] instead of tau[i]. I had to adjust the initial value in generate_init_fun to avoid a "Log probability evaluates to log(0), i.e. negative infinity." error. After doing so, the resulting model gives "Warning: 2000 of 2000 (100.0%) transitions ended with a divergence" and shows the same diagnostic issues as the original model.
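A minimal sketch of the first bullet's attempt with a per-customer lower bound on tau, using the same vectorized-bound construct the models further down use for the death time (this is an assumption about the intended fix, not code from the thread):

data {
  int<lower=1> N;
  vector<lower=0>[N] tx;      // recency per customer
}
parameters {
  vector<lower=0>[N] mu;      // death rate per customer
  vector<lower=tx>[N] tau;    // death time, bounded below by each customer's recency
}
model {
  for (i in 1:N) {
    // left-truncated exponential; the truncation point now matches the declared lower bound
    tau[i] ~ exponential(mu[i]) T[tx[i], ];
  }
}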

I do not understand what you wrote there. As long as tau > tx, you should not get a negative value. You should share your full code.

Alternatively and preferably, just integrate out tau.
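For concreteness, a minimal sketch of what integrating out tau could look like in the model block, assuming the standard Pareto/NBD individual likelihood

L(lambda, mu | x, t_x, T) = lambda^x * mu/(lambda+mu) * exp(-(lambda+mu)*t_x) + lambda^(x+1)/(lambda+mu) * exp(-(lambda+mu)*T)

and the same data and parameter names as the models below:

  for (i in 1:N) {
    real lam_mu = lambda[i] + mu[i];
    // log of the marginal likelihood above, using log_sum_exp for numerical stability
    target += x[i] * log(lambda[i]) - log(lam_mu)
              + log_sum_exp(log(mu[i]) - lam_mu * t_x[i],
                            log(lambda[i]) - lam_mu * T[i]);
  }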

I went back and created another two versions of the Pareto/NBD model in Stan that sample the time of death explicitly.

Version 1 keeps all the Gamma heterogeneity:

data{
  
  int<lower=1> N; //  number of customers
  vector<lower=0>[N] x; // number of purchases per customer
  vector<lower=0>[N] t_x; // recency per customer 
  vector<lower=0>[N] T; // observation period per customer
  int T_star;
}

parameters{
  

  real<lower=0> r;
  real<lower=0> r_over_a;
  real<lower=0> s;
  real<lower=0> s_over_b;
  
  vector<lower=t_x>[N] death; //unobserved death time  
  vector<lower=0> [N] lambda;
  vector<lower=0> [N] mu;     

}

transformed parameters{
  
  real alpha = r/r_over_a;
  real beta = s/s_over_b;
  
}

model{
  
  death ~ exponential(mu) ;
 
  lambda ~ gamma(r,alpha);
  mu ~ gamma(s,beta);

  r ~ std_normal();
  r_over_a ~ std_normal();
  s ~ std_normal();
  s_over_b ~ std_normal();

  // Purchase-process log-likelihood conditional on the sampled death time:
  // customer i buys at rate lambda[i] until min(death[i], T[i]);
  // terms that depend only on the data are dropped.
  for (i in 1:N) {
    if (death[i] > T[i]) {
      target += x[i] * log(lambda[i]) - lambda[i]*T[i] ;
    }
    else {
      target += x[i] * log(lambda[i]) - lambda[i]*death[i] ;
    }
  }

}


generated quantities {
   
   vector[N] p_alive;
   vector[N] cond_exp_T2;
   vector[N] T2;
   vector[N] p_alive_sim;
   
   T2=T_star+T;
   
   // P(alive at T) given lambda, mu and the customer's history, and the
   // conditional expected number of transactions in (T, T2]
   p_alive = inv( 1 + mu ./ (lambda + mu) .* (exp( (lambda+mu) .* (T-t_x) ) - 1) ) ;
   cond_exp_T2 = lambda ./ mu .* ( 1 - exp(-mu .* (T2-T)) ) .* p_alive; 
   
   for (i in 1 : N) {
     p_alive_sim[i] = (death[i]>T[i]);
   }
   
}

Version 2 uses lognormal heterogeneity:

 data{
  
  int<lower=1> N; //  number of customers
  vector<lower=0>[N] x; // number of purchases per customer
  vector<lower=0>[N] t_x; // recency per customer 
  vector<lower=0>[N] T; // observation period per customer
  int T_star;
}

parameters{
  
  // Individual-level correlated coefficients

  matrix[2, N] eta; // normal errors
  cholesky_factor_corr[2] L_Omega; // cholesky corr.
  row_vector[2] chi; // mean
  vector<lower=0,upper=pi()/2>[2] tau_unif; // uniform variates mapped to half-Cauchy scales via tan()
  vector<lower=t_x>[N] death; //unobserved death time  

}

transformed parameters{
  
  vector[N] lambda; // purchase rate per customer (correlated lognormal)
  vector[N] mu; // death rate per customer (correlated lognormal)
  
  vector<lower=0>[2] tau; // scale of the individual-level correlated coefficients
  matrix[N,2] zeta; // individual-level correlated coefficients
 
  for (r in 1:2)
    tau[r] = 2.5 * tan(tau_unif[r]);
   
  zeta = rep_matrix(chi,N) + (diag_pre_multiply(tau,L_Omega) * eta)';  

  lambda=exp(zeta[,1]);
  mu=exp(zeta[,2]);

}


model{
  
//  vector[N] lam_mu=lambda+mu;     
  
  death ~ exponential(mu) ;

//  for (i in 1:N) {
//    death[i] ~ exponential((mu[i]))T[t_x[i],];
//  }

  to_vector(eta) ~ std_normal();
  L_Omega ~ lkj_corr_cholesky(2);
  to_vector(chi) ~ normal(0,3);

  for (i in 1:N) {
    if (death[i] > T[i]) {
      target += x[i] * log(lambda[i]) - lambda[i]*T[i] ;
    }
    else {
      target += x[i] * log(lambda[i]) - lambda[i]*death[i] ;
    }
  }

}


generated quantities {
  
  matrix[2,2] Omega;
  matrix[2,2] Sigma;
  vector[N] p_alive;
  vector[N] cond_exp_T2;
  vector[N] T2;
  
  T2=T_star+T;
  
  Omega = multiply_lower_tri_self_transpose(L_Omega);
  Sigma = quad_form_diag(Omega, tau);
  
  p_alive = inv( 1 + mu ./ (lambda + mu) .* (exp( (lambda+mu) .* (T-t_x) ) - 1) ) ;
  cond_exp_T2 = lambda ./ mu .* ( 1 - exp(-mu .* (T2-T)) ) .* p_alive; 
}

In terms of getting correct point estimates for the CDNOW data, both versions achieve it. The issue is with Bayesian convergence: the Rhat convergence statistics are pretty bad when using Gamma heterogeneity, and better but still not great when using lognormal heterogeneity. This mirrors my earlier experience when I integrated out the time of death.

You can try these on your data and report back what happens.

Thank you! I will play around with this code and try to understand what it's doing. Two quick questions and a more complicated one about Version 2:

  1. Where is the 2.5 * tan(tau_unif[r]) transformation coming from?
  2. What is T_star and how does it relate to T2? I'm rereading Abe (2009) and looked at the online appendix but don't see where this is coming from.
  3. Since I'm interested in explicitly modeling regularity for each customer (the k parameter in Pareto/GGG), how would you recommend incorporating such a parameter into the lognormal formulation? I did some exploration of formulating the lognormal via similar shape and scale parameters, but haven't found anything obvious yet.

See here: 1.13 Multivariate priors for hierarchical models | Stan User's Guide
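In case it is helpful to see it spelled out: the lower=0, upper=pi()/2 bounds give tau_unif an implicit uniform(0, pi()/2) prior, and pushing a uniform(0, pi()/2) variate through 2.5 * tan(.) is the inverse-CDF construction of a half-Cauchy(0, 2.5) prior. So, as a sketch, the transformed-parameter trick in Version 2 is equivalent to declaring the scales directly:

parameters {
  vector<lower=0>[2] tau;   // scale of the correlated coefficients
}
model {
  tau ~ cauchy(0, 2.5);     // half-Cauchy, because of the lower=0 constraint
}

The tan() form is the reparameterization used in that chapter of the User's Guide.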

T2 = T_star + T

I am not changing the exponential process on the interpurchase time or the death time. I am simply changing the Gamma heterogeneity on their rates. So similarly for Pareto/GGG, the purchase process would still be Gamma with k and the death process would still be exponential. You would just use lognormal instead of gamma for the heterogeneity of these processes' parameters. In particular, the k parameter's heterogeneity could follow a lognormal. For Pareto/NBD, gamma heterogeneity is used for conjugacy and the closed-form likelihood. Once you use Stan, there is no need for that.
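If it helps, here is a minimal sketch of one way that could look (my guess at an implementation, not code from this thread): extend Version 2's correlated block from two to three dimensions, using the same data block, so that a per-customer regularity parameter k also gets lognormal heterogeneity. The purchase-process likelihood would then have to change to the Pareto/GGG's Gamma interpurchase timing with shape k, which is not shown here.

parameters {
  matrix[3, N] eta;                          // normal errors
  cholesky_factor_corr[3] L_Omega;           // cholesky corr.
  row_vector[3] chi;                         // means
  vector<lower=0,upper=pi()/2>[3] tau_unif;  // uniform variates for half-Cauchy scales
  vector<lower=t_x>[N] death;                // unobserved death time
}
transformed parameters {
  vector<lower=0>[3] tau;
  matrix[N,3] zeta;
  vector[N] lambda;  // purchase rate
  vector[N] mu;      // death rate
  vector[N] k;       // per-customer regularity (Pareto/GGG)
  for (r in 1:3)
    tau[r] = 2.5 * tan(tau_unif[r]);
  zeta = rep_matrix(chi,N) + (diag_pre_multiply(tau,L_Omega) * eta)';
  lambda = exp(zeta[,1]);
  mu = exp(zeta[,2]);
  k = exp(zeta[,3]);
}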


Your second formulation works on the groceryCBS data in BTYDplus. Thanks again!

I am not changing the exponential process on the interpurchase time or the death time. I am simply changing the Gamma heterogeneity on their rates.

Ahh, got it, thanks. So we're just changing the probability distributions to be lognormal instead of Gamma, but the likelihood is still the same.

Also, in the block below, where are the expressions for incrementing target coming from? I looked through Abe (2009) again and I thought that in the death[i] > T[i] case, we'd be incrementing by x[i]*log(lambda[i]) + (x[i]-1)*log(t_x[i]) - lgamma(x[i]) - (lambda[i] + mu[i])*T[i], based on the z=1 case in Section 3.1.

    if (death[i] > T[i]) {
      target += x[i] * log(lambda[i]) - lambda[i]*T[i] ;
    }
    else {
      target += x[i] * log(lambda[i]) - lambda[i]*death[i] ;
    }

Can you show how it is working on the groceryCBS data? What exactly are you checking to arrive at that conclusion?

Those expressions conditional on the death time are from Section 3.1 of the note that I shared before.
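To spell out the connection (my reading, not a quote from that note): conditional on the sampled death time, only the purchase-process part of the likelihood involves the parameters,

log L(lambda[i] | x[i], t_x[i], death[i], T[i]) = x[i]*log(lambda[i]) + (x[i]-1)*log(t_x[i]) - lgamma(x[i]) - lambda[i]*min(death[i], T[i]),

and the (x[i]-1)*log(t_x[i]) - lgamma(x[i]) part depends only on the data, so it is an additive constant that can be dropped from target. The -mu[i]*T[i] piece of Abe's z=1 expression corresponds to Pr(death > T), which is already handled by the explicit death ~ exponential(mu) statement once the death time is sampled rather than integrated out.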