Is tau ~ exponential with truncation at tx equivalent to tau ~ untruncated exponential + tx? I think the latter is a shifted exponential, which is not the same as a truncated exponential.
Iām not sure if thatās the case. I just refit the model as follows:
- Get rid of
tau_raw
, declare tau via the following, which resulted ingamma_lccdf: Random variable is -50.0683, but must be nonnegative!
. It looks like this is because the lack of a customer-specific lower bound is leading to scenarios wheretx < tau
, which is precisely what should happen becausetau > tx
should always hold.
for (i in 1:N) {
tau[i] ~ exponential(mu[i]) T[tx[i], ];
}
-
Keep
tau_raw
andvector[N] tau = tx + tau_raw
with the same for loop as above. This leads to the same diagnostic issues as when I didtau_raw ~ exponential(mu)
. -
The above for loop, but with
tau_raw[i]
instead oftau[i]
. I had to adjust the initial value in thegenerate_init_fun
to avoid aLog probability evaluates to log(0), i.e. negative infinity.
error. After doing so, the resulting model gives"Warning: 2000 of 2000 (100.0%) transitions ended with a divergence"
and all diagnostics show the same diagnostic issues as the original model.
I do not understand what you wrote there. As long as tau > tx, you should not get a negative value. You should share your full code.
Alternatively and preferably, just integrate out tau.
I went back and created another 2 versions of the Pareto/NBD model in Stan, which samples the time of death explicitly.
Version 1 is staying true to all the Gamma heterogeneity:
data{
int<lower=1> N; // number of customers
vector<lower=0>[N] x; // number of purchases per customer
vector<lower=0>[N] t_x; // recency per customer
vector<lower=0>[N] T; // observation period per customer
int T_star;
}
parameters{
real<lower=0> r;
real<lower=0> r_over_a;
real<lower=0> s;
real<lower=0> s_over_b;
vector<lower=t_x>[N] death; //unobserved death time
vector<lower=0> [N] lambda;
vector<lower=0> [N] mu;
}
transformed parameters{
real alpha = r/r_over_a;
real beta = s/s_over_b;
}
model{
death ~ exponential(mu) ;
lambda ~ gamma(r,alpha);
mu ~ gamma(s,beta);
r ~ std_normal();
r_over_a ~ std_normal();
s ~ std_normal();
s_over_b ~ std_normal();
for (i in 1:N) {
if (death[i] > T[i]) {
target += x[i] * log(lambda[i]) - lambda[i]*T[i] ;
}
else {
target += x[i] * log(lambda[i]) - lambda[i]*death[i] ;
}
}
}
generated quantities {
vector[N] p_alive;
vector[N] cond_exp_T2;
vector[N] T2;
vector[N] p_alive_sim;
T2=T_star+T;
p_alive = inv( 1 + mu ./ (lambda + mu) .* (exp( (lambda+mu) .* (T-t_x) ) - 1) ) ;
cond_exp_T2 = lambda ./ mu .* ( 1 - exp(-mu .* (T2-T)) ) .* p_alive;
for (i in 1 : N) {
p_alive_sim[i] = (death[i]>T[i]);
}
}
Version 2 is using lognormal for heterogeneity:
data{
int<lower=1> N; // number of customers
vector<lower=0>[N] x; // number of purchases per customer
vector<lower=0>[N] t_x; // recency per customer
vector<lower=0>[N] T; // observation period per customer
int T_star;
}
parameters{
// Individual-level correlated coefficients
matrix[2, N] eta; // normal errors
cholesky_factor_corr[2] L_Omega; // cholesky corr.
row_vector[2] chi; // mean
vector<lower=0,upper=pi()/2>[2] tau_unif; // scaled variance
vector<lower=t_x>[N] death; //unobserved death time
}
transformed parameters{
vector[N] lambda; // intercept for gamma (correlated)
vector[N] mu; // intercept for gamma (correlated)
vector<lower=0>[2] tau; // variance of individual-level correlated coefficients
matrix[N,2] zeta; // individual-level correlated coefficients
for (r in 1:2)
tau[r] = 2.5 * tan(tau_unif[r]);
zeta = rep_matrix(chi,N) + (diag_pre_multiply(tau,L_Omega) * eta)';
lambda=exp(zeta[,1]);
mu=exp(zeta[,2]);
}
model{
// vector[N] lam_mu=lambda+mu;
death ~ exponential(mu) ;
// for (i in 1:N) {
// death[i] ~ exponential((mu[i]))T[t_x[i],];
// }
to_vector(eta) ~ std_normal();
L_Omega ~ lkj_corr_cholesky(2);
to_vector(chi) ~ normal(0,3);
for (i in 1:N) {
if (death[i] > T[i]) {
target += x[i] * log(lambda[i]) - lambda[i]*T[i] ;
}
else {
target += x[i] * log(lambda[i]) - lambda[i]*death[i] ;
}
}
}
generated quantities {
matrix[2,2] Omega;
matrix[2,2] Sigma;
vector[N] p_alive;
vector[N] cond_exp_T2;
vector[N] T2;
T2=T_star+T;
Omega = multiply_lower_tri_self_transpose(L_Omega);
Sigma = quad_form_diag(Omega, tau);
p_alive = inv( 1 + mu ./ (lambda + mu) .* (exp( (lambda+mu) .* (T-t_x) ) - 1) ) ;
cond_exp_T2 = lambda ./ mu .* ( 1 - exp(-mu .* (T2-T)) ) .* p_alive;
}
In terms of getting correct point estimates for the CDNOW data, these versions achieve it. The issue is with the Bayesian convergence. The rhat convergence statistics is pretty bad when using Gamma heterogeneity. It is better but still not great when using lognormal heterogeneity. This is mirroring the earlier experience when I integrated out the time of death.
You can try these on your data and report back what happens.
Thank you! I will play around with this code and try to understand what itās doing. Two quick questions and a more complicated one about Version 2:
- Where is the
2.5 * tan(tau_unif[r])
transformation coming from? - What is
T_star
and how does it relate toT2
? Iām rereading Abe (2009) and looked at the online appendix but donāt see where this is coming from. - Since Iām interested in explicitly modeling regularity for each customer, (
k
parameter in Pareto/GGG), how would you recommend incorporating such a parameter into the lognormal formulation? I did some exploration of formulating lognormal via similar shape and scale parameters, but havenāt found anything obvious yet.
See here: 1.13 Multivariate priors for hierarchical models | Stan Userās Guide
T2 = T_star + T
I am not changing the exponential process on the interpurchase time or the death time. I am simply changing the Gamma heterogeneity on their rates. So similarly for Pareto/GGG, the purchase process would still be Gamma with k and the death process would still be exponential. You would just use lognormal instead of gamma for the heterogeneity of these processesā parameters. In particular, the k parameterās heterogeneity could follow lognormal. For Pareto/NBD, the use of gamma heterogeneity is for the conjugacy and closed-form. Once you use Stan, there is no need for that.
Your second formulation works on the groceryCBS
data in BTYDplus
. Thanks again!
I am not changing the exponential process on the interpurchase time or the death time. I am simply changing the Gamma heterogeneity on their rates.
Ahh, got it, thanks. So weāre just changing the probability distributions to be lognormal instead of Gamma, but the likelihood is still the same.
Also, in the block below, where are the expressions for incrementing target
coming from? I looked through Abe (2009) again and I thought that in the death[i] > T[i]
case, weād be incrementing by x[i]*log(lambda[i]) + (x[i]-1)*log(t_x) - lgamma(x) - (lambda[i] + mu[i])*T
based on the z=1
case in Section 3.1.
if (death[i] > T[i]) {
target += x[i] * log(lambda[i]) - lambda[i]*T[i] ;
}
else {
target += x[i] * log(lambda[i]) - lambda[i]*death[i] ;
}
Can you show how it is working on the groceryCBS
data? What exactly are you checking to arrive at that conclusion?
Those expressions conditional on the death time are from the Section 3.1 of this note that I shared before.