Hi,
I am new to the community and trying to get hold of the pystan.
I am trying to fit the tutorial I have found for calculating Pareto/NBD to my data. It is running with no issue for the tutorial notebook. However, I am hitting the initialization error. You can find the formula and model defined below:
Definition of the model in tutorial:
The individual-level likelihood function of the Pareto/NBD model can be easily derived (e.g. Schmittlein et al. 1987; Fader et al. 2005) and will be used in the STAN code below :
L(\lambda, \mu | x, t_x, T) = \frac{\lambda^x \mu}{\lambda+\mu}e^{-(\lambda+\mu)t_x}+\frac{\lambda^{x+1}}{\lambda+\mu}e^{-(\lambda+\mu)T}
As discussed, \lambda is the count rate that goes in the Poisson distribution and \mu is the slope of the lifetime exponential distribution. The typical lifetime corresponds to \sim 1/\mu.
The priors for \lambda and \mu are gamma distributed :
g(\lambda|r,\alpha) = \frac{\alpha^r}{\Gamma(r)}\lambda^{r-1}e^{-\lambda \alpha}
and
g(\mu|s,\beta) = \frac{\beta^s}{\Gamma(s)}\mu^{s-1}e^{-\mu \beta} \; .
For each of the four model parameters (r,\alpha,s,\beta), I assigned hyperpriors that are normally distributed.
Model defined in python:
paretonbd_model="""
data{
int<lower=0> n_cust; //number of customers
vector<lower=0>[n_cust] x;
vector<lower=0>[n_cust] tx;
vector<lower=0>[n_cust] T;
}
parameters{
// vectors of lambda and mu for each customer.
// Here I apply limits between 0 and 1 for each
// parameter. A value of lambda or mu > 1.0 is unphysical
// since you don’t enough time resolution to go less than
// 1 time unit.
vector <lower=0,upper=1.0>[n_cust] lambda;
vector <lower=0,upper=1.0>[n_cust] mu;
// parameters of the prior distributions : r, alpha, s, beta.
// for both lambda and mu
real <lower=0>r;
real <lower=0>alpha;
real <lower=0>s;
real <lower=0>beta;
}
model{
// temporary variables :
vector[n_cust] like1; // likelihood
vector[n_cust] like2; // likelihood
// Establishing hyperpriors on parameters r, alpha, s, and beta.
r ~ normal(0.5,0.1);
alpha ~ normal(10,1);
s ~ normal(0.5,0.1);
beta ~ normal(10,1);
// Establishing the Prior Distributions for lambda and mu :
lambda ~ gamma(r,alpha);
mu ~ gamma(s,beta);
// The likelihood of the Pareto/NBD model :
like1 = x .* log(lambda) + log(mu) - log(mu+lambda) - tx .* (mu+lambda);
like2 = (x + 1) .* log(lambda) - log(mu+lambda) - T .* (lambda+mu);
// Here we increment the log probability density (target) accordingly
target+= log(exp(like1)+exp(like2));
}
“”"
here’s the data we will provide to STAN :
data={‘n_cust’:len(rfm_sample),
‘x’:rfm_sample[‘frequency’].values,
‘tx’:rfm_sample[‘recency’].values,
‘T’:rfm_sample[‘T’].values
}
Here is the sample data I am using.
{‘n_cust’: 50,
‘x’: array([78, 15, 91, 12, 18, 25, 3, 0, 2, 18, 11, 15, 15, 6, 21, 12, 3,
3, 25, 6, 0, 2, 9, 18, 0, 32, 0, 0, 3, 23, 6, 11, 21, 1,
15, 4, 3, 11, 9, 18, 9, 13, 5, 21, 5, 23, 5, 38, 2, 20]),
‘tx’: array([672, 397, 665, 396, 639, 700, 123, 0, 427, 700, 335, 456, 455,
244, 670, 366, 91, 63, 666, 182, 0, 14, 335, 670, 0, 706,
0, 0, 92, 706, 486, 394, 672, 61, 609, 126, 395, 670, 700,
716, 669, 519, 486, 670, 213, 700, 213, 570, 60, 670]),
‘T’: array([706, 700, 727, 426, 700, 700, 639, 364, 700, 700, 700, 456, 700,
639, 700, 700, 91, 700, 727, 608, 150, 490, 608, 700, 644, 706,
700, 547, 700, 706, 516, 515, 706, 242, 700, 678, 395, 700, 700,
716, 669, 700, 547, 700, 700, 700, 700, 692, 515, 700])}
I have attached the full dataset here.pareto_nbd.csv (216.4 KB)
Appreciate it if you guys can help pinpoint the issue.