Hi,

I am new to the community and trying to get hold of the pystan.

I am trying to fit the tutorial I have found for calculating Pareto/NBD to my data. It is running with no issue for the tutorial notebook. However, I am hitting the initialization error. You can find the formula and model defined below:

### Definition of the model in tutorial:

The individual-level likelihood function of the Pareto/NBD model can be easily derived (e.g. Schmittlein et al. 1987; Fader et al. 2005) and will be used in the STAN code below :

L(\lambda, \mu | x, t_x, T) = \frac{\lambda^x \mu}{\lambda+\mu}e^{-(\lambda+\mu)t_x}+\frac{\lambda^{x+1}}{\lambda+\mu}e^{-(\lambda+\mu)T}

As discussed, \lambda is the count rate that goes in the Poisson distribution and \mu is the slope of the lifetime exponential distribution. The typical lifetime corresponds to \sim 1/\mu.

The priors for \lambda and \mu are gamma distributed :

g(\lambda|r,\alpha) = \frac{\alpha^r}{\Gamma(r)}\lambda^{r-1}e^{-\lambda \alpha}

and

g(\mu|s,\beta) = \frac{\beta^s}{\Gamma(s)}\mu^{s-1}e^{-\mu \beta} \; .

For each of the four model parameters (r,\alpha,s,\beta), I assigned hyperpriors that are normally distributed.

### Model defined in python:

paretonbd_model="""

data{

int<lower=0> n_cust; //number of customers

vector<lower=0>[n_cust] x;

vector<lower=0>[n_cust] tx;

vector<lower=0>[n_cust] T;

}

parameters{

// vectors of lambda and mu for each customer.

// Here I apply limits between 0 and 1 for each

// parameter. A value of lambda or mu > 1.0 is unphysical

// since you don’t enough time resolution to go less than

// 1 time unit.

vector <lower=0,upper=1.0>[n_cust] lambda;

vector <lower=0,upper=1.0>[n_cust] mu;

// parameters of the prior distributions : r, alpha, s, beta.

// for both lambda and mu

real <lower=0>r;

real <lower=0>alpha;

real <lower=0>s;

real <lower=0>beta;

}

model{

// temporary variables :

vector[n_cust] like1; // likelihood

vector[n_cust] like2; // likelihood

// Establishing hyperpriors on parameters r, alpha, s, and beta.

r ~ normal(0.5,0.1);

alpha ~ normal(10,1);

s ~ normal(0.5,0.1);

beta ~ normal(10,1);

// Establishing the Prior Distributions for lambda and mu :

lambda ~ gamma(r,alpha);

mu ~ gamma(s,beta);

// The likelihood of the Pareto/NBD model :

like1 = x .* log(lambda) + log(mu) - log(mu+lambda) - tx .* (mu+lambda);

like2 = (x + 1) .* log(lambda) - log(mu+lambda) - T .* (lambda+mu);

// Here we increment the log probability density (target) accordingly

target+= log(exp(like1)+exp(like2));

}

“”"

### here’s the data we will provide to STAN :

data={‘n_cust’:len(rfm_sample),

‘x’:rfm_sample[‘frequency’].values,

‘tx’:rfm_sample[‘recency’].values,

‘T’:rfm_sample[‘T’].values

}

Here is the sample data I am using.

{‘n_cust’: 50,

‘x’: array([78, 15, 91, 12, 18, 25, 3, 0, 2, 18, 11, 15, 15, 6, 21, 12, 3,

3, 25, 6, 0, 2, 9, 18, 0, 32, 0, 0, 3, 23, 6, 11, 21, 1,

15, 4, 3, 11, 9, 18, 9, 13, 5, 21, 5, 23, 5, 38, 2, 20]),

‘tx’: array([672, 397, 665, 396, 639, 700, 123, 0, 427, 700, 335, 456, 455,

244, 670, 366, 91, 63, 666, 182, 0, 14, 335, 670, 0, 706,

0, 0, 92, 706, 486, 394, 672, 61, 609, 126, 395, 670, 700,

716, 669, 519, 486, 670, 213, 700, 213, 570, 60, 670]),

‘T’: array([706, 700, 727, 426, 700, 700, 639, 364, 700, 700, 700, 456, 700,

639, 700, 700, 91, 700, 727, 608, 150, 490, 608, 700, 644, 706,

700, 547, 700, 706, 516, 515, 706, 242, 700, 678, 395, 700, 700,

716, 669, 700, 547, 700, 700, 700, 700, 692, 515, 700])}

I have attached the full dataset here.pareto_nbd.csv (216.4 KB)

Appreciate it if you guys can help pinpoint the issue.