 RuntimeError: Initialization failed in Python 3.7.2, pystan 2.18.1.0

Hi,
I am new to the community and am just getting started with PyStan.

I am trying to fit a Pareto/NBD model from a tutorial I found to my own data. The tutorial notebook runs with no issues, but on my data I hit the initialization error above. The formula and model are defined below:

Definition of the model in tutorial:

The individual-level likelihood function of the Pareto/NBD model can be easily derived (e.g. Schmittlein et al. 1987; Fader et al. 2005) and will be used in the Stan code below:
L(\lambda, \mu | x, t_x, T) = \frac{\lambda^x \mu}{\lambda+\mu}e^{-(\lambda+\mu)t_x}+\frac{\lambda^{x+1}}{\lambda+\mu}e^{-(\lambda+\mu)T}

As discussed, \lambda is the count rate that goes into the Poisson distribution and \mu is the slope of the exponential lifetime distribution. The typical lifetime corresponds to \sim 1/\mu.

The priors for \lambda and \mu are gamma distributed :
g(\lambda|r,\alpha) = \frac{\alpha^r}{\Gamma(r)}\lambda^{r-1}e^{-\lambda \alpha}
and
g(\mu|s,\beta) = \frac{\beta^s}{\Gamma(s)}\mu^{s-1}e^{-\mu \beta} \; .

For each of the four model parameters (r,\alpha,s,\beta), I assigned hyperpriors that are normally distributed.
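As a rough sanity check on scales (an aside, using the hyperprior centers chosen in the Stan code below, r \approx 0.5, \alpha \approx 10, s \approx 0.5, \beta \approx 10), the gamma priors imply

E[\lambda] = r/\alpha \approx 0.05 \quad \text{and} \quad E[\mu] = s/\beta \approx 0.05 \; ,

so the typical inter-purchase time is on the order of 1/E[\lambda] = 20 time units and the typical lifetime on the order of 1/E[\mu] = 20 time units (only roughly, since E[1/\mu] \neq 1/E[\mu]). Both rates sit comfortably inside the (0,1) constraint.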

Model defined in python:

paretonbd_model="""
data{
int<lower=0> n_cust; //number of customers
vector<lower=0>[n_cust] x;
vector<lower=0>[n_cust] tx;
vector<lower=0>[n_cust] T;
}

parameters{
// vectors of lambda and mu for each customer.
// Here I apply limits between 0 and 1 for each
// parameter. A value of lambda or mu > 1.0 is unphysical
// since you don't have enough time resolution to go less than
// 1 time unit.
vector <lower=0,upper=1.0>[n_cust] lambda;
vector <lower=0,upper=1.0>[n_cust] mu;

// parameters of the prior distributions : r, alpha, s, beta.
// for both lambda and mu
real <lower=0>r;
real <lower=0>alpha;
real <lower=0>s;
real <lower=0>beta;
}

model{

// temporary variables :
vector[n_cust] like1; // likelihood
vector[n_cust] like2; // likelihood

// Establishing hyperpriors on parameters r, alpha, s, and beta.
r ~ normal(0.5,0.1);
alpha ~ normal(10,1);
s ~ normal(0.5,0.1);
beta ~ normal(10,1);

// Establishing the Prior Distributions for lambda and mu :
lambda ~ gamma(r,alpha);
mu ~ gamma(s,beta);

// The likelihood of the Pareto/NBD model :
like1 = x .* log(lambda) + log(mu) - log(mu+lambda) - tx .* (mu+lambda);
like2 = (x + 1) .* log(lambda) - log(mu+lambda) - T .* (lambda+mu);

// Here we increment the log probability density (target) accordingly
target+= log(exp(like1)+exp(like2));
}
"""

Here's the data we will provide to Stan:

data={'n_cust':len(rfm_sample),
      'x':rfm_sample['frequency'].values,
      'tx':rfm_sample['recency'].values,
      'T':rfm_sample['T'].values
}

Here is the sample data I am using.

{'n_cust': 50,
'x': array([78, 15, 91, 12, 18, 25, 3, 0, 2, 18, 11, 15, 15, 6, 21, 12, 3,
3, 25, 6, 0, 2, 9, 18, 0, 32, 0, 0, 3, 23, 6, 11, 21, 1,
15, 4, 3, 11, 9, 18, 9, 13, 5, 21, 5, 23, 5, 38, 2, 20]),
'tx': array([672, 397, 665, 396, 639, 700, 123, 0, 427, 700, 335, 456, 455,
244, 670, 366, 91, 63, 666, 182, 0, 14, 335, 670, 0, 706,
0, 0, 92, 706, 486, 394, 672, 61, 609, 126, 395, 670, 700,
716, 669, 519, 486, 670, 213, 700, 213, 570, 60, 670]),
'T': array([706, 700, 727, 426, 700, 700, 639, 364, 700, 700, 700, 456, 700,
639, 700, 700, 91, 700, 727, 608, 150, 490, 608, 700, 644, 706,
700, 547, 700, 706, 516, 515, 706, 242, 700, 678, 395, 700, 700,
716, 669, 700, 547, 700, 700, 700, 700, 692, 515, 700])}

I have attached the full dataset here: pareto_nbd.csv (216.4 KB)

I'd appreciate any help pinpointing the issue.

Explicitly exponentiating log likelihoods like this underflows very easily:

target+= log(exp(like1)+exp(like2));


Try:

for(n in 1:n_cust) {
target += log_sum_exp(like1[n], like2[n]);
}


Double check that this is actually what you want to code, though. I didn't read into the model; I'm just assuming like1 and like2 are vectors whose elements are the first and second components of the L(\lambda, \mu | x, t_x, T) equation.
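To see the underflow concretely, here is a small NumPy sketch (my own illustration, not from the thread's code: the customer row is the first one in the sample data above, and lambda/mu are values a random initialization could plausibly draw):

```python
import numpy as np
from scipy.special import logsumexp

# First customer from the sample data above
x, tx, T = 78.0, 672.0, 706.0

# Plausible random initial values for this customer's rates
lam, mu = 0.9, 0.9

# The two log-likelihood components, mirroring like1 and like2 in the model
like1 = x * np.log(lam) + np.log(mu) - np.log(mu + lam) - tx * (mu + lam)
like2 = (x + 1) * np.log(lam) - np.log(mu + lam) - T * (lam + mu)

with np.errstate(divide="ignore"):
    naive = np.log(np.exp(like1) + np.exp(like2))  # exp() underflows to 0, log(0) = -inf

stable = logsumexp([like1, like2])  # computes log(exp(a) + exp(b)) without underflow

print(naive)   # -inf
print(stable)  # large negative but finite (about -1218.5)
```

With tx on the order of 700 and lambda + mu near 2, the exponents are around -1200, far below what float64 can represent after exp(), so the naive version returns -inf and the sampler rejects the initial point.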

Also welcome to the forums!


Dear Ben,

Thank you for your reply. I am happy to be part of the community. I changed the code as you suggested, but unfortunately that didn't solve the issue. The model is a CLV (customer lifetime value) model based on Pareto/NBD.
I think the problem is in initializing (r,\alpha,s,\beta), which in turn affects the model's convergence, but I still haven't figured it out.
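One way to test that initialization hypothesis is to pass explicit initial values instead of relying on Stan's random initialization. This is a sketch, not code from the thread: PyStan 2 accepts initial values through the init argument of sampling(), and init_fn and the chosen ranges below are my own assumptions.

```python
import numpy as np

def init_fn(n_cust=50, seed=0):
    """Build one set of initial values that respects the model's
    constraints: lambda and mu in (0, 1), and r/alpha/s/beta positive
    and near their hyperprior centers."""
    rng = np.random.RandomState(seed)
    return {
        "lambda": rng.uniform(0.01, 0.2, size=n_cust),  # small rates keep tx*(lambda+mu) moderate
        "mu": rng.uniform(0.01, 0.2, size=n_cust),
        "r": 0.5,
        "alpha": 10.0,
        "s": 0.5,
        "beta": 10.0,
    }

# With a compiled model `sm`, this would be passed per chain, e.g.:
# fit = sm.sampling(data=data, chains=4,
#                   init=[init_fn(seed=k) for k in range(4)])
```

Keeping lambda and mu small at initialization means the exponents in the likelihood stay moderate, so the initial log density is finite even without the log_sum_exp fix.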

Regards,
Shahram

It initialized for me and was running.

The model I used was (I don't know if this is what you want):

data{
int<lower=0> n_cust; //number of customers
vector<lower=0>[n_cust] x;
vector<lower=0>[n_cust] tx;
vector<lower=0>[n_cust] T;
}

parameters{
// vectors of lambda and mu for each customer.
// Here I apply limits between 0 and 1 for each
// parameter. A value of lambda or mu > 1.0 is unphysical
// since you don't have enough time resolution to go less than
// 1 time unit.
vector <lower=0,upper=1.0>[n_cust] lambda;
vector <lower=0,upper=1.0>[n_cust] mu;

// parameters of the prior distributions : r, alpha, s, beta.
// for both lambda and mu
real <lower=0>r;
real <lower=0>alpha;
real <lower=0>s;
real <lower=0>beta;
}

model{
// temporary variables :
vector[n_cust] like1; // likelihood
vector[n_cust] like2; // likelihood

// Establishing hyperpriors on parameters r, alpha, s, and beta.
r ~ normal(0.5,0.1);
alpha ~ normal(10,1);
s ~ normal(0.5,0.1);
beta ~ normal(10,1);

// Establishing the Prior Distributions for lambda and mu :
lambda ~ gamma(r,alpha);
mu ~ gamma(s,beta);

// The likelihood of the Pareto/NBD model :
like1 = x .* log(lambda) + log(mu) - log(mu+lambda) - tx .* (mu+lambda);
like2 = (x + 1) .* log(lambda) - log(mu+lambda) - T .* (lambda+mu);

// Here we increment the log probability density (target) accordingly
for(n in 1:n_cust) {
target += log_sum_exp(like1[n], like2[n]);
}
}


The model is the same and I am running exactly the same code. Weird! What data did you use? Are you using the data I provided in the csv file?

Here's the R code I used:

library(tidyverse)
library(ggplot2)
library(rstan)

model = stan_model("sh.stan")

data = list(n_cust = nrow(df),
            x = df$frequency, tx = df$recency,
            T = df$T)

fit = sampling(model, data = data, chains = 1, iter = 200)


ok. Thanks.

Did this end up working?

Hi,

Unfortunately, no. I got carried away by something else and have left it for now. Thank you for the follow-up.

Yours sincerely,
Shahram Sabzevari
M: +1-647-535-1353

After 10 days of banging my head, it finally clicked when I read this. I now finally know how to debug the init error. I had a different problem, but this is what I needed, I guess.

Thank you bbbales2!