# RuntimeError: Initialization failed in Python 3.7.2, pystan 2.18.1.0

Hi,
I am new to the community and trying to get hold of the pystan.

I am trying to fit the tutorial I have found for calculating Pareto/NBD to my data. It is running with no issue for the tutorial notebook. However, I am hitting the initialization error. You can find the formula and model defined below:

### Definition of the model in tutorial:

The individual-level likelihood function of the Pareto/NBD model can be easily derived (e.g. Schmittlein et al. 1987; Fader et al. 2005) and will be used in the STAN code below :
L(\lambda, \mu | x, t_x, T) = \frac{\lambda^x \mu}{\lambda+\mu}e^{-(\lambda+\mu)t_x}+\frac{\lambda^{x+1}}{\lambda+\mu}e^{-(\lambda+\mu)T}

As discussed, \lambda is the count rate that goes in the Poisson distribution and \mu is the slope of the lifetime exponential distribution. The typical lifetime corresponds to \sim 1/\mu.

The priors for \lambda and \mu are gamma distributed :
g(\lambda|r,\alpha) = \frac{\alpha^r}{\Gamma(r)}\lambda^{r-1}e^{-\lambda \alpha}
and
g(\mu|s,\beta) = \frac{\beta^s}{\Gamma(s)}\mu^{s-1}e^{-\mu \beta} \; .

For each of the four model parameters (r,\alpha,s,\beta), I assigned hyperpriors that are normally distributed.

### Model defined in python:

paretonbd_model="""
data{
int<lower=0> n_cust; //number of customers
vector<lower=0>[n_cust] x;
vector<lower=0>[n_cust] tx;
vector<lower=0>[n_cust] T;
}

parameters{
// vectors of lambda and mu for each customer.
// Here I apply limits between 0 and 1 for each
// parameter. A value of lambda or mu > 1.0 is unphysical
// since you don’t enough time resolution to go less than
// 1 time unit.
vector <lower=0,upper=1.0>[n_cust] lambda;
vector <lower=0,upper=1.0>[n_cust] mu;

// parameters of the prior distributions : r, alpha, s, beta.
// for both lambda and mu
real <lower=0>r;
real <lower=0>alpha;
real <lower=0>s;
real <lower=0>beta;
}

model{

// temporary variables :
vector[n_cust] like1; // likelihood
vector[n_cust] like2; // likelihood

// Establishing hyperpriors on parameters r, alpha, s, and beta.
r ~ normal(0.5,0.1);
alpha ~ normal(10,1);
s ~ normal(0.5,0.1);
beta ~ normal(10,1);

// Establishing the Prior Distributions for lambda and mu :
lambda ~ gamma(r,alpha);
mu ~ gamma(s,beta);

// The likelihood of the Pareto/NBD model :
like1 = x .* log(lambda) + log(mu) - log(mu+lambda) - tx .* (mu+lambda);
like2 = (x + 1) .* log(lambda) - log(mu+lambda) - T .* (lambda+mu);

// Here we increment the log probability density (target) accordingly
target+= log(exp(like1)+exp(like2));
}
“”"

### here’s the data we will provide to STAN :

data={‘n_cust’:len(rfm_sample),
‘x’:rfm_sample[‘frequency’].values,
‘tx’:rfm_sample[‘recency’].values,
‘T’:rfm_sample[‘T’].values
}

Here is the sample data I am using.

{‘n_cust’: 50,
‘x’: array([78, 15, 91, 12, 18, 25, 3, 0, 2, 18, 11, 15, 15, 6, 21, 12, 3,
3, 25, 6, 0, 2, 9, 18, 0, 32, 0, 0, 3, 23, 6, 11, 21, 1,
15, 4, 3, 11, 9, 18, 9, 13, 5, 21, 5, 23, 5, 38, 2, 20]),
‘tx’: array([672, 397, 665, 396, 639, 700, 123, 0, 427, 700, 335, 456, 455,
244, 670, 366, 91, 63, 666, 182, 0, 14, 335, 670, 0, 706,
0, 0, 92, 706, 486, 394, 672, 61, 609, 126, 395, 670, 700,
716, 669, 519, 486, 670, 213, 700, 213, 570, 60, 670]),
‘T’: array([706, 700, 727, 426, 700, 700, 639, 364, 700, 700, 700, 456, 700,
639, 700, 700, 91, 700, 727, 608, 150, 490, 608, 700, 644, 706,
700, 547, 700, 706, 516, 515, 706, 242, 700, 678, 395, 700, 700,
716, 669, 700, 547, 700, 700, 700, 700, 692, 515, 700])}

I have attached the full dataset here.pareto_nbd.csv (216.4 KB)

Appreciate it if you guys can help pinpoint the issue.

Explicitly computing exps of log likelihoods causes underflow a lot.

target+= log(exp(like1)+exp(like2));


Try:

for(n in 1:n_cust) {
target += log_sum_exp(like1[n], like2[n]);
}


Double check that that is the thing you want to code though. I didn’t read into the model. I’m just assuming like1 and like2 are vectors containing as elements the first and second components of the L(\lambda, \mu | x, t_x, T) equation.

Also welcome to the forums!

1 Like

Dear Ben,

Thank you for your reply. I am happy to be part of the community. I have changed the code the way you have suggested. Unfortunately, that didn’t solve the issue. The model is actually a CLV model based on Pareto/NBD.
I think the problem is in initializing the (r,\alpha,s,\beta) which in turn affects the model’s convergence. I haven’t still figured it out.

Regards,
Shahram

It initialized for me and was running.

The model I used was (dunno if this is what you want):

data{
int<lower=0> n_cust; //number of customers
vector<lower=0>[n_cust] x;
vector<lower=0>[n_cust] tx;
vector<lower=0>[n_cust] T;
}

parameters{
// vectors of lambda and mu for each customer.
// Here I apply limits between 0 and 1 for each
// parameter. A value of lambda or mu > 1.0 is unphysical
// since you don’t enough time resolution to go less than
// 1 time unit.
vector <lower=0,upper=1.0>[n_cust] lambda;
vector <lower=0,upper=1.0>[n_cust] mu;

// parameters of the prior distributions : r, alpha, s, beta.
// for both lambda and mu
real <lower=0>r;
real <lower=0>alpha;
real <lower=0>s;
real <lower=0>beta;
}

model{
// temporary variables :
vector[n_cust] like1; // likelihood
vector[n_cust] like2; // likelihood

// Establishing hyperpriors on parameters r, alpha, s, and beta.
r ~ normal(0.5,0.1);
alpha ~ normal(10,1);
s ~ normal(0.5,0.1);
beta ~ normal(10,1);

// Establishing the Prior Distributions for lambda and mu :
lambda ~ gamma(r,alpha);
mu ~ gamma(s,beta);

// The likelihood of the Pareto/NBD model :
like1 = x .* log(lambda) + log(mu) - log(mu+lambda) - tx .* (mu+lambda);
like2 = (x + 1) .* log(lambda) - log(mu+lambda) - T .* (lambda+mu);

// Here we increment the log probability density (target) accordingly
for(n in 1:n_cust) {
target += log_sum_exp(like1[n], like2[n]);
}
}


The model is the same and I am using exactly the same. Weird! What data did you used? Are you using the data I have provided in the csv file?

Here’s the R code I used:

library(tidyverse)
library(ggplot2)
library(rstan)

model = stan_model("sh.stan")

data = list(n_cust = nrow(df),
x = df$frequency, tx = df$recency,
T = df\$T)

fit = sampling(model, data = data, chains = 1, iter = 200)


ok. Thanks.

Did this end up working?

Hi,

Unfortunately, no. I have carried away by something else and left it for now. Thank you for the follow up.

Yours sincerely,
Shahram Sabzevari
M: +1-647-535-1353

After 10 days of banging my head it finally clicked when i read this. I now finally know how to debug the init error. Had a different problem, but this is what i needed i guess.

Thank you bbbales2!

Hi bbbales2

are you able to implement the sBG (Shifted Beta Geometric) model for CL in contractual setting into R
using rstan?

I am pretty new to rstan and don’t know how to fix all parameters.

But it sounds like you have a problem in mind which is the best way to learn!

I like the golf putting example as a motivation for a lot of the things you’ll need to do with Stan: https://mc-stan.org/users/documentation/case-studies/golf.html

If you need to write distributions in Stan that aren’t already included in the language, you’ll want to read sections 7.3 and 7.4 of the reference manual to understand how distribution syntax works (7.3 is here).

1 Like

Thks for the suggestion.