RuntimeError: Initialization failed in Python 3.7.2, pystan 2.18.1.0

sh.sab · April 1, 2019, 7:12pm

Hi,
I am new to the community and trying to get hold of the pystan.

I am trying to fit the tutorial I have found for calculating Pareto/NBD to my data. It is running with no issue for the tutorial notebook. However, I am hitting the initialization error. You can find the formula and model defined below:

Definition of the model in tutorial:

The individual-level likelihood function of the Pareto/NBD model can be easily derived (e.g. Schmittlein et al. 1987; Fader et al. 2005) and will be used in the STAN code below :
L(\lambda, \mu | x, t_x, T) = \frac{\lambda^x \mu}{\lambda+\mu}e^{-(\lambda+\mu)t_x}+\frac{\lambda^{x+1}}{\lambda+\mu}e^{-(\lambda+\mu)T}

As discussed, \lambda is the count rate that goes in the Poisson distribution and \mu is the slope of the lifetime exponential distribution. The typical lifetime corresponds to \sim 1/\mu.

The priors for \lambda and \mu are gamma distributed :
g(\lambda|r,\alpha) = \frac{\alpha^r}{\Gamma(r)}\lambda^{r-1}e^{-\lambda \alpha}
and
g(\mu|s,\beta) = \frac{\beta^s}{\Gamma(s)}\mu^{s-1}e^{-\mu \beta} \; .

For each of the four model parameters (r,\alpha,s,\beta), I assigned hyperpriors that are normally distributed.

Model defined in python:

paretonbd_model="""
data{
int<lower=0> n_cust; //number of customers
vector<lower=0>[n_cust] x;
vector<lower=0>[n_cust] tx;
vector<lower=0>[n_cust] T;
}

parameters{
// vectors of lambda and mu for each customer.
// Here I apply limits between 0 and 1 for each
// parameter. A value of lambda or mu > 1.0 is unphysical
// since you don’t enough time resolution to go less than
// 1 time unit.
vector <lower=0,upper=1.0>[n_cust] lambda;
vector <lower=0,upper=1.0>[n_cust] mu;

// parameters of the prior distributions : r, alpha, s, beta.
// for both lambda and mu
real <lower=0>r;
real <lower=0>alpha;
real <lower=0>s;
real <lower=0>beta;
}

model{

// temporary variables :
vector[n_cust] like1; // likelihood
vector[n_cust] like2; // likelihood

// Establishing hyperpriors on parameters r, alpha, s, and beta.
r ~ normal(0.5,0.1);
alpha ~ normal(10,1);
s ~ normal(0.5,0.1);
beta ~ normal(10,1);

// Establishing the Prior Distributions for lambda and mu :
lambda ~ gamma(r,alpha);
mu ~ gamma(s,beta);

// The likelihood of the Pareto/NBD model :
like1 = x .* log(lambda) + log(mu) - log(mu+lambda) - tx .* (mu+lambda);
like2 = (x + 1) .* log(lambda) - log(mu+lambda) - T .* (lambda+mu);

// Here we increment the log probability density (target) accordingly
target+= log(exp(like1)+exp(like2));
}
“”"

here’s the data we will provide to STAN :

data={‘n_cust’:len(rfm_sample),
‘x’:rfm_sample[‘frequency’].values,
‘tx’:rfm_sample[‘recency’].values,
‘T’:rfm_sample[‘T’].values
}

Here is the sample data I am using.

{‘n_cust’: 50,
‘x’: array([78, 15, 91, 12, 18, 25, 3, 0, 2, 18, 11, 15, 15, 6, 21, 12, 3,
3, 25, 6, 0, 2, 9, 18, 0, 32, 0, 0, 3, 23, 6, 11, 21, 1,
15, 4, 3, 11, 9, 18, 9, 13, 5, 21, 5, 23, 5, 38, 2, 20]),
‘tx’: array([672, 397, 665, 396, 639, 700, 123, 0, 427, 700, 335, 456, 455,
244, 670, 366, 91, 63, 666, 182, 0, 14, 335, 670, 0, 706,
0, 0, 92, 706, 486, 394, 672, 61, 609, 126, 395, 670, 700,
716, 669, 519, 486, 670, 213, 700, 213, 570, 60, 670]),
‘T’: array([706, 700, 727, 426, 700, 700, 639, 364, 700, 700, 700, 456, 700,
639, 700, 700, 91, 700, 727, 608, 150, 490, 608, 700, 644, 706,
700, 547, 700, 706, 516, 515, 706, 242, 700, 678, 395, 700, 700,
716, 669, 700, 547, 700, 700, 700, 700, 692, 515, 700])}

I have attached the full dataset here.pareto_nbd.csv (216.4 KB)

Appreciate it if you guys can help pinpoint the issue.

bbbales2 · April 3, 2019, 5:35pm

Explicitly computing exps of log likelihoods causes underflow a lot.

Instead of:

target+= log(exp(like1)+exp(like2));

Try:

for(n in 1:n_cust) {
  target += log_sum_exp(like1[n], like2[n]);
}

Double check that that is the thing you want to code though. I didn’t read into the model. I’m just assuming like1 and like2 are vectors containing as elements the first and second components of the L(\lambda, \mu | x, t_x, T) equation.

Also welcome to the forums!

sh.sab · April 3, 2019, 7:03pm

Dear Ben,

Thank you for your reply. I am happy to be part of the community. I have changed the code the way you have suggested. Unfortunately, that didn’t solve the issue. The model is actually a CLV model based on Pareto/NBD.
I think the problem is in initializing the (r,\alpha,s,\beta) which in turn affects the model’s convergence. I haven’t still figured it out.

Regards,
Shahram

bbbales2 · April 3, 2019, 7:08pm

It initialized for me and was running.

The model I used was (dunno if this is what you want):

data{
  int<lower=0> n_cust; //number of customers
  vector<lower=0>[n_cust] x;
  vector<lower=0>[n_cust] tx;
  vector<lower=0>[n_cust] T;
}

parameters{
  // vectors of lambda and mu for each customer.
  // Here I apply limits between 0 and 1 for each
  // parameter. A value of lambda or mu > 1.0 is unphysical
  // since you don’t enough time resolution to go less than
  // 1 time unit.
  vector <lower=0,upper=1.0>[n_cust] lambda;
  vector <lower=0,upper=1.0>[n_cust] mu;
  
  // parameters of the prior distributions : r, alpha, s, beta.
  // for both lambda and mu
  real <lower=0>r;
  real <lower=0>alpha;
  real <lower=0>s;
  real <lower=0>beta;
}

model{
  // temporary variables :
  vector[n_cust] like1; // likelihood
  vector[n_cust] like2; // likelihood
  
  // Establishing hyperpriors on parameters r, alpha, s, and beta.
  r ~ normal(0.5,0.1);
  alpha ~ normal(10,1);
  s ~ normal(0.5,0.1);
  beta ~ normal(10,1);
  
  // Establishing the Prior Distributions for lambda and mu :
  lambda ~ gamma(r,alpha);
  mu ~ gamma(s,beta);
  
  // The likelihood of the Pareto/NBD model :
  like1 = x .* log(lambda) + log(mu) - log(mu+lambda) - tx .* (mu+lambda);
  like2 = (x + 1) .* log(lambda) - log(mu+lambda) - T .* (lambda+mu);
  
  // Here we increment the log probability density (target) accordingly
  for(n in 1:n_cust) {
    target += log_sum_exp(like1[n], like2[n]);
  }
}

sh.sab · April 3, 2019, 7:25pm

The model is the same and I am using exactly the same. Weird! What data did you used? Are you using the data I have provided in the csv file?

bbbales2 · April 3, 2019, 7:57pm

Here’s the R code I used:

library(tidyverse)
library(ggplot2)
library(rstan)

df = read.csv("pareto_nbd.csv")

model = stan_model("sh.stan")

data = list(n_cust = nrow(df),
  x = df$frequency,
  tx = df$recency,
  T = df$T)

fit = sampling(model, data = data, chains = 1, iter = 200)

sh.sab · April 4, 2019, 1:18pm

ok. Thanks.

bbbales2 · April 6, 2019, 9:32pm

Did this end up working?

sh.sab · April 7, 2019, 4:16pm

Hi,

Unfortunately, no. I have carried away by something else and left it for now. Thank you for the follow up.

Yours sincerely,
Shahram Sabzevari
M: +1-647-535-1353

eee · May 18, 2019, 8:59am

After 10 days of banging my head it finally clicked when i read this. I now finally know how to debug the init error. Had a different problem, but this is what i needed i guess.

Thank you bbbales2!

blk · July 29, 2020, 9:03am

Hi bbbales2

are you able to implement the sBG (Shifted Beta Geometric) model for CL in contractual setting into R
using rstan?

I am pretty new to rstan and don’t know how to fix all parameters.

bbbales2 · July 29, 2020, 5:10pm

But it sounds like you have a problem in mind which is the best way to learn!

I like the golf putting example as a motivation for a lot of the things you’ll need to do with Stan: Model building and expansion for golf putting

If you need to write distributions in Stan that aren’t already included in the language, you’ll want to read sections 7.3 and 7.4 of the reference manual to understand how distribution syntax works (7.3 is here).

blk · August 1, 2020, 2:57am

Thks for the suggestion.

Topic		Replies	Views
Initialization failed in PyStan Modeling	3	739	March 10, 2020
RuntimeError: Initialization failed in PyStan Modeling	3	1815	April 18, 2018
Problem with initialization Modeling	5	1108	February 21, 2021
Pystan runtime error: initialization failed Modeling pystan	2	453	November 22, 2022
RuntimeError: Initialization failed Modeling	10	802	July 28, 2020

RuntimeError: Initialization failed in Python 3.7.2, pystan 2.18.1.0

Definition of the model in tutorial:

Model defined in python:

here’s the data we will provide to STAN :

Related topics