# Problem with initialization

I am very new to probabilistic modeling in general and was trying to run the Pareto/NBD model from this post (https://github.com/datascienceinc/oreilly-intro-to-predictive-clv/blob/master/oreilly-an-intro-to-predictive-clv-tutorial.ipynb) but have been running into initialization issues for my data set. Any help is greatly appreciated.

This is the error I am running into and the model is below.

``````Rejecting initial value:
Log probability evaluates to log(0), i.e. negative infinity.
Stan can't start sampling from this initial value.
Rejecting initial value:
Log probability evaluates to log(0), i.e. negative infinity.
Stan can't start sampling from this initial value.

Initialization between (-1, 1) failed after 100 attempts.
Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
``````
``````data{
int<lower=0> n_cust; //number of customers
vector<lower=0>[n_cust] x;
vector<lower=0>[n_cust] tx;
vector<lower=0>[n_cust] T;
}

parameters{
// vectors of lambda and mu for each customer.
// Here I apply limits between 0 and 1 for each
// parameter. A value of lambda or mu > 1.0 is unphysical
// since you don't enough time resolution to go less than
// 1 time unit.
vector <lower=0,upper=1.0>[n_cust] lambda;
vector <lower=0,upper=1.0>[n_cust] mu;

// parameters of the prior distributions : r, alpha, s, beta.
// for both lambda and mu
real <lower=0>r;
real <lower=0>alpha;
real <lower=0>s;
real <lower=0>beta;
}

model{

// temporary variables :
vector[n_cust] like1; // likelihood
vector[n_cust] like2; // likelihood

// Establishing hyperpriors on parameters r, alpha, s, and beta.
r ~ normal(0.5,0.1);
alpha ~ normal(10,1);
s ~ normal(0.5,0.1);
beta ~ normal(10,1);

// Establishing the Prior Distributions for lambda and mu :
lambda ~ gamma(r,alpha);
mu ~ gamma(s,beta);

// The likelihood of the Pareto/NBD model :
like1 = x .* log(lambda) + log(mu) - log(mu+lambda) - tx .* (mu+lambda);
like2 = (x + 1) .* log(lambda) - log(mu+lambda) - T .* (lambda+mu);

// Here we increment the log probability density (target) accordingly
target+= log(exp(like1)+exp(like2));
}
``````

The error you’re getting is that the target log density is negative infinity. Probably because something is trying to take a log of a zero or something.

These things can sometimes be annoying to track down, but the best thing to do is just print stuff out and look.

Probably like1 or like2 is negative infinity. Just do a:

``````print(like1, like2);
``````

After you assign them values and have a look. Give it a go and see if you find something suspicious.

The Stan program in the notebook you are copying from is not that great, although it does initialize for me in rstan after I made some changes. Not sure if it would have initialized without the changes, but I didn’t want to find out.

``````data {
int<lower=0> n_cust; //number of customers
vector<lower=0>[n_cust] x;
vector<lower=0>[n_cust] tx;
vector<lower=0>[n_cust] T;
}

parameters{
// vectors of lambda and mu for each customer.
// Here I apply limits between 0 and 1 for each
// parameter. A value of lambda or mu > 1.0 is unphysical
// since you don't enough time resolution to go less than
// 1 time unit.
vector <lower=0,upper=1.0>[n_cust] lambda;
vector <lower=0,upper=1.0>[n_cust] mu;

// parameters of the prior distributions : r, alpha, s, beta.
// for both lambda and mu
real <lower=0> r;
real <lower=0> alpha;
real <lower=0> s;
real <lower=0> beta;
}

model {

// temporary variables :
vector[n_cust] log_lambda = log(lambda);
vector[n_cust] mu_lambda = mu + lambda;
vector[n_cust] log_mu_lambda = log(mu_lambda);
vector[n_cust] like1 = x .* log_lambda + log(mu) - log_mu_lambda - tx .* mu_lambda;
vector[n_cust] like2 = (x + 1) .* log_lambda - log_mu_lambda - T .* mu_lambda;

// Here we increment the log probability density (target) accordingly
for (n in 1:n_cust) target += log_sum_exp(like1[n], like2[n]);

// Establishing hyperpriors on parameters r, alpha, s, and beta.
r ~ normal(0.5, 0.1);
alpha ~ normal(10, 1);
s ~ normal(0.5, 0.1);
beta ~ normal(10, 1);

// Establishing the Prior Distributions for lambda and mu :
lambda ~ gamma(r, alpha);
mu ~ gamma(s, beta);
}
``````

Thanks for both of your help. I was originally printing the temporary values and removing records in the dataset which were causing the problem and was able to successfully run the original code.

With the change that @bgoodri suggested it works for all records. Can you help clarify why it works when we update probability one record at a time rather than the entire dataset as in the original code.

`log_sum_exp(a, b)` is numerically accurate and `log(exp(a) + exp(b))` is numerically unaccurate and prone to overflow. Even still, it hits the maximum treedepth, does not converge, and is not a great example to learn from.

Hi, I have written a vectorized version of a correlated Pareto/NBD:

``````data{

int<lower=1> N; //  number of customers
vector<lower=0>[N] x; // number of purchases per customer
vector<lower=0>[N] t_x; // recency per customer
vector<lower=0>[N] T; // observation period per customer

}

parameters{

// Individual-level correlated coefficients

matrix[2, N] eta; // normal errors
cholesky_factor_corr L_Omega; // cholesky corr.
row_vector chi; // mean
vector<lower=0,upper=pi()/2> tau_unif; // scaled variance

}

transformed parameters{

vector[N] lambda; // poisson rate (correlated)
vector[N] mu; // death rate (correlated)

vector<lower=0> tau; // variance of individual-level correlated coefficients
matrix[N,2] zeta; // individual-level correlated coefficients

for (r in 1:2)
tau[r] = 2.5 * tan(tau_unif[r]);

zeta = rep_matrix(chi,N) + (diag_pre_multiply(tau,L_Omega) * eta)';

lambda=exp(zeta[,1]);
mu=exp(zeta[,2]);

}

model{

vector[N] lam_mu=lambda+mu;

to_vector(eta) ~ std_normal();
L_Omega ~ lkj_corr_cholesky(2);
to_vector(chi) ~ normal(0,3);

target += x .* log(lambda) - log(lam_mu) - lam_mu .* t_x +
log(mu + lambda .* exp( -(T-t_x) .* (lam_mu) )) ;

}

generated quantities {

matrix[2,2] Omega;
matrix[2,2] Sigma;
vector[N] p_alive;
vector[N] cond_exp_T2;
int T2=78;

Omega = multiply_lower_tri_self_transpose(L_Omega);