This is the error I am running into and the model is below.
Rejecting initial value:
Log probability evaluates to log(0), i.e. negative infinity.
Stan can't start sampling from this initial value.
Rejecting initial value:
Log probability evaluates to log(0), i.e. negative infinity.
Stan can't start sampling from this initial value.
Initialization between (-1, 1) failed after 100 attempts.
Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
data{
int<lower=0> n_cust; //number of customers
vector<lower=0>[n_cust] x;
vector<lower=0>[n_cust] tx;
vector<lower=0>[n_cust] T;
}
parameters{
// vectors of lambda and mu for each customer.
// Here I apply limits between 0 and 1 for each
// parameter. A value of lambda or mu > 1.0 is unphysical
// since you don't enough time resolution to go less than
// 1 time unit.
vector <lower=0,upper=1.0>[n_cust] lambda;
vector <lower=0,upper=1.0>[n_cust] mu;
// parameters of the prior distributions : r, alpha, s, beta.
// for both lambda and mu
real <lower=0>r;
real <lower=0>alpha;
real <lower=0>s;
real <lower=0>beta;
}
model{
// temporary variables :
vector[n_cust] like1; // likelihood
vector[n_cust] like2; // likelihood
// Establishing hyperpriors on parameters r, alpha, s, and beta.
r ~ normal(0.5,0.1);
alpha ~ normal(10,1);
s ~ normal(0.5,0.1);
beta ~ normal(10,1);
// Establishing the Prior Distributions for lambda and mu :
lambda ~ gamma(r,alpha);
mu ~ gamma(s,beta);
// The likelihood of the Pareto/NBD model :
like1 = x .* log(lambda) + log(mu) - log(mu+lambda) - tx .* (mu+lambda);
like2 = (x + 1) .* log(lambda) - log(mu+lambda) - T .* (lambda+mu);
// Here we increment the log probability density (target) accordingly
target+= log(exp(like1)+exp(like2));
}
The error you’re getting is that the target log density is negative infinity. Probably because something is trying to take a log of a zero or something.
These things can sometimes be annoying to track down, but the best thing to do is just print stuff out and look.
Probably like1 or like2 is negative infinity. Just do a:
print(like1, like2);
After you assign them values and have a look. Give it a go and see if you find something suspicious.
The Stan program in the notebook you are copying from is not that great, although it does initialize for me in rstan after I made some changes. Not sure if it would have initialized without the changes, but I didn’t want to find out.
data {
int<lower=0> n_cust; //number of customers
vector<lower=0>[n_cust] x;
vector<lower=0>[n_cust] tx;
vector<lower=0>[n_cust] T;
}
parameters{
// vectors of lambda and mu for each customer.
// Here I apply limits between 0 and 1 for each
// parameter. A value of lambda or mu > 1.0 is unphysical
// since you don't enough time resolution to go less than
// 1 time unit.
vector <lower=0,upper=1.0>[n_cust] lambda;
vector <lower=0,upper=1.0>[n_cust] mu;
// parameters of the prior distributions : r, alpha, s, beta.
// for both lambda and mu
real <lower=0> r;
real <lower=0> alpha;
real <lower=0> s;
real <lower=0> beta;
}
model {
// temporary variables :
vector[n_cust] log_lambda = log(lambda);
vector[n_cust] mu_lambda = mu + lambda;
vector[n_cust] log_mu_lambda = log(mu_lambda);
vector[n_cust] like1 = x .* log_lambda + log(mu) - log_mu_lambda - tx .* mu_lambda;
vector[n_cust] like2 = (x + 1) .* log_lambda - log_mu_lambda - T .* mu_lambda;
// Here we increment the log probability density (target) accordingly
for (n in 1:n_cust) target += log_sum_exp(like1[n], like2[n]);
// Establishing hyperpriors on parameters r, alpha, s, and beta.
r ~ normal(0.5, 0.1);
alpha ~ normal(10, 1);
s ~ normal(0.5, 0.1);
beta ~ normal(10, 1);
// Establishing the Prior Distributions for lambda and mu :
lambda ~ gamma(r, alpha);
mu ~ gamma(s, beta);
}
Thanks for both of your help. I was originally printing the temporary values and removing records in the dataset which were causing the problem and was able to successfully run the original code.
With the change that @bgoodri suggested it works for all records. Can you help clarify why it works when we update probability one record at a time rather than the entire dataset as in the original code.
log_sum_exp(a, b) is numerically accurate and log(exp(a) + exp(b)) is numerically unaccurate and prone to overflow. Even still, it hits the maximum treedepth, does not converge, and is not a great example to learn from.
This model is based on the note from Dr Fader and Dr Hardie. I believe this is also the model by Abe (2009).
What I notice is that Stan is unable to converge when using Gamma distribution for heterogeneity on lambda and mu, and I have tried this original formulation for over a month. But it works really well when replacing it with lognormal distribution.