Recently I have been trying to compare the ratio of two normal likelihoods in the generated quantities block (the normal likelihood of the observed data under mu_1 and sigma_1 versus under mu_2 and sigma_2). Before adding the generated quantities block the sampling went smoothly and the estimated parameters (mu_1, mu_2, sigma_1, sigma_2 and lambda) made sense, but there seem to be some problems after I added the following generated quantities block:
data {
  int<lower=0> N;
  vector<lower=0>[N] y;
}
parameters {
  real mu_1;
  real mu_2;
  real<lower=0> sigma_1;
  real<lower=0> sigma_2;
  real<lower=0, upper=1> lambda; // mixture proportion
}
generated quantities {
  real prob;
  real unnorm_prob_0; // sum of log likelihoods under mu_1 and sigma_1
  real unnorm_prob_1; // sum of log likelihoods under mu_2 and sigma_2
  real numerator;
  real denominator;
  unnorm_prob_0 = 0;
  unnorm_prob_1 = 0;
  for (i in 1:N) {
    unnorm_prob_0 += normal_lpdf(y[i] | mu_1, sigma_1);
    unnorm_prob_1 += normal_lpdf(y[i] | mu_2, sigma_2);
    // mu_1, mu_2, sigma_1 and sigma_2 are parameters in the model
  }
  numerator = lambda * exp(unnorm_prob_0);
  denominator = lambda * exp(unnorm_prob_0) + (1 - lambda) * exp(unnorm_prob_1);
  prob = numerator / denominator;
}
The error message is: ‘The following variables have undefined values’ for prob, and I found someone saying that this error message occurs when some generated quantity evaluates to NaN, usually due to numerical underflow or overflow. I also tried adding the following Bernoulli random variable generator, and the returned error message is: ‘Exception: bernoulli_rng: Probability parameter is nan, but must be finite!’
data {
  int<lower=0> N;
  vector<lower=0>[N] y;
}
parameters {
  real mu_1;
  real mu_2;
  real<lower=0> sigma_1;
  real<lower=0> sigma_2;
  real<lower=0, upper=1> lambda; // mixture proportion
}
generated quantities {
  real theta;
  real prob;
  real unnorm_prob_0; // sum of log likelihoods under mu_1 and sigma_1
  real unnorm_prob_1; // sum of log likelihoods under mu_2 and sigma_2
  real numerator;
  real denominator;
  unnorm_prob_0 = 0;
  unnorm_prob_1 = 0;
  for (i in 1:N) {
    unnorm_prob_0 += normal_lpdf(y[i] | mu_1, sigma_1);
    unnorm_prob_1 += normal_lpdf(y[i] | mu_2, sigma_2);
    // mu_1, mu_2, sigma_1 and sigma_2 are parameters in the model
  }
  numerator = lambda * exp(unnorm_prob_0);
  denominator = lambda * exp(unnorm_prob_0) + (1 - lambda) * exp(unnorm_prob_1);
  prob = numerator / denominator;
  theta = bernoulli_rng(1 - prob);
}
When I extracted ‘numerator’ and ‘denominator’ from the Stan output, I did not see NAs, but some of the values of ‘numerator’ are very small (e.g. 10^-200 to 0), since the ys more likely follow Normal(mu_2, sigma_2), and the values of ‘denominator’ are relatively small (e.g. 10^-5 to 10^-10). However, in my understanding, even if we have some problems with arithmetic precision it should produce draws of prob that are all 0, since the numerator is so small. I am confused about why we would get NaNs in such a setting.
If I removed prob I could see the summary output of both numerator and denominator with no warning message. But if I added prob back I could still see the summary output of numerator, denominator and prob, and I would get the following warning message: ‘The following variables have undefined values: prob’.
The error message ‘The following variables have undefined values: prob’ has now disappeared, and all 4000 draws of prob are very close to 0, with the following summary output:
       mean       se_mean  sd         2.5%      50%       97.5%      n_eff  Rhat
prob   3.52e-135  NaN      2.23e-133  0.00e+00  0.00e+00  4.05e-188  NaN    1
This assumes all ys come from the same latent group. Usually a mixture model allows different data points to come from different groups.
What does your model block do? Does it have a single top-level log_mix, or one for each y[i]? If the latter, then prob should be an array with a separate probability for each y[i].
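(For the per-observation case, the model block typically looks something like the following; this is just a sketch using the parameter names from your code, with Stan's built-in log_mix marginalizing the latent group indicator for each observation.)
model {
  for (n in 1:N) {
    // each y[n] gets its own latent group, marginalized out on the log scale
    target += log_mix(lambda,
                      normal_lpdf(y[n] | mu_1, sigma_1),
                      normal_lpdf(y[n] | mu_2, sigma_2));
  }
}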
The corresponding posterior probabilities for group assignments are
vector[2] prob[N];
for (n in 1:N) {
  // prob[n, 1] == posterior probability that y[n] came from mu_1
  // prob[n, 2] == posterior probability that y[n] came from mu_2
  prob[n] = softmax([log(lambda) + normal_lpdf(y[n] | mu_1, sigma_1),
                     log1m(lambda) + normal_lpdf(y[n] | mu_2, sigma_2)]');
}
I don’t quite understand what prob and theta are meant to represent in your code.
My prob is trying to calculate the relative probability that all the observed ys are from Normal(mu_1, sigma_1) rather than from Normal(mu_2, sigma_2), and theta is just trying to validate whether prob is indeed a probability measure.
For your definition of prob, do you mean that prob should be a 2*N array?
My definition of prob is an N-sized array of 2-component vectors.
A model with target += log_sum_exp inside the loop allows (and almost expects) that different data points come from different model components; unless lambda is very skewed, it is rather unlikely that all ys are from Normal(mu_1, sigma_1).
Nevertheless, the posterior probability can be calculated easily because the posteriors for each y are conditionally independent and can simply be multiplied. Using my definition of prob, the posterior probability of all ys coming from mu_1 is equal to prod(prob[:, 1]). (The function prod() calculates the product of all elements of an array.)
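Note that prod(prob[:, 1]) can itself underflow for large N, so a safer sketch would accumulate in log space, using the prob array defined above:
real log_prob_all_1 = 0;
for (n in 1:N) {
  log_prob_all_1 += log(prob[n, 1]); // sum of logs instead of a product of probabilities
}
// exponentiate only at the end; the log is the numerically meaningful quantity
real prob_all_1 = exp(log_prob_all_1);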
Any time you add together exponentials of things, you can hit underflow/overflow problems. This is where the log_sum_exp trick comes in (if you Google log_sum_exp trick you’ll find people describing this).
So if you try to evaluate:
log(exp(-1000) + exp(-1000))
in R it’ll blow up because exp(-1000) underflows to zero.
If you pull out the common factor of exp(-1000) though you can do a lot of it manually:
log(exp(-1000) * (1 + 1)) = -1000 + log(2)
So dividing your numerator and denominator by lambda * exp(unnorm_prob_0) tries to protect us from underflow. Presumably something like this was happening, though without the exact numbers I'm not exactly sure where it is breaking down.
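In Stan that means staying on the log scale and using the built-in log_sum_exp function. Here is a minimal sketch of the generated quantities block rewritten that way (same data and parameter names as in the question):
generated quantities {
  real log_w0 = log(lambda);   // accumulates log(lambda * prod_i Normal(y[i] | mu_1, sigma_1))
  real log_w1 = log1m(lambda); // accumulates log((1 - lambda) * prod_i Normal(y[i] | mu_2, sigma_2))
  real prob;
  for (i in 1:N) {
    log_w0 += normal_lpdf(y[i] | mu_1, sigma_1);
    log_w1 += normal_lpdf(y[i] | mu_2, sigma_2);
  }
  // prob = exp(log_w0) / (exp(log_w0) + exp(log_w1)), evaluated without leaving log space
  prob = exp(log_w0 - log_sum_exp(log_w0, log_w1));
}
The exponent log_w0 - log_sum_exp(log_w0, log_w1) is always <= 0, so the final exp can underflow to 0 but can never produce NaN the way 0/0 did.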
The rule is basically: any time you do an exp, think twice. There are a lot of places with difficult numerics (CDFs and the gamma function, for instance), but exp is probably the most common. So just look out for them and try to spend as much time in log space as possible (and avoid adding exponentiated things).
There are a lot of special functions for different cases, too: log1p(x) is a more stable way of computing log(1 + x), and log1p_exp(x) is a more stable way of computing log(1 + exp(x)).
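For example (sketched; both are built-in Stan math functions):
real a = log1p(1e-20);   // accurate; naive log(1 + 1e-20) rounds 1 + 1e-20 to 1 and returns 0
real b = log1p_exp(800); // approximately 800; naive log(1 + exp(800)) overflows to +inf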