Undefined values in generated quantities block

Hi,

Recently I am trying to compare the ratio of two normal likelihood in the generated quantities block (the normal likelihood of observed data under mu_1 and sigma_1 versus normal likelihood of observed data under mu_2 and sigma_2). Before adding the generated block the sampling went smoothly and the estimated parameters make sense (mu_1, mu_2, sigma_1, sigma_2 and lambda) however there seem to be some problems after I added the following generated quatities block:

data{
int<lower = 0> N;
vector<lower=0> [N] y;
}
parameters{
real mu_1;
real mu_2;
real<lower=0> sigma_1;
real<lower=0> sigma_2;
real<lower=0, upper = 1> lambda; // mixture proportion
}
generated quantities{
  real prob;
  real unnorm_prob_0; //sum of likelihood for mu_1 and sigma_1
  real unnorm_prob_1; //sum of likelihood for mu_2 and sigma_2
  real numerator;
  real denominator;
  unnorm_prob_0 = 0;
  unnorm_prob_1 = 0;
  for(i in 1 : N){
  unnorm_prob_0 += normal_lpdf(y_[i] |mu_1, sigma_1); 
  unnorm_prob_1 += normal_lpdf(y_[i] |mu_2, sigma_2);
// mu_1, mu_2, sigma_1 and sigma_2 are parameters in the model
}
  numerator = lambda*exp(unnorm_prob_0);
  denominator = lambda*exp(unnorm_prob_0) +  (1-lambda)*exp(unnorm_prob_1);
  prob = numerator/denominator;
 }

The error messga is: ‘The following variables have undefined values’ for prob and I found someone saying that this error message occurs when some generated quantity evaluates to NaN , usually due to numerical underflow or overflow. Also I tried to add the following Bernouli random variables generator and the returned error messge is :‘Exception: bernoulli_rng: Probability parameter is nan, but must be finite!’

data{
int<lower = 0> N;
vector<lower=0> [N] y;
}
parameters{
real mu_1;
real mu_2;
real<lower=0> sigma_1;
real<lower=0> sigma_2;
real<lower=0, upper = 1> lambda; // mixture proportion
}
generated quantities{
  real theta;
  real prob;
  real unnorm_prob_0; //sum of likelihood for mu_1 and sigma_1
  real unnorm_prob_1; //sum of likelihood for mu_2 and sigma_2
  real numerator;
  real denominator;
  unnorm_prob_0 = 0;
  unnorm_prob_1 = 0;
  for(i in 1 : N){
  unnorm_prob_0 += normal_lpdf(y_[i] |mu_1, sigma_1); 
  unnorm_prob_1 += normal_lpdf(y_[i] |mu_2, sigma_2);
// mu_1, mu_2, sigma_1 and sigma_2 are parameters in the model
}
  numerator = lambda*exp(unnorm_prob_0);
  denominator = lambda*exp(unnorm_prob_0) +  (1-lambda)*exp(unnorm_prob_1);
  prob = numerator/denominator;
  theta = bernoulli_rng(1 - prob);
 }

When I tried to extract ‘numerator’ and ‘denominator’ from the Stan output, I did not see NAs but some of the values for ‘numerator’ are very small (eg. 10^-200 to 0) since y are more likely follow Normal(mu_2, sigma_2) and the values for ‘denominator’ are relatively small (eg. 10^-5 to 10^-10). However in my understaning even we have some problems with arithmetic precision it should produce draws of prob to be all 0 since the numerator is so small. But I am so confused about why we would have NAs in such setting.

Thx!

But I guess if you’re getting errors when it is producing NaNs, then you won’t see the output where denominator is actually zero?

Remove prob and see what you get.

You could also try writing it differently, though this might not help:

prob = 1.0 / (1.0 + (1.0 - lambda) * exp(unnorm_prob_1 - unnorm_prob_0) / lambda);

Hi,

Thanks so much for your kindly reply!

If I removed prob I could see the summary output of both numerator and denominator and there is no warning message. But if I added prob back I could still see the summary output of numerator, denominator anf prob, and I will get the following warning message: ‘The following variables have undefined values: prob’.

Thx!

1 Like

A quick follow-up is that after adapting:

prob = 1.0 / (1.0 + (1.0 - lambda) * exp(unnorm_prob_1 - unnorm_prob_0) / lambda);

This error message: ‘The following variables have undefined values: prob’ had disappeared, and all the 4000 draws of prob are very close to 0 with the following output of summary function:

              mean   se_mean        sd      2.5%       50%     97.5% n_eff Rhat
prob  3.52e-135       NaN 2.23e-133  0.00e+00  0.00e+00 4.05e-188   NaN    1

This assumes all ys come from the same latent group. Usually a mixture model allows different data points to come from different groups.
What does your model block do? Does it have a single toplevel log_mix or one for each y[i]? If the latter then prob should be an array with separate probability for each `y[i]'.

Hi,

Thanks for your kindly reply and my model block looks like the following:

for (n in 1: N){
    target += log_sum_exp(log(lambda) + normal_lpdf(y[n]|mu_1, sigma_1),
                          log1m(lambda) + normal_lpdf(y[n]|mu_2, sigma_2));
}

The corresponding posterior probabilities for group assignments are

vector[2] prob[N];
for (n in 1 : N) {
    // prob[n,1] == posterior probability that y[n] came from mu_1
    // prob[n,2] == posterior probability that y[n] came from mu_2
    prob[n] = softmax([log(lambda) + normal_lpdf(y[n]|mu_1, sigma_1),
                       log1m(lambda) + normal_lpdf(y[n]|mu_2, sigma_2)]');
}

I don’t quite understand what prob and theta are meant to represent in your code.

1 Like

My prob is trying to calculate the relative probability that all the observed ys are from Normal(mu_1, sigma_1) instead of Normal(mu_2, sigma_2) and theta is just trying to validate whether prob is indeed a probability measure.

For your definition of prob, do you mean that prob should be a 2*N array?

My definition of prob is an N-sized array of 2-component vectors.

A model with target+= log_sum_exp inside the loop allows (and almost expects) different data points may come from different model components; unless lambda is very skewed it is rather unlikely that all ys are from N\left(\mu_1,\sigma_1\right).
Nevertheless, the posterior probability can be calculated easily because posteriors for each y are conditionally independent and thus can be simply multiplied. Using my definition of prob the posterior probability of all ys coming from mu_1 is equal to prod(prob[:,1]). (Function prod() calculates the product of all elements of an array.)

Thanks so much for the explanation and may I think that your definition of prob is a more general version of my definition of prob?

Also for the mixture model I think the estimated lambda is quite skew to 0 (50% interval is (0.005, 0.024)) and does this provide any extra insights?

Just a quick question that why this reexpression worked much better than the previous version?

Thx!

denominator = lambda*exp(unnorm_prob_0) +  (1-lambda)*exp(unnorm_prob_1);

Any time you add together exponentials of things, you can hit underflow/overflow problems. This is where the log_sum_exp trick comes in (if you Google log_sum_exp trick you’ll find people describing this).

So if you try to evaluate:

log(exp(-1000) + exp(-1000))

in R it’ll blow up because exp(-1000) underflows to zero.

If you pull out the common factor of exp(-1000) though you can do a lot of it manually:

log(exp(-1000) * (1 + 1)) = -1000 + log(2)

So dividing your numerator and denominator by lambda * exp(unnorm_prob_0) tries to protects us from underflows. Presumably something like this was happening, though without the exact numbers I’m not exactly sure where it is breaking down.

Thanks so much for your kindly reply!

May I ask that is there any more resources on preventing underflow/overflow problems in Stan?

The rule is basically any time you do an exp, think twice. There are a lot of places (cdfs, the gamma function, for instance) with difficult numerics, but exp is probably the most common. So just look out for them and try spend as much time in log-space as possible (and avoid adding exponentiated things).

There are lot of special functions too for different cases, like log1p(x) is a more stable way of computing log(1 + x) and log1p_exp(x) is a more stable way of computing log(1 + exp(x)).

Thanks so much!