Dirichlet_log: probabilities is not a valid simplex

Hi,

I am trying to fit a 4-component mixture model where the probability of component membership of each sample (pi) is estimated in a separate Stan fit. I am using a Dirichlet to introduce uncertainty, but I am getting many instances of “Exception thrown at line 20: dirichlet_log: probabilities is not a valid simplex. sum(probabilities) = nan, but should be 1”. I am using the first component to capture samples whose n is zero. Where do you think the nan is coming from? Any comment would be greatly appreciated.

data {
    int<lower=1> N;
    int<lower=0> n[N];
    vector<lower=0,upper=1>[4] pi[N];
}
parameters {
    real<lower=0,upper=1> a4;
    real<upper=max(n)> a5;
    real<lower=1> b5;
    real<upper=max(n)> a6;
    real<lower=1> b6;
    real<upper=max(n)> a7;
    real<lower=1> b7;
}
model {
    for (i in 1:N) {
        vector[4] lps;
        vector[4] tau;
        tau ~ dirichlet(pi[i]);
        lps[1] = tau[1] * exp(neg_binomial_2_lpmf(n[i]|a4,  1));
        lps[2] = tau[2] * exp(neg_binomial_2_lpmf(n[i]|a5, b5));
        lps[3] = tau[3] * exp(neg_binomial_2_lpmf(n[i]|a6, b6));
        lps[4] = tau[4] * exp(neg_binomial_2_lpmf(n[i]|a7, b7));
        target += log(sum(lps));
    }
}

Looking at the entire model block:

    for (i in 1:N) {
        vector[4] lps;
        vector[4] tau;
        tau ~ dirichlet(pi[i]);
        lps[1] = tau[1] * exp(neg_binomial_2_lpmf(n[i]|a4,  1));
        lps[2] = tau[2] * exp(neg_binomial_2_lpmf(n[i]|a5, b5));
        lps[3] = tau[3] * exp(neg_binomial_2_lpmf(n[i]|a6, b6));
        lps[4] = tau[4] * exp(neg_binomial_2_lpmf(n[i]|a7, b7));
        target += log(sum(lps));
    }

First of all, tau is undefined in this loop. You need something like

simplex[4] tau[N];

in the parameters block.

Once you fix that, you have to use numerically stable constructions like:

        lps[1] = log(tau[i,1]) + neg_binomial_2_lpmf(n[i]|a4,  1);
        lps[2] = log(tau[i,2]) + neg_binomial_2_lpmf(n[i]|a5, b5);
        lps[3] = log(tau[i,3]) + neg_binomial_2_lpmf(n[i]|a6, b6);
        lps[4] = log(tau[i,4]) + neg_binomial_2_lpmf(n[i]|a7, b7);
        target += log_sum_exp(lps);

I appreciate your response. One quick related general question. Do I need to declare tau in the parameters block even if I am not trying to estimate it?

Now, with the following fix, I am getting “dirichlet_log: prior sample sizes[1] is 0, but must be > 0!”.

data {
    int<lower=1> N;
    int<lower=0> n[N];
    vector<lower=0,upper=1>[4] pi[N];  // data shape should be (N, 4)
}
parameters {
    real<lower=0,upper=1> a4;
    real<upper=max(n)> a5;
    real<lower=1> b5;
    real<upper=max(n)> a6;
    real<lower=1> b6;
    real<upper=max(n)> a7;
    real<lower=1> b7;
    simplex[4] tau[N];
}
model {
    for (i in 1:N) {
        tau[i] ~ dirichlet(pi[i]);
    }
    for (i in 1:N) {
        vector[4] lps;        
        lps[1] = log(tau[i,1]) + neg_binomial_2_lpmf(n[i]|a4,  1);
        lps[2] = log(tau[i,2]) + neg_binomial_2_lpmf(n[i]|a5, b5);
        lps[3] = log(tau[i,3]) + neg_binomial_2_lpmf(n[i]|a6, b6);
        lps[4] = log(tau[i,4]) + neg_binomial_2_lpmf(n[i]|a7, b7);
        target += log_sum_exp(lps);
    }
}

How many times? Underflowing tau to 0 is not as big a deal.

Many, many times. If the message implies underflow, I think it is all right, since my pi’s contain lots of zeros although they sum to one. The fitting did not finish, with the following message at the end.

Initialization between (-2, 2) failed after 100 attempts. 
 Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.

I guess having zeros in the Dirichlet parameters is not good?

Yes, the concentration parameters must be positive, although they do not have to sum to 1.

The fitting finishes successfully with the following change, although it may introduce some bias.

tau[i] ~ dirichlet(pi[i]+0.000001);

Thanks Ben!

Adding small epsilons is rarely what you want to do. How were those pi set in the first place? You should just set them in the data rather than setting them to zero and then adding epsilon. You should also run sensitivity analyses to see how much they affect inference; these things can sometimes have a surprisingly strong effect.

When you have Dirichlet parameters less than one, the prior is putting most of its mass on solutions with zeros (technically, very small non-zero values that underflow to zero, since roughly 5e-324 is the smallest positive value representable in double precision).
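
For concreteness, here is a tiny standalone program (run with the fixed_param algorithm) that illustrates this; the concentration values 0.1 and 1.0 are just my choices for the illustration:

generated quantities {
    // concentrations below 1 push most draws toward a vertex of the simplex;
    // concentrations of 1 give draws that are uniform over the simplex
    simplex[4] theta_sparse = dirichlet_rng(rep_vector(0.1, 4));
    simplex[4] theta_flat = dirichlet_rng(rep_vector(1.0, 4));
}

Plotting the components of theta_sparse across draws shows values piling up near 0 and 1, which is exactly the corner behavior described above.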

Negative binomials are already hard to fit—I imagine making a mixture of them is going to be extra difficult unless the clusters are well separated. Michael Betancourt wrote a case study on mixtures (see web site >> users >> documentation >> case studies).

You can simplify this a bit:

vector[4] lps;        
lps[1] = log(tau[i,1]) + neg_binomial_2_lpmf(n[i]|a4,  1);
lps[2] = log(tau[i,2]) + neg_binomial_2_lpmf(n[i]|a5, b5);
lps[3] = log(tau[i,3]) + neg_binomial_2_lpmf(n[i]|a6, b6);
lps[4] = log(tau[i,4]) + neg_binomial_2_lpmf(n[i]|a7, b7);

to

vector[4] lps = log(tau[i]);        
lps[1] += neg_binomial_2_lpmf(n[i]|a4,  1);
lps[2] += neg_binomial_2_lpmf(n[i]|a5, b5);
lps[3] += neg_binomial_2_lpmf(n[i]|a6, b6);
lps[4] += neg_binomial_2_lpmf(n[i]|a7, b7);

I don’t understand the naming of the a and b. Why not follow the ordering and make those

vector<lower = 1>[3] b;
vector<upper = max(n)>[4] a;

You should also give the a and b priors; like other precision-like parameters, they can tend to drift upwards because ever higher values correspond to ever lower variance.
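
For concreteness, a minimal sketch of what the reorganized declarations with priors might look like; the prior families and scales here are illustrative assumptions, not recommendations from this thread:

data {
    int<lower=1> N;
    int<lower=0> n[N];
}
parameters {
    vector<upper=max(n)>[4] a;   // neg_binomial_2 means, in component order
    vector<lower=1>[3] b;        // precisions for components 2-4 (component 1 uses 1)
}
model {
    // weakly informative priors to keep these from drifting upward; the
    // families and scales are assumed for illustration, and a would also
    // need a lower bound of 0 to serve as a neg_binomial_2 mean
    a ~ normal(0, 10);
    b ~ lognormal(0, 1);
}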

This is ultimately a pathology of the empirical Bayes strategy of plugging a point estimate from some previous fit into the priors. It’s just made worse by the fact that the point estimates themselves lie on the corners of parameter space. Adding epsilon to all of the pi regularizes the point estimates and avoids the immediate pathology, but one should still be skeptical of the validity of the resulting fit.

I am dealing with a set of binomial (n_i, p_i) samples where there are four different classes according to the p_i’s. Let’s assume they cluster pretty well. After I estimate the expectation of the 4-dimensional latent variable (pi), I want to get shrinkage estimators for the n_i’s based upon the probabilistic class membership of each sample. In order to implement this idea, I had to separate classification and shrinkage estimation. Do you think there is a better way than using a Dirichlet? Do you think there’s a way to combine the classification and the shrinkage in one Stan fit? I would really appreciate any comment.

Not only is the answer “yes”, it’s the standard approach in Bayesian hierarchical modeling.

The Dirichlet is a very weak prior in that it only gives you a way to model location and a kind of precision, not anything like covariance. A multivariate logistic normal is more flexible. You can see that in the “Correlated topic model” section of the manual (in the latent Dirichlet allocation section of the “Clustering Models” chapter).
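
For concreteness, a minimal prior-only sketch of a multivariate logistic normal on a 4-simplex; the parameter names and prior scales are mine, not taken from the manual’s example:

parameters {
    vector[3] eta;                    // unconstrained log-odds relative to component 4
    vector[3] mu;
    vector<lower=0>[3] sigma;
    cholesky_factor_corr[3] L;
}
transformed parameters {
    simplex[4] tau = softmax(append_row(eta, 0));
}
model {
    eta ~ multi_normal_cholesky(mu, diag_pre_multiply(sigma, L));
    mu ~ normal(0, 2);
    sigma ~ normal(0, 1);
    L ~ lkj_corr_cholesky(2);
}

Unlike the Dirichlet, the covariance matrix here lets components of tau be positively or negatively correlated with one another.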

@Bob_Carpenter Here’s my original problem. Because I am trying to “reuse” pi, I don’t feel this is a standard situation for Stan. Any advice would be great.

data {
    int<lower=1> N;
    int<lower=0> n[N];
    int<lower=0> x[N];
}
parameters {
    simplex[4] pi;
    real<lower=1> a[4];
    real<lower=1> b[4];
    real<upper=max(n)> c[4];
    real<lower=1> d[4];
}
model {
    vector[4] log_pi = log(pi);
    for (i in 1:N) {
        vector[4] lps = log_pi;
        for (j in 1:4)
            lps[j] += beta_binomial_lpmf(x[i]|n[i], a[j], b[j]);
        target += log_sum_exp(lps);
    }

    //
    // Now I also want to fit n's to estimate c's and d's using the same **pi**,
    // but I have already used "target" to fit x above.
    //
    //    for (i in 1:N) {
    //        vector[4] lps2 = log_pi;
    //        for (j in 1:4)
    //            lps2[j] += neg_binomial_2_lpmf(n[i]|c[j], d[j]);
    //        target += log_sum_exp(lps2);
    //    }
    //

}

It’s not a problem to use target more than once; you would just be adding more components to the log likelihood. It sounds like you have two likelihood contributions per observation, which is fine. The extra information can help to identify the pi’s, but it can also lead to a more complex geometry. I would look out for divergences.
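
For concreteness, a sketch of the model block with the commented-out piece folded back in, literally adding a second log_sum_exp contribution per observation that reuses the same pi (as the commented code implies, this treats the two mixtures’ component assignments as independent given pi):

model {
    vector[4] log_pi = log(pi);
    for (i in 1:N) {
        vector[4] lps = log_pi;
        vector[4] lps2 = log_pi;
        for (j in 1:4) {
            lps[j] += beta_binomial_lpmf(x[i]|n[i], a[j], b[j]);
            lps2[j] += neg_binomial_2_lpmf(n[i]|c[j], d[j]);
        }
        target += log_sum_exp(lps);   // x-side mixture
        target += log_sum_exp(lps2);  // n-side mixture, sharing pi
    }
}

Note that neg_binomial_2 needs a positive mean, so c would also want a lower bound of 0 in the parameters block.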

@stijn It took a long time and did not converge. In fact, I don’t want the neg_binomial components to influence the estimation of pi, and that is the cause of my headache. I guess it is also related to a more general question: what are my options when n is not fixed in my binomial mixture data?

I am just trying to think generatively here. This may or may not be helpful.

Your data generation process is something like the following. You have 4 different groups. In each group n ~ neg_binomial(c_j, d_j) and x ~ beta_binomial(n, a_j, b_j) with a_j, b_j, d_j > 0 and c_j < max(n).

You can already use that information to simulate x’s and n’s outside of Stan. First simulate the four sets of a_j, b_j, c_j, d_j, then simulate n, and then x. You should then check whether these simulations are reasonable for your data. If not, you can start thinking about what is wrong with how you have written the model so far. For instance, you probably need more informative priors on a, b, c, and d. You have a mixture model with discrete distributions; both are red flags for identifiability problems.
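
One way to write that simulation, shown here as a standalone Stan program run with the fixed_param algorithm (it could just as well be done in R or Python); the distributions used to draw the group-level parameters are illustrative assumptions only:

data {
    int<lower=1> N;
}
generated quantities {
    simplex[4] pi = dirichlet_rng(rep_vector(1, 4));
    vector[4] a;
    vector[4] b;
    vector[4] c;
    vector[4] d;
    int n[N];
    int x[N];
    for (j in 1:4) {
        a[j] = 1 + gamma_rng(2, 0.5);   // beta-binomial shapes (assumed scales)
        b[j] = 1 + gamma_rng(2, 0.5);
        c[j] = gamma_rng(5, 0.1);       // neg-binomial mean, kept positive
        d[j] = 1 + gamma_rng(2, 0.5);   // neg-binomial precision
    }
    for (i in 1:N) {
        int z = categorical_rng(pi);    // group membership
        n[i] = neg_binomial_2_rng(c[z], d[z]);
        x[i] = beta_binomial_rng(n[i], a[z], b[z]);
    }
}

Comparing the simulated n’s and x’s to the real data is a quick check on whether the assumed generative process is even plausible.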

That made me think that maybe you expect the 4 groups to have different a_j, b_j but roughly similar c_j and d_j. If that is the case, you do not need the mixture model for the neg_binomial; you can just work with a single c_0 and d_0. If the neg_binomial component does not help to identify the groups (my previous assumption) and you are not interested in the c’s and d’s, you may not even need to estimate the neg_binomial components. (I am probably wrong in my assumptions, but I am trying to understand why you want the neg_binomial in there.)

What do you mean by ‘not fixed’? n varies per observation or n has measurement error or n varies per group or something else?

This is just how Bayesian inference works. BUGS includes a hack called “cut” that breaks Bayesian inference by preventing information flow between random variables. We intentionally avoided that in Stan.

This is the way to think about things!

I was trying to test whether the n’s are different between clusters by comparing their confidence intervals, using the same sampled pi. I still needed to estimate the c’s and d’s for that purpose. But I also found I was not thinking generatively. I appreciate your help @stijn and @Bob_Carpenter.

I think you can do that in the generated quantities block or outside of Stan. You don’t need the c’s and d’s or the negative binomial in your model.

  1. Simulate group membership ~ categorical_rng(pi) for each observation in each draw
  2. Calculate the mean or median of n or 80% interval or whatever floats your boat for each simulated group
  3. For each simulation, check whether the mean of group 1 > group 2 (or any other comparison you want to make).
  4. Interpret the distribution of the variable you calculate in 3. E.g. if in 50% of the draws group 1 > group 2 and in 50% group 2 > group 1, the groups have roughly the same mean n.
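
A minimal sketch of steps 1 to 3 in a generated quantities block for the model above (the variable names are mine); the posterior mean of the indicator at the end gives the step 4 summary:

generated quantities {
    vector[4] mean_n = rep_vector(0, 4);
    int n1_gt_n2;
    {
        vector[4] sum_n = rep_vector(0, 4);
        vector[4] count = rep_vector(0, 4);
        for (i in 1:N) {
            int z = categorical_rng(pi);   // step 1: simulate membership
            sum_n[z] += n[i];
            count[z] += 1;
        }
        for (j in 1:4)
            if (count[j] > 0)
                mean_n[j] = sum_n[j] / count[j];   // step 2: group means of n
    }
    n1_gt_n2 = mean_n[1] > mean_n[2];   // step 3: comparison for this draw
}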

Stan is generating posterior intervals rather than confidence intervals.

You can usually just do these categorical estimates in expectation directly rather than simulating. That is, you use the membership weights implied by pi in each iteration for the expectation calculation. Same expected result as simulating, but more efficient due to Rao-Blackwell.

You can’t do this easily for continuous examples because the simple sums turn into integrals—so for continuous cases the RNG approach will usually be the best way to code it.
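
A sketch of that Rao-Blackwellized calculation for the discrete case, as I read the suggestion (again in generated quantities for the model above, with my own variable names): instead of simulating assignments, each observation is weighted by its posterior membership probability computed from pi and the beta-binomial likelihood.

generated quantities {
    vector[4] mean_n_rb;
    {
        vector[4] sum_wn = rep_vector(0, 4);
        vector[4] sum_w = rep_vector(0, 4);
        for (i in 1:N) {
            vector[4] lps = log(pi);
            vector[4] r;
            for (j in 1:4)
                lps[j] += beta_binomial_lpmf(x[i]|n[i], a[j], b[j]);
            r = softmax(lps);        // expected membership of observation i
            sum_wn += r * n[i];
            sum_w += r;
        }
        for (j in 1:4)
            mean_n_rb[j] = sum_wn[j] / sum_w[j];
    }
}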
