Two parameter distribution over the simplex

Bonnevie · April 5, 2018, 9:15pm

Theorem 2 in the Pitman paper seems to have some kind of result on finite-dimensional random measures, but I don’t know whether it corresponds to truncation directly.

ariddell · April 5, 2018, 10:29pm

@aaronjg

“nested Dirichlet” isn’t a technical term. I’ve seen “Dirichlet Compound
Multinomial” used in one place. One paper where it appears is:

Doyle, Gabriel, and Charles Elkan. “Accounting for burstiness in topic
models.” In Proceedings of the 26th Annual International Conference on
Machine Learning, pp. 281-288. ACM, 2009.

In general, my sense is that Gibbs sampling really shines in this
particular area (PYP). But writing the code takes a long time.

I hope you find something that works.

Bob_Carpenter · April 13, 2018, 8:25pm

Stan only requires proper posteriors. Nothing checks that the posterior is proper, but Stan tends to run off the cliff when the posterior is improper, so we tend to get very early diagnostics in the form of parameters with posterior means of +/- 1e+300.

When I wrote the transform for the simplex, I set it up so that (0, …, 0) (K - 1 terms) on the unconstrained scale translates to (1/K, …, 1/K) (K terms) on the constrained scale.
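To see why the origin maps to the uniform simplex, here is a short worked check. It assumes the transform in question is the stick-breaking construction described in the Stan reference manual; the symbols y (unconstrained), z (break proportions), and x (simplex) are notation chosen here, not taken from the post.

$$
z_k = \operatorname{logit}^{-1}\!\bigl(y_k - \log(K - k)\bigr), \qquad
x_k = \Bigl(1 - \sum_{k' < k} x_{k'}\Bigr) z_k, \qquad k = 1, \dots, K - 1,
$$

$$
x_K = 1 - \sum_{k' < K} x_{k'} .
$$

$$
y = 0 \;\Rightarrow\; z_k = \frac{1}{1 + (K - k)} = \frac{1}{K - k + 1}, \qquad
x_k = \frac{K - k + 1}{K} \cdot \frac{1}{K - k + 1} = \frac{1}{K}.
$$

The last step is an induction: if the first k - 1 components each equal 1/K, the remaining stick has length (K - k + 1)/K, and the break proportion 1/(K - k + 1) takes exactly 1/K of it.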

Truncation messes up the terms going to 1 / epsilon with epsilon → 0 in the limit. Otherwise, you’ll never get more clusters than data points, so you can set the truncation level above your number of data points to get a conservative bin estimate. That just might lead to challenging computation.
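For concreteness, here is a rough sketch of what a truncated two-parameter (Pitman-Yor) stick-breaking prior could look like in Stan. It is not code from this thread. The truncation level K, the discount d, the concentration alpha, and the choice to give the last component all of the leftover mass are assumptions made for illustration. It uses the standard stick-breaking fractions v[k] ~ beta(1 - d, alpha + k * d) and only samples from the prior; a likelihood would go in the model block.

```stan
data {
  int<lower=2> K;                 // truncation level (assumed; e.g. >= number of data points)
  real<lower=0, upper=1> d;       // discount; must be strictly less than 1
  real<lower=-d> alpha;           // concentration; must be strictly greater than -d
}
parameters {
  vector<lower=0, upper=1>[K - 1] v;   // stick-breaking fractions
}
transformed parameters {
  simplex[K] theta;               // truncated weights
  {
    real remaining = 1;           // length of stick not yet assigned
    for (k in 1:(K - 1)) {
      theta[k] = v[k] * remaining;
      remaining *= 1 - v[k];
    }
    theta[K] = remaining;         // assumption: last component absorbs the remainder
  }
}
model {
  for (k in 1:(K - 1)) {
    v[k] ~ beta(1 - d, alpha + k * d);
  }
}
```

With d = 0 this reduces to the usual truncated Dirichlet process stick-breaking weights. Large K can make the geometry hard for the sampler, which is the computational cost mentioned above.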

Bob_Carpenter · April 13, 2018, 8:32pm

You can look at the logistic multinormal, or you can do the same thing with Student-t. It also lets you control covariance, but if you don’t want that, you can simplify computation by making the covariance diagonal:

z ~ multi_student_t(nu, mu, Sigma);
theta = softmax(z);

Of course, if the covariance Sigma = diag_matrix(sigma), then this can be much more efficiently implemented as:

z ~ student_t(nu, rep_vector(0, K), sigma);

where sigma is the overall scale of variation of the log odds and nu is the degrees of freedom controlling the dispersion of the Student-t. It’s even more efficient if sigma = rep_vector(tau, K), because then a scalar can be used for sigma in the prior for z.
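Put together, a minimal complete program for the scalar-scale case might look like the sketch below. The count data y, the multinomial likelihood, and passing nu and tau in as data are assumptions added for illustration; only the prior on z and the softmax come from the snippets above. It uses the array[K] declaration syntax of recent Stan releases; older versions would write int y[K].

```stan
data {
  int<lower=2> K;                 // number of categories
  array[K] int<lower=0> y;        // observed counts (hypothetical data)
  real<lower=0> nu;               // Student-t degrees of freedom
  real<lower=0> tau;              // shared scale of the log odds
}
parameters {
  vector[K] z;                    // unconstrained log odds
}
transformed parameters {
  vector[K] theta = softmax(z);   // point on the simplex
}
model {
  z ~ student_t(nu, 0, tau);      // scalar location and scale broadcast over z
  y ~ multinomial(theta);         // assumed likelihood
}
```

Note that softmax is invariant to adding a constant to z, so the likelihood alone does not identify all K components. The proper prior on z is what keeps the posterior proper here.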
