Priors for highly skewed multinomial word counts

alexpghayes · May 15, 2020, 8:29pm

Suppose we have words i = 1, ..., n, and observe individual word counts x_1, ..., x_n with N = \sum_{i=1}^n x_i fixed. Often, the relative frequencies of words i and j are informative for some outcome, or generally of interest, so we use a model:

(X_1, ..., X_n) \sim \mathrm{Multinomial}(N, \pi)

where \pi lives on the (n-1)-simplex, and we want to estimate \pi_i and \pi_j, for example. A standard prior for \pi is then a \mathrm{Dirichlet}(\alpha) distribution, where \alpha > 0 can be interpreted as pseudo-counts, \alpha = (1, ..., 1) is uninformative, \alpha \to 0 element-wise is anti-conservative, and \alpha \to \infty element-wise is conservative.
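
The pseudo-count reading follows from conjugacy. A short worked version, using the notation above, with \alpha_0 = \sum_k \alpha_k introduced here only for convenience:

```latex
\pi \mid x \sim \mathrm{Dirichlet}(\alpha_1 + x_1, \dots, \alpha_n + x_n),
\qquad
\mathbb{E}[\pi_i \mid x] = \frac{\alpha_i + x_i}{\alpha_0 + N},
\qquad
\alpha_0 = \sum_{k=1}^{n} \alpha_k .
```

So each \alpha_i acts like \alpha_i extra observations of word i: as \alpha \to 0 the posterior mean tends to the raw proportion x_i / N, and as \alpha \to \infty it is pulled toward the prior mean \alpha_i / \alpha_0.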

However, we also know that the distribution of the word counts X_i is typically highly skewed, say, approximately log-normal or power-law distributed. Can I incorporate this information into the prior for \pi (or generally into the whole model)? Do people do things like put a LogNormal hyperprior on \alpha? Also, does \alpha \to 0 imply a particular skew on \pi?
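
One way to probe that last question empirically is to simulate from the prior. The following is a minimal sketch, assuming a symmetric Dirichlet and using only the Python standard library; the function names and the settings (50 words, 2000 draws) are illustrative choices, not taken from any package.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one point on the simplex by normalizing independent Gamma(alpha_i, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


def mean_largest_share(alpha_value, n_words=50, n_draws=2000, seed=1):
    """Average of max_i pi_i over draws from a symmetric Dirichlet(alpha_value)."""
    rng = random.Random(seed)
    alpha = [alpha_value] * n_words
    draws = (max(sample_dirichlet(alpha, rng)) for _ in range(n_draws))
    return sum(draws) / n_draws


if __name__ == "__main__":
    # Hypothetical concentration values, chosen only to span small to large alpha.
    for a in (0.05, 1.0, 20.0):
        print(f"alpha = {a:>5}: mean largest share = {mean_largest_share(a):.3f}")
```

Small \alpha tends to produce sparse draws in which a few words carry most of the mass, while large \alpha pushes draws toward the uniform vector. The prior stays exchangeable, though, so it says nothing about which words dominate, and this sketch does not check whether the resulting shape resembles a log-normal or a power law.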

maxbiostat · May 15, 2020, 9:34pm

Tagging @stemangiola and @martinmodrak.

stemangiola · May 15, 2020, 11:38pm

Higher skewed than what? Count data is typically highly skewed, and the multinomial models that. For more overdispersed counts you can use the Dirichlet-multinomial, although neither of these allows extremely heavy tails. I think of it as: multinomial is to Poisson as Dirichlet-multinomial is to negative binomial (but this is totally intuition and non-scientific :) )

// Dirichlet-multinomial log pmf: multinomial(pi) with pi ~ dirichlet(alpha) integrated out.
// The multinomial coefficient is omitted; it is constant in alpha.
  real dirichlet_multinomial_lpmf(int[] y, vector alpha) {
    real alpha_plus = sum(alpha);

    return lgamma(alpha_plus) + sum(lgamma(alpha + to_vector(y)))
           - lgamma(alpha_plus + sum(y)) - sum(lgamma(alpha));
  }

where alpha is the vector of Dirichlet concentration parameters.
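
As a sanity check on the formula, here is a minimal Python port (standard library only) that can optionally add the multinomial coefficient and then verifies that the probabilities sum to one over every possible count vector. The `include_coefficient` flag and the example values of alpha are assumptions made for illustration.

```python
from itertools import product
from math import exp, lgamma


def dirichlet_multinomial_lpmf(y, alpha, include_coefficient=True):
    """Log pmf of counts y under a Dirichlet-multinomial with concentrations alpha."""
    alpha_plus = sum(alpha)
    n_total = sum(y)
    lp = (
        lgamma(alpha_plus)
        - lgamma(alpha_plus + n_total)
        + sum(lgamma(a + k) for a, k in zip(alpha, y))
        - sum(lgamma(a) for a in alpha)
    )
    if include_coefficient:
        # log multinomial coefficient: log(N!) - sum_i log(y_i!)
        lp += lgamma(n_total + 1) - sum(lgamma(k + 1) for k in y)
    return lp


if __name__ == "__main__":
    alpha = [0.5, 1.0, 2.0]  # hypothetical concentrations
    n_total = 4
    # Enumerate every count vector of length 3 that sums to n_total.
    outcomes = [y for y in product(range(n_total + 1), repeat=3) if sum(y) == n_total]
    total_prob = sum(exp(dirichlet_multinomial_lpmf(y, alpha)) for y in outcomes)
    print(f"sum of pmf over all outcomes: {total_prob:.6f}")
```

With the coefficient included the total should equal 1 up to floating-point error. Without it the function is only proportional to the pmf, which is enough when y is data and only alpha is being estimated.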

If the log proportions look roughly normal you should be OK with a Dirichlet prior. Otherwise, changing the prior on alpha does not do much; you should use something other than a Dirichlet, for example multinomial(softmax(parameter_coming_from_student_t)), but we didn’t manage to get anywhere with that.
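
To see what a softmax-of-Student-t prior implies on the simplex, one can simulate it. This is only a rough sketch of the idea and not the model that was actually tried: the degrees of freedom (3), the unit scale, and all function names are assumptions, and it uses only the Python standard library.

```python
import math
import random


def softmax(z):
    """Map real values to the simplex, subtracting the max for numerical stability."""
    m = max(z)
    w = [math.exp(v - m) for v in z]
    s = sum(w)
    return [v / s for v in w]


def student_t(nu, rng):
    """Draw t_nu as Z / sqrt(V / nu), with Z ~ N(0, 1) and V ~ chi-square(nu)."""
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(nu / 2.0, 2.0)  # chi-square(nu) is Gamma(shape=nu/2, scale=2)
    return z / math.sqrt(v / nu)


def mean_largest_share(draw, n_words=50, n_draws=1000, seed=1):
    """Average of max_i pi_i when pi = softmax of n_words i.i.d. values from draw(rng)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        total += max(softmax([draw(rng) for _ in range(n_words)]))
    return total / n_draws


if __name__ == "__main__":
    heavy = mean_largest_share(lambda rng: student_t(3.0, rng))
    light = mean_largest_share(lambda rng: rng.gauss(0.0, 1.0))
    print(f"Student-t(3) logits: mean largest share = {heavy:.3f}")
    print(f"normal logits:       mean largest share = {light:.3f}")
```

Heavier-tailed logits let a single component occasionally take a very large share, which is the kind of extreme-tail behaviour a Dirichlet struggles to express.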

You could use a rectangular-beta prior for proportion data with extremely heavy tails, although I have zero experience with that.

I think you might be well off with dirichlet_multinomial. My experience is to not go too exotic.

alexpghayes · May 17, 2020, 4:12pm

I think my real question comes down to this: multinomial models allow for highly skewed distributions, but they do not enforce highly skewed outcomes. If we remove or penalize models when \pi is dispersed approximately evenly across all counts, does it improve our estimates of \pi? It just feels like there is prior information not making it into the multinomial-dirichlet model, at least for word counts.

Can you explain this a bit more? I’m not seeing where it comes from.

stemangiola · May 18, 2020, 1:44am

I would not try to make a model enforce anything.

Anyhow, @Bob_Carpenter is much more knowledgeable than me on this. Before we all speculate about what a model can or cannot do with particular data, maybe try modelling your data with a multinomial and plot a posterior predictive check for an element that is poorly modelled. Then members of the community can propose solutions based on something concrete.
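
A posterior predictive check of this kind can be sketched without Stan when the prior is conjugate. The example below assumes several documents sharing one \pi, a symmetric Dirichlet prior, and the across-document variance of one word's count as the test statistic; the data are made up for illustration and all names are hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma(alpha_i, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


def sample_multinomial(n_total, probs, rng):
    """Draw multinomial counts by tallying n_total categorical draws."""
    counts = [0] * len(probs)
    for idx in rng.choices(range(len(probs)), weights=probs, k=n_total):
        counts[idx] += 1
    return counts


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def ppc_pvalue(docs, word, alpha_value=1.0, n_rep=500, seed=1):
    """Share of replicated data sets whose variance of `word` counts is >= the observed one."""
    rng = random.Random(seed)
    n_words = len(docs[0])
    totals = [sum(d) for d in docs]
    col_sums = [sum(d[j] for d in docs) for j in range(n_words)]
    # Conjugate update: the posterior is Dirichlet(alpha + pooled counts).
    posterior_alpha = [alpha_value + c for c in col_sums]
    observed = variance([d[word] for d in docs])
    exceed = 0
    for _ in range(n_rep):
        pi = sample_dirichlet(posterior_alpha, rng)
        rep = [sample_multinomial(n, pi, rng)[word] for n in totals]
        if variance(rep) >= observed:
            exceed += 1
    return exceed / n_rep


if __name__ == "__main__":
    # Made-up counts: 5 documents, 4 words, 100 tokens each.
    docs = [[90, 5, 3, 2], [40, 45, 10, 5], [85, 5, 5, 5], [30, 50, 15, 5], [95, 2, 2, 1]]
    print(f"posterior predictive p-value for word 0: {ppc_pvalue(docs, word=0):.3f}")
```

A p-value near 0 would mean the shared-\pi multinomial cannot reproduce the spread seen across documents for that word, which is the kind of concrete misfit that would point toward a Dirichlet-multinomial or something heavier-tailed.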
