Dirichlet regression

Hello Community,

I am trying to find a way to implement a Dirichlet regression in stan.

My problem is a multiple version of a beta regression (that didn’t work when the information content of the data is low, being a 2 levels hierarchical model, Regression of proportions)

I went through some material

but I could not implement anything concrete yet

Could someone give a dummy possible model as example, or point to some more clear direction?

Thanks a lot.

2 Likes

A resource I am finding helpful: http://epub.wu.ac.at/4077/1/Report125.pdf

I found this discussion of Bob some time ago that also tries to deal with a hierarchical dirichlet regression.

https://groups.google.com/forum/#!topic/stan-users/3Gh1qvfoogY

In particular deals with the intercept/angular coefficients as such:

 parameters {
     vector[C] coef;
     real<lower=0> prior_count;
   }
   model {
     vector<lower=0>[N] alphavec;
     alphavec <- prior_count * softmax(xm * coef);
     prior_count ~ exp(1);
     ...
   } 

If I understand well:

  • shouldn’t coef be a matrix (number_of_covariates / number_of_components_of_the_dirichlet)?
  • If we softmax, then coef would not need to be a constrained variable to avoid indeterminability?

For example if I have two time points where nothing has changed

  • T0: Dirichlet(softmax([2, 2, 2, 2, 2) * 40)
  • T1: Dirichlet(softmax([2 + 1 * coef[1], 2+ 1 * coef[2], 2+ 1 * coef[3], 2+ 1 * coef[4], 2+ 1 * coef[5]) * 40)

Then coef could be any (single) number and will be unconstrained. Is this correct? Would a 0 centered prior be enough?

  • Shouldn’t a logit link function be used on xm * coef , similarly to the beta regression?

Thanks a lot!

1 Like

I might be able eventually to answer my own question, hopefully.

But I leave here some working notes for discussion.

I created a generative process to simulate my probabilities from a dirichlet regression. To solve the indeterminability of angular coefficients I constrained them to be a simplex multiplied by a positive real number

# DIRICHLET

n_samples = 500

## from http://tr.im/hH5A
logsumexp <- function (x) {
	y = max(x)
	y + log(sum(exp(x - y)))
}
softmax <- function (x) {
	exp(x - logsumexp(x))
}
set.seed(123)

phi <- 40;
x <- runif(n_samples);
props <- matrix(rep(NA, n_samples*2), ncol=2);

X = model.matrix(~cov , data = data.frame(cov=x))

coef_ang = c(0.95, 0.05)
kappa = 5
beta = matrix(c(1, 5, coef_ang*kappa), ncol=2, byrow=T)
	
y_hat = X %*% beta
for (n in 1:n_samples)  y_hat[n,] =  softmax( y_hat[n,])
for (n in 1:n_samples) props[n,] = rdirichlet(1,y_hat[n,] * phi)

plot(X[,2],props[,1], ylim=c(0,1))
points(X[,2],props[,2], col='red')

image

I will try to compile a stan model soon

You named the tool already, Dirichlet regression. I’m with Bob: I’ve never seen a Dirichlet regression like this, so take the following with a grain of salt. "
https://groups.google.com/forum/#!topic/stan-users/3Gh1qvfoogY

Did you thought about a Wishart distribution as multivariate extension of a gamma distribution?
Or if you don’t want to model correlations between the Ys go with gamma distributions and predefine
the variances the each gamma, similar the alphas prior in the Dirichlet distribution.
Hint: The Dirichlet distribution can be defined as ratio of gamma distributions. Since you don’t need
the normalization of the Dirichlet and later scale it, the normalization becomes obsolete, I feel not so
ok with it.

Thanks Andre,

yes I agree that multiplication by factors before a softmax sounds a bit strange, in fact I am trying to eliminate them when I can. I will give update on the model soon.

About the Gamma alternative, I have to research a bit, as it is not clear to me the conversion between Dirichlet framework and Gamma.

In section Random number generation its noted how to simulate a Dirichlet distribution
from gamma distributions.
https://en.wikipedia.org/wiki/Dirichlet_distribution#Gamma_distribution

A good practical paper about multinominal and Dirichlet using JAGS can be
found here:
http://www3.jouy.inra.fr/miaj/public/nosdoc/rap2012-5.pdf

In your case you might omit the normalization, thus independent gammas
with gamma(mu*sigma, sigma) could be used to fit your data. If you struggle
with the gamma, read the mean and variance part in Wikipedia. The mean and
variance just needs this transformation and confusingly the gamma has two
forms.

Thanks I will approach it

In the meanwhile, I have two questions.

  1. Why the gamma representation should be better than the “beta regression”
  1. How come something quite common such as softmax/dirichlet regression has never been encountered/established in stan-like languages?

beta “limits” you to [0,1], its based upon gamma distribution.
See for understanding the beta distribution:

http://varianceexplained.org/r/beta_binomial_baseball/

How come something quite common such as softmax/dirichlet regression has never been encountered/established in stan-like languages?

softmax is implemented in Stan and used by some models in Stan. I have no time to do a
research, but I used it myself. Dirichlet regression I have never seen.

BTW, there are interpretations of exponential, gamma, weibull, lognormal distributions and connections
eg. the exponential and the poisson. Normally we define a problem and then we think about, but you came
with the tool first and then ask how you can use it, like how I can nail that screw with my hammer.
Sorry, if I sound judging. Just my 2 cents.

https://cran.revolutionanalytics.com/web/packages/betareg/vignettes/betareg.pdf

Hello,

please have a look to an attempt to approach the problem in form of a R package testable for feedback

You may find this paper helpful [ Bayesian Regression for a Dirichlet Distributed Response using Stan]:

https://arxiv.org/abs/1808.06399

2 Likes

Thanks for sharing. I really appreciate papers that lay out the model they’re using cleanly and then include code.

I don’t think rstanarm supports Dirichlet regression yet, though I know they did a bunch of work trying to get beta regression to work, which must have similar issues (but probably not the same identifiability issue, as there are only two categories).

1 Like