# How to prepare a dataframe to fit a Bayesian Dirichlet regression

Hey guys,

I’m new here. I would like to get some advice regarding a project I am currently working on.

I have a process such that q ~ Dir(alpha) where alpha=exp(X*B).

Alpha has dimensions Nx1, X has dimensions NxD, and B has dimensions Dx1.
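To make the model concrete, here is a minimal sketch of the Dirichlet log-density for a single time point with alpha = exp(X*B). The function name `dirichlet_logpdf` and the toy values of `X`, `q`, and `B` are made up for illustration and are not from any dataset in this thread.

```python
import math
import numpy as np

def dirichlet_logpdf(q, X, B):
    """Log-density of q under Dir(alpha), with alpha = exp(X @ B)."""
    alpha = np.exp(X @ B)  # shape (N,), one concentration per unit
    # log normalizing constant: log Gamma(sum alpha) - sum log Gamma(alpha_i)
    log_norm = math.lgamma(alpha.sum()) - sum(math.lgamma(a) for a in alpha)
    return float(log_norm + np.sum((alpha - 1.0) * np.log(q)))

# Hypothetical toy inputs with N=3 and D=2
X = np.array([[0.1, 1.0],
              [0.2, 0.9],
              [0.3, 1.1]])
q = np.array([0.2, 0.3, 0.5])   # must sum to 1
B = np.zeros(2)                 # B = 0 gives alpha = (1, 1, 1), the uniform Dirichlet

print(dirichlet_logpdf(q, X, B))  # approximately 0.693, i.e. log(2)
```

Under a flat prior on B, the log-posterior is (up to a constant) the sum of this quantity over all observed time points, assuming the time points are treated as independent given X.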

From time t=0 to t=T, I observe X and q. What should my dataframe look like in order to estimate B with a Bayesian Dirichlet regression, using all the information from t=0 to t=T and assuming a flat prior on B?

Hi and welcome. Do you have some R (or Python) code you’ve started that we could take a look at? And the Stan model you are planning on using?

I’m not sure which Stan model I should use. I don’t really know where to start, since I didn’t get much exposure to programming in my field of study.

But right now, I have a dictionary called dataDict with keys equal to 0, 1, 2, …, T. There is a dataframe attached to each key. For example, for key t=0:

| _id | observed_q | X1 | X2 | X3 |
|---|---|---|---|---|
| A | 0.12 | 0.08 | 1.20 | 0.18 |
| B | 0.23 | 0.07 | 1.12 | 0.21 |
| C | 0.05 | 0.15 | 1.34 | 0.07 |
| D | 0.40 | 0.03 | 0.94 | 0.34 |
| E | 0.20 | 0.09 | 1.03 | 0.14 |

In this case, N=5 and D=3. I have a similar dataframe for t=1, t=2, etc. Note that the sum of observed q equals 1.
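One possible layout, sketched under assumptions: stack the per-time dataframes into a 2-D array for q and a 3-D array for X, so that row i is the same unit at every time point. The t=0 frame below is the example from this post; the t=1 values, the `x_cols` list, and the `stan_data` key names (including `n_times`) are hypothetical. It also assumes every frame has the same `_id` values and columns.

```python
import numpy as np
import pandas as pd

cols = ["_id", "observed_q", "X1", "X2", "X3"]
dataDict = {
    0: pd.DataFrame([            # example from the post
        ["A", 0.12, 0.08, 1.20, 0.18],
        ["B", 0.23, 0.07, 1.12, 0.21],
        ["C", 0.05, 0.15, 1.34, 0.07],
        ["D", 0.40, 0.03, 0.94, 0.34],
        ["E", 0.20, 0.09, 1.03, 0.14],
    ], columns=cols),
    1: pd.DataFrame([            # made-up values, for illustration only
        ["A", 0.10, 0.09, 1.18, 0.20],
        ["B", 0.25, 0.06, 1.15, 0.19],
        ["C", 0.05, 0.14, 1.30, 0.08],
        ["D", 0.38, 0.04, 0.97, 0.33],
        ["E", 0.22, 0.08, 1.01, 0.15],
    ], columns=cols),
}

x_cols = ["X1", "X2", "X3"]
times = sorted(dataDict)  # [0, 1, ..., T]

# Sort each frame by _id so row i refers to the same unit at every time point.
frames = [dataDict[t].sort_values("_id") for t in times]

q = np.stack([f["observed_q"].to_numpy() for f in frames])  # shape (T+1, N)
X = np.stack([f[x_cols].to_numpy() for f in frames])        # shape (T+1, N, D)

# Each row of q has to sum to 1 for a Dirichlet likelihood.
assert np.allclose(q.sum(axis=1), 1.0)

# Hypothetical data dictionary; n_times is the number of time points (T + 1).
stan_data = {"n_times": q.shape[0], "N": q.shape[1], "D": X.shape[2], "q": q, "X": X}
print(q.shape, X.shape)  # (2, 5) (2, 5, 3)
```

Keeping the time dimension as the first axis means each `q[t]` is one simplex and each `X[t]` is one N×D design matrix, which matches the model statement q ~ Dir(exp(X*B)) applied once per time point.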

How should I organize my dataset to estimate B using a Bayesian Dirichlet regression? Which Stan model should I use? Let me know if you need more info.
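As a possible starting point, and not a definitive answer, the model described above could be written as a short Stan program and saved from Python. Everything here is an assumption for illustration: the file name `dirichlet_reg.stan`, the data names (`n_times`, `N`, `D`, `q`, `X`), and the choice to treat time points as independent given X. The array syntax assumes a reasonably recent Stan version (2.26 or newer).

```python
# Hypothetical Stan program for q[t] ~ Dirichlet(exp(X[t] * B)).
stan_code = """
data {
  int<lower=1> n_times;              // number of observed time points (T + 1)
  int<lower=1> N;                    // units per time point
  int<lower=1> D;                    // number of covariates
  array[n_times] simplex[N] q;       // observed shares, each summing to 1
  array[n_times] matrix[N, D] X;     // covariates for each time point
}
parameters {
  vector[D] B;                       // no prior statement, so the prior is flat
}
model {
  for (t in 1:n_times)
    q[t] ~ dirichlet(exp(X[t] * B));
}
"""

# Save the program so a Stan interface can compile it.
with open("dirichlet_reg.stan", "w") as f:
    f.write(stan_code)

# With cmdstanpy installed, fitting might look like this (not run here):
# from cmdstanpy import CmdStanModel
# model = CmdStanModel(stan_file="dirichlet_reg.stan")
# fit = model.sample(data=stan_data)  # stan_data: dict with n_times, N, D, q, X
```

Because q is declared as a simplex, Stan will likely reject rows that do not sum to 1 within its tolerance, so rounded values may need renormalizing first. A flat prior on B is improper; whether the posterior is well behaved depends on the data, so a weakly informative prior may be worth considering.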