Hello, I’m pretty much brand new to Stan and I’m working on creating my first model. I’ll do my best to write my thoughts out as clearly as I can, but please have patience as I don’t have a formal statistics background.
In writing this model, I’ve been following the workflow I found in a post by Jim Savage from the Stan tutorials. I’m on step 3 (recover simulated parameters) and I was getting some positive results, but also some results that suggest my model isn’t very robust. I was hoping to get some feedback from you all before I move forwards.
Some background on the data/model.
- My y is the marginal distribution of the choices of a large group over a fixed set of options.
- My X is a matrix of observable independent variables related to each option. Each option has the same number of associated independent variables (for example, option 1, 2, 3…N will each have their own weight_lbs and cost_dollars).
I’m assuming that my data come from a Dirichlet-multinomial process and that the marginal distributions, which sum to 1, can be thought of as a probability distribution. I’m also assuming that the multinomial probabilities come from an underlying Dirichlet distribution Dir(\alpha), where \alpha is a vector of length N_components that is predictable from X. For option 1, for example:
- \alpha_1 = \exp( intercept * 1 + \beta_{lb} * X_{lb_1} + \beta_{cost} * X_{cost_1} )
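To make the link concrete, here is a minimal sketch in plain Python. The 3-option X, the intercept, and the beta values are made up for illustration and are not taken from my notebook. It computes \alpha for every option and draws one simulated y from Dir(\alpha) by normalising independent Gamma draws:

```python
import math
import random

def dirichlet_alpha(intercept, betas, X):
    """alpha_i = exp(intercept + sum_k betas[k] * X[i][k]) for each option i."""
    return [math.exp(intercept + sum(b * x for b, x in zip(betas, row)))
            for row in X]

def sample_dirichlet(alpha, rng):
    """Draw from Dir(alpha) by normalising independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

# Hypothetical values, for illustration only.
X = [[1.0, 2.0], [2.0, 1.5], [3.0, 0.5]]  # columns: weight_lbs, cost_dollars
intercept = 0.5
betas = [0.3, -0.4]                        # beta_lb, beta_cost

rng = random.Random(1)                     # fixed seed so the draw is repeatable
alpha = dirichlet_alpha(intercept, betas, X)
y = sample_dirichlet(alpha, rng)           # one simulated set of choice shares
```

Because of the exp, every \alpha_i is strictly positive whatever the coefficients are, which is what the Dirichlet requires.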
Given my data y and X, I want to infer the coefficients \beta and the intercept. Simplified,
- Pr(coefficients | X,y) \propto Dir(y | coefficients * X)
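Spelled out a bit more, this is roughly the quantity being evaluated. The sketch below is plain Python and uses the same exp link as above. The independent Normal(0, 1) priors on the intercept and coefficients are a placeholder assumption, as are the data values. It is only meant to show what the proportionality means, not to stand in for the Stan model:

```python
import math

def dirichlet_logpdf(y, alpha):
    """Log density of Dir(alpha) at a point y on the simplex."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(p) for a, p in zip(alpha, y))

def log_posterior(intercept, betas, X, y, prior_sd=1.0):
    """Unnormalised log posterior: Dirichlet log likelihood + Normal log priors."""
    alpha = [math.exp(intercept + sum(b * x for b, x in zip(betas, row)))
             for row in X]
    # Normal(0, prior_sd) priors, dropping constants (assumed, not from the notebook).
    log_prior = -0.5 * sum((c / prior_sd) ** 2 for c in [intercept, *betas])
    return dirichlet_logpdf(y, alpha) + log_prior

# Hypothetical data, for illustration only.
X = [[1.0, 2.0], [2.0, 1.5], [3.0, 0.5]]  # columns: weight_lbs, cost_dollars
y = [0.2, 0.3, 0.5]                        # observed choice shares, sum to 1
lp = log_posterior(0.5, [0.3, -0.4], X, y)
```

Note that the log density uses log(y_i), so any option with an observed share of exactly 0 would give negative infinity.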
My workflow/model so far can be found in a public Google Colab notebook.
To see the warnings/output from Stan, go to Runtime -> View Runtime Logs.
So far the model does a good job of recovering the parameters, but it is outputting a lot of warnings that make me think I could be doing something better.
Thank you in advance for the advice!