Model Advice

georgewambold · November 14, 2019, 7:10pm

Hello, I’m pretty much brand new to Stan and I’m working on creating my first model. I’ll do my best to write my thoughts out as clearly as I can, but please have patience as I don’t have a formal statistics background.

In writing this model, I’ve been following the workflow I found in a post by Jim Savage from the Stan tutorials. I’m on step 3 (recover simulated parameters). I’m getting some positive results, but also some that suggest my model isn’t very robust, so I was hoping to get feedback from you all before I move forward.

Some background on the data/model.

My y is the marginal distribution of the choices of a large group over a fixed set of options.
My X is a matrix of observable independent variables related to each option. Each option has the same number of associated independent variables (for example, option 1, 2, 3…N will each have their own weight_lbs and cost_dollars).

I’m assuming that my data come from a Dirichlet-multinomial process and that the marginal distribution, which sums to 1, can be treated as a probability distribution. I’m also assuming that the multinomial probabilities come from an underlying Dirichlet distribution Dir(\alpha), where \alpha is a vector of length N_components that can be predicted from X.
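For simulating fake data (step 3 of the workflow), a draw from Dir(\alpha) can be generated with the standard Gamma construction: draw g_k ~ Gamma(\alpha_k, 1) independently and normalize. This is a minimal Python sketch using only the standard library; the alpha values are hypothetical, and the positivity check is the same condition Stan enforces on the Dirichlet's parameters.

```python
import random

def sample_dirichlet(alpha, rng=random):
    """Draw one probability vector from Dir(alpha).

    Draw g_k ~ Gamma(alpha_k, 1) independently for each component,
    then divide by the total so the components sum to 1.
    """
    if any(a <= 0 for a in alpha):
        # Every concentration parameter must be strictly positive.
        raise ValueError("every concentration parameter must be > 0")
    g = [rng.gammavariate(a, 1.0) for a in alpha]  # shape a, scale 1
    total = sum(g)
    return [x / total for x in g]

rng = random.Random(1)
alpha = [2.0, 5.0, 1.5]  # hypothetical concentrations for three options
y = sample_dirichlet(alpha, rng)
print(y)  # a simplex: positive entries that sum to 1
```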

\alpha_1 = \exp( \text{intercept} + \beta_{lb} X_{lb,1} + \beta_{cost} X_{cost,1} )
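Applying the same formula to every option gives the whole \alpha vector. A minimal Python sketch, assuming X is stored as one row per option and one column per covariate ([weight_lbs, cost_dollars]); all numbers are hypothetical. In a Stan program this is usually a single vectorized line such as `exp(intercept + X * beta)`, assuming `X` is declared as a `matrix` and `beta` as a `vector`.

```python
import math

def concentrations(intercept, betas, X):
    """alpha_i = exp(intercept + sum_k betas[k] * X[i][k]) for every option i.

    X is a list of rows: one row per option, one column per covariate.
    The exp() link guarantees alpha_i > 0 in exact arithmetic.
    """
    return [math.exp(intercept + sum(b * x for b, x in zip(betas, row)))
            for row in X]

# Hypothetical data: 3 options, covariates [weight_lbs, cost_dollars]
X = [[1.0, 2.0],
     [0.5, 1.0],
     [2.0, 0.5]]
alpha = concentrations(intercept=0.5, betas=[0.3, -0.4], X=X)
print(alpha)
```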

Given my data y and X, I want to infer the parameters \beta and intercept. In simplified form:

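p(\beta, \text{intercept} \mid X, y) \propto \text{Dir}(y \mid \alpha) \, p(\beta, \text{intercept}), \quad \text{where } \alpha = \exp(\text{intercept} + X\beta)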
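To make the proportionality concrete, the unnormalized log posterior is the Dirichlet log density, log Dir(y | \alpha) = log \Gamma(\sum_k \alpha_k) - \sum_k log \Gamma(\alpha_k) + \sum_k (\alpha_k - 1) log y_k, plus the log prior. This Python sketch assumes independent normal(0, 1) priors on the intercept and each \beta purely for illustration; the thread does not say which priors the model uses.

```python
import math

def dirichlet_logpdf(y, alpha):
    """log Dir(y | alpha) for a simplex y whose entries are all > 0."""
    if any(a <= 0 for a in alpha):
        # Stan performs an equivalent positivity check before evaluating
        raise ValueError("alpha must be strictly positive")
    return (math.lgamma(sum(alpha))
            - sum(math.lgamma(a) for a in alpha)
            + sum((a - 1.0) * math.log(v) for a, v in zip(alpha, y)))

def normal_logpdf(x, mu, sigma):
    return (-0.5 * math.log(2 * math.pi) - math.log(sigma)
            - 0.5 * ((x - mu) / sigma) ** 2)

def log_posterior(intercept, betas, X, y, prior_sd=1.0):
    """Unnormalized log p(intercept, betas | X, y).

    The normal(0, prior_sd) priors are a hypothetical choice.
    """
    alpha = [math.exp(intercept + sum(b * x for b, x in zip(betas, row)))
             for row in X]
    log_prior = (normal_logpdf(intercept, 0.0, prior_sd)
                 + sum(normal_logpdf(b, 0.0, prior_sd) for b in betas))
    return dirichlet_logpdf(y, alpha) + log_prior

# Hypothetical data: 3 options, 2 covariates, observed shares y
X = [[1.0, 2.0], [0.5, 1.0], [2.0, 0.5]]
y = [0.2, 0.3, 0.5]
print(log_posterior(0.1, [0.2, -0.3], X, y))
```

With all coefficients at zero, \alpha = (1, 1, 1), which is the uniform distribution on the simplex (density 2 for three components), a handy sanity check.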

My workflow and model so far are in a public Google Colab notebook.

To see the warnings and output from Stan, go to Runtime -> View Runtime Logs.

So far the model does a good job of recovering the parameters, but it is outputting a lot of warnings that make me think I could be doing something better.

Thank you in advance for the advice!

bbbales2 · November 14, 2019, 9:43pm

Thanks for the link to the notebook, but could you copy and paste some of that here? The “View Runtime Logs” option is greyed out for me.

georgewambold · November 14, 2019, 9:50pm

I’m getting this error a lot:

Exception: dirichlet_lpmf: prior sample sizes[1] is 0, but must be > 0! (in ‘unknown file name’ at line 27)

Here’s a screenshot of part of the logs:

bbbales2 · November 14, 2019, 9:54pm

Oh okay, if that just happens a few times at the beginning, but the chain keeps running and stops printing those errors, you’re good.

The initial positions for the chains are sampled uniformly from [-2, 2] on the unconstrained space. Those parameter values can make the likelihood hard to evaluate because of numerical issues (things underflowing to zero and whatnot). If the errors go away, that just means the sampler moved to a region where the numerics aren’t blowing up.
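Here is a small illustration of how a random start can produce exactly the "prior sample sizes[1] is 0" error, assuming raw-scale covariates like cost in dollars (the numbers are hypothetical). In double precision, exp(z) underflows to 0.0 once z drops below roughly -745, so a start such as \beta_{cost} = -1.9 multiplied by a cost of several hundred gives \alpha_i = 0. Python's `math.exp` raises `OverflowError` for large positive arguments, whereas C++ `std::exp` returns infinity; the `safe_exp` helper mimics the latter.

```python
import math
import random

def safe_exp(z):
    # Python raises OverflowError above ~709.78; C++ std::exp returns +inf
    try:
        return math.exp(z)
    except OverflowError:
        return math.inf

def concentrations(intercept, betas, X):
    # alpha_i = exp(intercept + sum_k beta_k * X[i][k]) for each option i
    return [safe_exp(intercept + sum(b * x for b, x in zip(betas, row)))
            for row in X]

# Hypothetical covariates on their raw scale: [weight_lbs, cost_dollars]
X = [[3.0, 450.0],
     [1.5, 899.0],
     [4.2, 120.0]]

# One plausible start inside [-2, 2]: exponents are about -852, -1707, -224
alpha = concentrations(intercept=0.3, betas=[0.8, -1.9], X=X)
print(alpha)  # the first two entries are exactly 0.0

# Fraction of uniform(-2, 2) starts that give at least one alpha_i == 0.0
rng = random.Random(42)
n_bad = 0
for _ in range(1000):
    init = [rng.uniform(-2, 2) for _ in range(3)]
    if 0.0 in concentrations(init[0], init[1:], X):
        n_bad += 1
print(n_bad, "of 1000 random starts give a zero concentration")
```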

georgewambold · November 14, 2019, 10:00pm

Yeah, the warnings just happen a few times at the beginning like you said. It does keep running and it ultimately recovers the parameters accurately.

Ok, thanks for the guidance! I’m going to keep working on this and I’ll post if I run into any other issues.

aakhmetz · November 15, 2019, 3:51am

Just noting: if you supply expected or reasonable initial values, those warnings should go away.
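One way to follow that advice, sketched in Python: start every chain with coefficients near zero, so that each \alpha_i is close to exp(intercept) and stays strictly positive. The parameter names `intercept` and `beta` are hypothetical and must match the names declared in the Stan program; the jitter should be small relative to the scale of the covariates.

```python
import random

def make_inits(n_chains, n_covariates, jitter=0.01, seed=0):
    """Build one dict of starting values per chain.

    With beta near 0, intercept + X * beta stays near intercept for
    moderately scaled covariates, so exp() neither underflows to 0
    nor overflows. Keys 'intercept' and 'beta' are hypothetical names.
    """
    rng = random.Random(seed)
    return [
        {
            "intercept": rng.uniform(-jitter, jitter),
            "beta": [rng.uniform(-jitter, jitter) for _ in range(n_covariates)],
        }
        for _ in range(n_chains)
    ]

inits = make_inits(n_chains=4, n_covariates=2)
print(inits[0])

# Usage sketch (not run here; check your interface's documentation):
#   PyStan 2:  fit = sm.sampling(data=stan_data, chains=4, init=inits)
#   CmdStanPy: a single number, e.g. inits=0.5, shrinks the default
#              [-2, 2] range to [-0.5, 0.5].
```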
