Model Advice

Hello, I’m pretty much brand new to Stan and I’m working on creating my first model. I’ll do my best to write my thoughts out as clearly as I can, but please have patience as I don’t have a formal statistics background.

In writing this model, I’ve been following the workflow I found in a post by Jim Savage from the Stan tutorials. I’m on step 3 (recover simulated parameters) and I was getting some positive results, but also some results that suggest my model isn’t very robust. I was hoping to get some feedback from you all before I move forwards.

Some background on the data/model.

  • My y is the marginal distribution of the choices of a large group over a fixed set of options.
  • My X is a matrix of observable independent variables related to each option. Each option has the same number of associated independent variables (for example, options 1, 2, 3, …, N each have their own weight_lbs and cost_dollars).

I’m assuming that my data comes from a Dirichlet-multinomial process and that the marginal distributions, which sum to 1, can be thought of as a probability distribution. I’m also assuming that the multinomial probabilities come from an underlying Dirichlet distribution $\text{Dir}(\alpha)$, where $\alpha$ is a vector of length N_components that can be predicted from X.

  • $\alpha_1 = \exp(\text{intercept} + \beta_{lb} X_{lb,1} + \beta_{cost} X_{cost,1})$, and analogously for $\alpha_2, \dots, \alpha_N$
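For step 3 of that workflow (recovering simulated parameters), the generative process above can be sketched in plain Python. This is a minimal sketch, not the notebook's code: it assumes five options, two already-standardized covariates per option (weight and cost), and made-up "true" values for the intercept and coefficients. It draws a share vector y from $\text{Dir}(\alpha)$ by normalizing independent Gamma($\alpha_i$, 1) draws, which is a standard way to sample a Dirichlet.

```python
import math
import random


def dirichlet_alpha(intercept, beta, X):
    """alpha_i = exp(intercept + sum_k beta[k] * X[i][k]) for each option i."""
    return [math.exp(intercept + sum(b * x for b, x in zip(beta, row))) for row in X]


def sample_dirichlet(alpha, rng):
    """One draw from Dir(alpha): independent Gamma(alpha_i, 1) draws, normalized to sum to 1."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [gi / total for gi in g]


# Hypothetical "true" parameters, for illustration only.
rng = random.Random(1)
true_intercept = 0.5
true_beta = [0.3, -0.8]  # [beta_lb, beta_cost]

# Hypothetical covariates: 5 options x 2 standardized covariates (weight, cost).
X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(5)]

alpha = dirichlet_alpha(true_intercept, true_beta, X)
y = sample_dirichlet(alpha, rng)  # simulated choice shares; these sum to 1
print([round(v, 3) for v in y])
```

Fitting the Stan model to (X, y) generated this way and checking that the posterior covers `true_intercept` and `true_beta` is the parameter-recovery check.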

Given my data $y$ and $X$, I want to estimate the coefficients $\beta$ and the intercept. In simplified form:

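  • $\Pr(\text{coefficients} \mid X, y) \propto \text{Dir}(y \mid \alpha) \, \Pr(\text{coefficients})$, where $\alpha = \exp(\text{intercept} + X\beta)$ elementwise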
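The unnormalized log posterior behind this proportionality can be written out directly, which is roughly what Stan evaluates at each step. A sketch in Python, assuming normal(0, 1) priors on the intercept and coefficients (the post doesn't specify priors, so that part is purely illustrative). The zero check mirrors Stan's requirement that every Dirichlet concentration be > 0.

```python
import math


def dirichlet_logpdf(y, alpha):
    """log Dir(y | alpha) for a probability vector y (all entries > 0) and alpha > 0."""
    if any(a <= 0 for a in alpha):
        # Stan rejects this case too ("... must be > 0").
        raise ValueError("all alpha must be > 0")
    return (math.lgamma(sum(alpha))
            - sum(math.lgamma(a) for a in alpha)
            + sum((a - 1.0) * math.log(v) for a, v in zip(alpha, y)))


def log_posterior(intercept, beta, X, y, prior_sd=1.0):
    """Dirichlet log-likelihood plus normal(0, prior_sd) log-priors, up to an additive constant.

    The normal priors are an illustrative assumption, not taken from the original model.
    """
    alpha = [math.exp(intercept + sum(b * x for b, x in zip(beta, row))) for row in X]
    log_prior = sum(-0.5 * (c / prior_sd) ** 2 for c in [intercept, *beta])
    return dirichlet_logpdf(y, alpha) + log_prior


# With alpha = (1, 1, 1) the Dirichlet is uniform on the simplex, density Gamma(3) = 2.
print(dirichlet_logpdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0]))
```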

My workflow and model so far are in a public Google Colab notebook.

To see the warnings and output from Stan, go to Runtime -> View Runtime Logs.

So far the model does a good job of recovering the parameters, but it is outputting a lot of warnings that make me think I could be doing something better.

Thank you in advance for the advice!

Thanks for the link to the notebook, but could you copy and paste some of that here? The “View Runtime Logs” option is greyed out for me.

I’m getting this error a lot:

Exception: dirichlet_lpmf: prior sample sizes[1] is 0, but must be > 0! (in ‘unknown file name’ at line 27)

Here’s a screenshot of a part of the logs:

Oh okay, if that just happens a few times at the beginning, and the chain keeps running and stops printing those errors, you’re good.

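By default, the initial values for each chain are sampled uniformly from [-2, 2] on the unconstrained space. Starting points like these can make the likelihood hard to evaluate because of numerical issues (things rounding to zero and whatnot). Here, most likely, a very negative linear predictor makes exp() underflow, so a component of $\alpha$ becomes exactly 0, which is what the dirichlet check complains about. If the errors go away, that just means the sampler found an easier place to start where the numerics weren’t blowing up.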
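To see how a zero can appear, here is a small illustration in Python. It assumes (hypothetically) that the covariates enter in raw units, e.g. cost_dollars around 450, and that the intercept and coefficients are unconstrained reals, so the [-2, 2] init range applies to them directly. Any linear predictor below about -745 makes exp() underflow to exactly 0.0 in double precision.

```python
import math
import random


def safe_exp(z):
    """exp(z), returning +inf on overflow (as Stan's exp does) instead of raising."""
    try:
        return math.exp(z)
    except OverflowError:
        return math.inf


def alpha_for(intercept, beta, row):
    """alpha for one option: exp(intercept + sum_k beta[k] * row[k])."""
    return safe_exp(intercept + sum(b * x for b, x in zip(beta, row)))


# Hypothetical raw covariates for one option: weight_lbs = 12, cost_dollars = 450.
x_row = [12.0, 450.0]

# Mimic the default inits: every (unconstrained) parameter uniform on [-2, 2].
rng = random.Random(0)
n_zero = 0
for _ in range(1000):
    intercept = rng.uniform(-2.0, 2.0)
    beta = [rng.uniform(-2.0, 2.0) for _ in x_row]
    if alpha_for(intercept, beta, x_row) == 0.0:
        n_zero += 1  # exp() underflowed; Stan would reject this alpha as 0

print(f"{n_zero} of 1000 random starting points give alpha == 0.0")
```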

Yeah, the warnings just happen a few times at the beginning like you said. It does keep running and it ultimately recovers the parameters accurately.

Ok, thanks for the guidance! I’m going to keep working on this and I’ll post if I run into any other issues.

Just noticing: if you supplied reasonable initial values (close to where you expect the parameters to be), those warnings should go away.
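One way to act on this, sketched in Python: build one small init dict per chain and check that every $\alpha_i$ is finite and positive before sampling. The helper names, the `intercept`/`beta` keys, the 0.1 init scale, and the covariate values are all hypothetical; the keys need to match whatever the Stan program's parameters block actually declares.

```python
import math
import random


def make_inits(n_chains, n_covariates, scale=0.1, seed=0):
    """Hypothetical helper: one init dict per chain, values uniform on [-scale, scale]."""
    rng = random.Random(seed)
    return [
        {"intercept": rng.uniform(-scale, scale),
         "beta": [rng.uniform(-scale, scale) for _ in range(n_covariates)]}
        for _ in range(n_chains)
    ]


def alphas_ok(init, X):
    """True if every alpha_i = exp(intercept + beta . x_i) is finite and strictly positive."""
    for row in X:
        z = init["intercept"] + sum(b * x for b, x in zip(init["beta"], row))
        if z > 709.0:  # conservative: math.exp overflows just above 709.78
            return False
        a = math.exp(z)  # underflows to 0.0 for very negative z
        if not (a > 0.0 and math.isfinite(a)):
            return False
    return True


X = [[12.0, 450.0], [8.0, 300.0], [20.0, 900.0]]  # hypothetical raw covariates
inits = make_inits(n_chains=4, n_covariates=2)
print(all(alphas_ok(init, X) for init in inits))  # prints True
```

With PyStan 2, a list like `inits` (one dict per chain) can be passed as `sm.sampling(data=stan_data, chains=4, init=inits)`, where `sm` and `stan_data` stand in for the compiled model and data dict from the notebook.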