Thoughts on Stan variable nomenclature?

aaa · August 13, 2020, 6:56pm

Are there guidelines on how to name Stan variables within a model?

For example, is there a standard way to prefix intercepts vs slopes? alpha_ or a_ vs beta_ or b_?

With multiple predictors and trying various models, then transitioning those to multilevel, it might be useful to have some kind of consensus on naming variables in general as guidelines.

Something like an optional “Hungarian Notation” (see https://en.m.wikipedia.org/wiki/Hungarian_notation) for Stan?

Does this exist?
Is it a good idea?
Interested in thoughts on this.

Thank you!

andrewgelman · August 13, 2020, 7:52pm

Personally, I prefer to use a and b rather than alpha and beta. On the other hand, when it comes to the next two letters, I’ll use gamma and delta rather than g and d. And I’m not 100% consistent on the alpha and beta either!

One place where notation gets tricky is hierarchical models. If we have a vector parameter, a, with a normal prior distribution, we might write, a ~ normal(mu_a, sigma_a). But when we have multiple batches of coefficients, or varying intercepts and slopes, things can again get complicated.
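To make the simple case concrete, here is a minimal sketch of a varying-intercept model using the a / mu_a / sigma_a naming. The data names (N, J, group, x, y), the common slope b, and sigma_y are assumptions added for illustration, and the array syntax assumes a recent Stan release. It is one possible layout, not a recommended standard.

```stan
data {
  int<lower=1> N;                        // number of observations (assumed name)
  int<lower=1> J;                        // number of groups (assumed name)
  array[N] int<lower=1, upper=J> group;  // group index of each observation (assumed name)
  vector[N] x;                           // predictor (assumed name)
  vector[N] y;                           // outcome (assumed name)
}
parameters {
  vector[J] a;            // varying intercepts, one per group
  real b;                 // common slope
  real mu_a;              // mean of the intercepts
  real<lower=0> sigma_a;  // sd of the intercepts
  real<lower=0> sigma_y;  // residual sd
}
model {
  // hyperpriors are left implicit here to keep the sketch short
  a ~ normal(mu_a, sigma_a);
  y ~ normal(a[group] + b * x, sigma_y);
}
```

Once b also varies by group, you need names for its mean and sd (mu_b, sigma_b) and possibly for a correlation between a and b, which is where the suffix scheme starts to strain.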

In our multilevel modeling book, Jennifer and I emphasized that there can be multiple ways to write the same model, and it’s not always a good idea to enforce a particular way of writing it: what is the clearest way to write it can depend on context.

But I do feel that in many Stan models, and in statistical models in general, we make notation do some of the work that should be done by the math. That is, we don’t always have rich enough mathematical or computational frameworks to express our models. This is something that Bob, Matt, and I struggled with back in 2011 when we were designing Stan as a tool to fit hierarchical models.

So my summary is that this is a good idea. We just have to be careful not to be too rigid on these guidelines, as we don’t want to collapse this particular wavefunction given our current state of confusion!

aaa · August 13, 2020, 8:08pm

Very insightful, thank you.

I guess it would be great to dig into “best practice” examples of multilevel models where things “get complicated”, and to compare and contrast various approaches so that further iterative improvements to the model are better supported.

The goal would be to still maintain clarity, so that when my future self re-reads what I did a few weeks or months from now, it will be easier to grok and continue.

Thanks again.

mitzimorris · August 14, 2020, 3:44am

breckbaldwin · August 14, 2020, 9:49pm

I use a hybrid math/computer science approach which works ok for me. I prefix variables with the typical statistics name followed by ‘_’ and a descriptive variable name in the computer science tradition.

y_height ~ normal(alpha_mean_height, sigma_sd_height);

The mathy prefix lets the stats folks interface somewhat normally with conventions as seen in the user’s guide and the rest helps me keep track of what is what.
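For context, the one-line example above might sit in a complete program along these lines. This is only a sketch: the block layout, the sample size N, and the lower bound on the sd are assumptions added so that it compiles, and only the three variable names come from the example.

```stan
data {
  int<lower=0> N;                 // number of observations (assumed name)
  vector[N] y_height;             // "y" = outcome, "height" = what it measures
}
parameters {
  real alpha_mean_height;         // "alpha" = intercept-style location parameter
  real<lower=0> sigma_sd_height;  // "sigma" = scale parameter, constrained positive
}
model {
  y_height ~ normal(alpha_mean_height, sigma_sd_height);
}
```

The declarations carry the type information (real vs. vector), so the names themselves only need to carry the statistical role and the descriptive part.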

I’d like a convention around scalars vs vectors vs matrices but I have not come up with one.

Breck
