# Thoughts on Stan variable nomenclature?

Are there guidelines on how to name Stan variables within a model?

For example, is there a standard way to prefix intercepts versus slopes, such as alpha_ or a_ for intercepts and beta_ or b_ for slopes?

When working with multiple predictors, trying various models, and then transitioning those to multilevel models, it would be useful to have some consensus guidelines for naming variables.

Something like an optional “Hungarian Notation” (see https://en.m.wikipedia.org/wiki/Hungarian_notation) for Stan?

Does this exist?
Is it a good idea?
Interested in thoughts on this.

Thank you!

Personally, I prefer to use a and b rather than alpha and beta. On the other hand, when it comes to the next two letters, I’ll use gamma and delta rather than g and d. And I’m not 100% consistent on the alpha and beta either!

One place where notation gets tricky is hierarchical models. If we have a vector parameter, a, with a normal prior distribution, we might write, a ~ normal(mu_a, sigma_a). But when we have multiple batches of coefficients, or varying intercepts and slopes, things can again get complicated.
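As a rough sketch of the simple case, here is how a varying-intercept model might look with the a / mu_a / sigma_a naming. The data names (N, J, group, x, y), the common slope b, the residual scale sigma_y, and all the priors are assumptions made up for illustration, and the array syntax assumes a reasonably recent Stan (2.26 or later).

```stan
data {
  int<lower=1> N;                        // number of observations (assumed name)
  int<lower=1> J;                        // number of groups (assumed name)
  array[N] int<lower=1, upper=J> group;  // group index of each observation
  vector[N] x;                           // single predictor
  vector[N] y;                           // outcome
}
parameters {
  vector[J] a;            // varying intercepts, one per group
  real b;                 // common slope
  real mu_a;              // population mean of the intercepts
  real<lower=0> sigma_a;  // population sd of the intercepts
  real<lower=0> sigma_y;  // residual sd
}
model {
  // Placeholder priors, chosen only so the sketch is complete.
  mu_a ~ normal(0, 5);
  sigma_a ~ normal(0, 2);  // half-normal because of the lower=0 constraint
  sigma_y ~ normal(0, 2);
  b ~ normal(0, 2);

  a ~ normal(mu_a, sigma_a);
  y ~ normal(a[group] + b * x, sigma_y);
}
```

With one batch of coefficients the suffix scheme reads cleanly. Once b also varies by group, you need mu_b and sigma_b, possibly a correlation between a and b, and the names start carrying structure that the model itself should express.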

In our multilevel modeling book, Jennifer and I emphasized that there can be multiple ways to write the same model, and it’s not always a good idea to enforce a particular way of writing it: the clearest way to write it can depend on context.

But I do feel that in many Stan models, and in statistical models in general, we make notation do some of the work that should be done by the math. That is, we don’t always have rich enough mathematical or computational frameworks to express our models. This is something that Bob, Matt, and I struggled with back in 2011 when we were designing Stan as a tool to fit hierarchical models.

So my summary is that this is a good idea. We just have to be careful not to be too rigid on these guidelines, as we don’t want to collapse this particular wavefunction given our current state of confusion!

Very insightful, thank you.

I guess it would be great to dig into “best practice” examples of multilevel models where things “get complicated” and to compare and contrast various approaches, so that further iterative improvements to the model are better supported while still maintaining clarity. That way, when my future self rereads what I did a few weeks or months from now, it will be easier to grok and continue.

Thanks again.

I use a hybrid math/computer science approach, which works OK for me. I prefix each variable with the typical statistics name, followed by ‘_’ and a descriptive variable name in the computer science tradition.

y_height ~ normal(alpha_mean_height, sigma_sd_height)

The mathy prefix lets the stats folks work more or less normally with the conventions seen in the user’s guide, and the rest helps me keep track of what is what.
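To show how the prefix-plus-description naming might read in a complete program, here is a minimal sketch built around the sampling statement above. Only y_height, alpha_mean_height, and sigma_sd_height come from that line; the predictor x_weight, the slope beta_weight, the size N, and the priors are hypothetical additions for illustration.

```stan
data {
  int<lower=1> N;       // number of observations (assumed name)
  vector[N] x_weight;   // hypothetical predictor
  vector[N] y_height;   // outcome, named as in the statement above
}
parameters {
  real alpha_mean_height;         // intercept: stats prefix + description
  real beta_weight;               // hypothetical slope on x_weight
  real<lower=0> sigma_sd_height;  // residual sd
}
model {
  // Placeholder priors, assumed only to make the sketch complete.
  alpha_mean_height ~ normal(0, 100);
  beta_weight ~ normal(0, 10);
  sigma_sd_height ~ normal(0, 50);

  y_height ~ normal(alpha_mean_height + beta_weight * x_weight,
                    sigma_sd_height);
}
```

The trade-off is longer lines in exchange for names that still make sense without the surrounding math.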

I’d like a convention for distinguishing scalars, vectors, and matrices, but I have not come up with one.

Breck
