Unifying names of output columns in bayesplot / tidybayes / etc

@jonah, @tjmahr, @paul.buerkner, and I just had an excellent Skype conversation about unifying various aspects of Bayesian interfaces in the R ecosystem (particularly making sure the rstan ecosystem and tidybayes work well together). The easiest first step towards this is to unify naming conventions, which we agreed would benefit from more input.

While there are a number of things to name (functions, function arguments, output variable names / columns, etc.), we decided to focus in the short term on output column names because they are the hardest thing to change in the future: functions and function arguments are easy to deprecate with aliases, but changing output formats breaks user code. Since I am going to submit tidybayes to CRAN soon, it would be best to make breaking name changes before that happens, so that they affect the fewest users.

To open up the conversation, here are some possible synonyms for different columns that tend to show up in function output (I’m ignoring meta-data conventions like capitalization or dot-names for the purposes of this conversation):

  • chain
  • iteration, draw, sample
  • parameter, term, variable
  • value, estimate (as the value of a parameter, term, or variable)
  • value, estimate, prediction, pred (as the name for a column in a data frame output by tidybayes::add_fitted_samples, which derives from posterior_linpred)
  • value, estimate, prediction, pred (as the name for a column in a data frame output by tidybayes::add_predicted_samples, which derives from posterior_predict)

I think chain / iteration is the dominant terminology for these concepts in most packages.

parameter / value is also, I think, the most common, while term / estimate is used analogously in broom, but that is perhaps not terminology we want to emulate in the Bayesian world. Personally I think value is a little vague (everything is a value), but parameter is probably better than term, so maybe parameter / estimate?

For posterior_linpred / tidybayes::add_fitted_samples type output, I lean towards estimate, prediction, or pred (again because value is vague). On the call, Jonah also suggested coming up with a new word for this.

For posterior_predict type output, prediction or pred seems the most obvious…

For reference, the current terminology in tidybayes is (in order): chain, iteration, term, estimate, estimate, pred. I admit that these choices (particularly term, which was motivated by compatibility with broom) are not all ideal, and some should likely be changed.

Thoughts?


Hi,

this discussion is relevant for Python too. We have something similar going on with the ArviZ library.

Should “iterations” refer to ordered MCMC draws (as needed to draw a traceplot)? Or can it also refer to, say, 200 random draws from the posterior (as in a spaghetti plot of posterior predictions)?

parameter is probably better than term

I like parameter for its specificity. The only wrinkle for me is that we also have posterior uncertainty over computed values that are not strictly parameters, like the Bayesian R^2.

The term iteration shouldn't depend on ordering, so both uses are fine. As for:

I like parameter for its specificity. The only wrinkle for me is that we have also posterior uncertainty over computed values that are not strictly parameters like the Bayesian R^2.

Those should be called something else, maybe derived quantities, because they're not necessary for understanding sampler behavior; you only need the parameters and their auxiliary components (e.g. momenta and gradients) for treat

I also like parameter, and I think there is no problem calling transformed parameters (computed from the original ones) parameters themselves. I am fine with either value or estimate in general, but prefer estimate for add_fitted_samples() and prediction (over pred) for add_predicted_samples().

I lean towards having a single word that encompasses both (maybe "parameter", per @paul.buerkner's suggestion), because there are certain common formats (long-format data frames, for example) where it is useful to put both parameters and transformed parameters / derived quantities into the same column (necessitating a single name). Is there a better word than "parameter" to use for such a column?


This is a good question. My initial instinct was to say iteration should be used for both, and that is the case in tidybayes currently, but upon reflection I think this leads to annoying corner cases in the API during use.

For example, most functions in tidybayes output a data frame with .chain and .iteration columns. Yet a common use case (say, in making spaghetti plots) is to take 100 random samples from the posterior. There you really just want a column that uniquely identifies each draw, whatever chain it came from, so you can pick 100 of them at random. The problem is that sometimes this is the .iteration column and sometimes it isn't. Some functions don't provide chain information, so in cleaning up output tidybayes puts NA into .chain and has .iteration uniquely index all draws from the posterior; other functions do provide chain information, so .iteration no longer uniquely indexes all draws. In retrospect this was a bad choice on my part: the interpretation of .iteration becomes dependent on the value in .chain, which is a straight-up no-no for a clean data format.

Maybe the solution is to clearly distinguish between these: to only provide .chain and .iteration values when they actually refer to the chain and iteration, but to always provide a .draw column that is guaranteed to uniquely identify each draw from the posterior. This way, .iteration would still be included in output but using it to uniquely identify draws becomes deprecated in favor of using .draw.
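A minimal sketch of the .draw idea, written in Python purely for illustration (it is not tidybayes code): each row of a long-format table keeps .chain and .iteration as metadata, and a separate .draw index is assigned that is unique across chains, so choosing random draws never requires looking at .chain. The row layout (a list of dicts) and the helpers `add_draw_index` and `sample_draws` are hypothetical.

```python
import random


def add_draw_index(rows):
    """Return copies of rows with a `.draw` column that is unique across chains.

    Each row is a dict with at least `.chain` and `.iteration`. Distinct
    (chain, iteration) pairs are numbered 1..N in sorted order, so `.draw`
    never has to be read together with `.chain`.
    """
    keys = sorted({(r[".chain"], r[".iteration"]) for r in rows})
    draw_of = {key: i + 1 for i, key in enumerate(keys)}
    return [{**r, ".draw": draw_of[(r[".chain"], r[".iteration"])]} for r in rows]


def sample_draws(rows, n, seed=None):
    """Keep every row belonging to n randomly chosen draws (e.g. for a spaghetti plot)."""
    draws = sorted({r[".draw"] for r in rows})
    keep = set(random.Random(seed).sample(draws, n))
    return [r for r in rows if r[".draw"] in keep]


# Two chains x three iterations, two variables per draw (long format).
rows = [
    {".chain": c, ".iteration": i, ".variable": v, ".value": 0.0}
    for c in (1, 2)
    for i in (1, 2, 3)
    for v in ("mu", "sigma")
]
rows = add_draw_index(rows)
print(sorted({r[".draw"] for r in rows}))  # [1, 2, 3, 4, 5, 6]
```

Here .iteration repeats 1 to 3 within each chain while .draw runs 1 to 6, so `sample_draws` never needs to know which chain a draw came from.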


Thanks everyone for input so far! Below is my attempt to summarize where things seem to be leaning based on the conversation so far, along with where I lean:

  • chain
    • -> chain
  • iteration, draw, sample
    • -> either iteration, or split into two: iteration which indexes within chain, and draw which uniquely identifies draws. I lean towards splitting.
  • parameter, term, variable
    • -> parameter
  • value, estimate (as the value of a parameter, term, or variable)
    • -> estimate or value. I lean towards estimate.
  • value, estimate, prediction, pred (as the name for a column in a data frame output by tidybayes::add_fitted_samples, which derives from posterior_linpred)
    • -> estimate or come up with a new word. Pending coming up with a new word (@jonah?), I lean towards estimate.
  • value, estimate, prediction, pred (as the name for a column in a data frame output by tidybayes::add_predicted_samples, which derives from posterior_predict)
    • -> prediction

Thoughts @tjmahr @paul.buerkner @jonah @sakrejda @ahartikainen?

Also, is there anyone else whose input would be good to have on this?


Wrt the usage of chain/iteration/draw, I like the distinction where iteration is ordered and comes from the algorithm, whereas you use "draw" more loosely. I think "chain" trips people up because we tend to think of it as ordered integers, but it's more of a UID, so if you don't have it you should just generate one rather than leaving NA.


There is often inconsistency in how sample and draw are used, as in the two sentences above. The Stan team has tried to be consistent about sample vs. draw, so that you would write
“take say a sample of 100 random draws from the posterior.”

See also Sampling (statistics) - Wikipedia which has
“a data sample is a set of data collected and/or selected from a statistical population by a defined procedure.[1] The elements of a sample are known as sample points, sampling units or observations [citation needed].”

Following this logic, it is natural that the elements of posterior sample are posterior draws.


I don't think we're going to find a vocabulary that appeases everyone, as applied fields have internalized different heuristic vocabularies that don't have any mathematical backing (not that the theorists have been helping by providing a more consistent vocabulary). We can go with a vocabulary that's more consistent with the mathematics (which I would much prefer), but it will inevitably upset people. Then again, any set vocabulary will probably upset someone.

A sampling procedure for the target distribution $\pi$ is any process that can generate a sequence, $\{x_{1}, \ldots, x_{N}\}$, of arbitrary length such that

$$\lim_{N \rightarrow \infty} \frac{1}{N} \sum_{n = 1}^{N} f(x_{n}) = \int \mathrm{d} x \, \pi(x) f(x).$$
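As a deliberately simple illustration of this definition (not something from the original post): an exact Monte Carlo sampler for a standard normal target with f(x) = x^2, whose integral is exactly 1. The running average is the left-hand side of the limit, and each finite prefix is only an approximation, which is why the definition constrains nothing but the asymptotic behavior. The helper name `running_mc_estimate` and the choice of target are assumptions for the sketch.

```python
import random


def running_mc_estimate(f, sampler, n):
    """Return the running averages (1/N) * sum_{k<=N} f(x_k) for N = 1..n."""
    total = 0.0
    averages = []
    for k in range(1, n + 1):
        total += f(sampler())
        averages.append(total / k)
    return averages


rng = random.Random(2018)
# Target pi = N(0, 1) and f(x) = x^2, so the integral on the right-hand side is 1.
estimates = running_mc_estimate(lambda x: x * x, lambda: rng.gauss(0.0, 1.0), 100_000)
print(estimates[99], estimates[-1])  # the later average should sit much closer to 1
```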

This general definition places constraints only on the asymptotic behavior of these sequences, hence there’s little meaning to finite sequences. That said, a finite sequence from one of these procedures is typically referred to as a sample.

Monte Carlo is a sampling procedure that builds up the elements of a sampling sequence independently. These individual elements are often referred to as realizations of an exact sampling procedure. By definition any subset of a Monte Carlo sample is also a sample, including a single realization. Hence “sample” can just as well describe a single point as a collection of points! At the same time permutations of samples are also samples.

Markov chain Monte Carlo builds up the elements of a sampling sequence using a Markov chain. Each transition of a Markov transition function generates a new point given the last point of an existing sequence, extending the Markov chain and yielding a new sequence that is one element larger. If we index each transition of the Markov chain with integer-valued iterations then each element of a sequence can also be identified with the iteration.

To make things more confusing, the Markov transitions are formally conditional probability distributions, and generating a new point is equivalent to sampling from that conditional probability distribution (or generating a new realization from that conditional distribution). Hence we can talk about each state of a Markov chain as a sample, but a sample from the Markov transition and not the target distribution! Some have tried to avoid this confusion by referring to the Markov transitions as draws, but I don't think introducing synonyms that refer to different contexts arbitrarily is particularly helpful notation.

Perhaps unsurprisingly, you cannot permute a Markov chain to yield a valid Markov chain. But you can thin a Markov chain to yield a valid Markov chain. More formally, thinning actually introduces a new Markov transition that applies the initial transition multiple times. The resulting Markov transition has the same invariant distribution, hence it yields another valid sampling procedure. By this logic one can always reduce a Markov chain to a Markov chain that yields only one transition, hence some elements (namely the first element, or those falling on thinning periods) are also technically a valid sampling sequence or sample, but not all.
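A small sketch of thinning as just described, assuming a random-walk Metropolis chain targeting a standard normal (the sampler, step size, and helper names `metropolis_chain` and `thin` are illustrative choices, not anything Stan does internally). Keeping every k-th state is the same as running a chain whose transition applies the original transition k times.

```python
import math
import random


def metropolis_chain(n, step=1.0, x0=0.0, seed=0):
    """Random-walk Metropolis chain of n states targeting a standard normal."""
    rng = random.Random(seed)
    x, chain = x0, []
    for _ in range(n):
        proposal = x + rng.gauss(0.0, step)
        # log pi(proposal) - log pi(x) for pi = N(0, 1)
        log_ratio = 0.5 * (x * x - proposal * proposal)
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            x = proposal
        chain.append(x)
    return chain


def thin(chain, k):
    """Keep states k, 2k, 3k, ...: each kept state is k transitions after the previous one."""
    return chain[k - 1::k]


states = metropolis_chain(1_000)
thinned = thin(states, 10)
print(len(thinned))  # 100
```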

Anyways, you can see that some of the different notations actually come from different algorithms, which only exacerbates the confusion here. Consequently I think the first question that has to be answered before settling on a vocabulary is: to what is the vocabulary referring? Is this notation meant to refer to only MCMC, or will it also encompass MC and other sampling schemes like importance sampling? The general case is going to be hard because only the sequences are meaningful in general, not the individual points! Hence any naming scheme is going to be somewhat arbitrary.

The other notation issue here regards random variables, which are the formal name for everything calculated in transformed parameters and generated quantities. While the parameters parameterize the sample space of our target distribution, random variables are simply deterministic functions of the parameters. A given value of the parameters immediately defines a value for the random variables, or, more particular to the probabilistic setting, a realization of the parameters immediately defines a pushforward realization of the random variables. One can also think about evaluating a random variable at a given realization of the parameters as evaluating the random variable.
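To illustrate the pushforward view with a hedged sketch (all names are hypothetical): a derived quantity is a deterministic function evaluated at each draw of the parameters, so its realizations share the parameters' draw index and can sit in a single name/value column pair (here, hypothetically, .variable / .value) alongside the parameters themselves.

```python
import math


def to_long(draws, derived):
    """Long-format rows holding parameters and derived quantities side by side.

    draws maps a draw index to a dict of parameter values; derived maps a
    name to a deterministic function of those parameters.
    """
    rows = []
    for d, params in sorted(draws.items()):
        values = dict(params)
        for name, fn in derived.items():
            values[name] = fn(params)  # pushforward realization at this draw
        for variable, value in values.items():
            rows.append({".draw": d, ".variable": variable, ".value": value})
    return rows


draws = {1: {"log_sigma": 0.0}, 2: {"log_sigma": math.log(2.0)}}
rows = to_long(draws, {"sigma": lambda p: math.exp(p["log_sigma"])})
for row in rows:
    print(row)
```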


Hey there, broom summer intern here. broom is a package that takes model objects and extracts relevant information into tibbles, much like tidybayes.

We’ve been thinking about standardizing column names in output (one example: tidyverse/broom#291) for a while now.

In an ideal world, we’d love to have column names from tidybayes and broom line up closely. This would make it easier to compare and simultaneously manipulate Bayesian models and the world of models that broom currently tidies, which are mostly frequentist.

broom is fairly old and somewhat bound by backwards compatibility. The good news is that we have 162K downloads a month and have potentially set some precedent for column names. The bad news is that some of this precedent is unfortunate.

Here’s what we currently do in terms of output column naming:

  • we use term instead of parameter or variable
  • we use estimate instead of value for the numerical value of a coefficient, etc
  • we use .fitted for predictions
  • we use .resid for residuals

There's also a growing practice, especially within the tidyverse, of prefixing columns being added to a matrix with ., as in .fitted and .resid above.

On the more unfortunate side of things, the intervals in broom mostly use

  • conf.low
  • conf.high

regardless of whether they are confidence, credible, or some other type of interval. We may have to stick with this for backwards compatibility, although I’m trying to talk Dave into changing these.

In any case, I think standardizing language and arguments for intervals and credible levels is super useful.


Yeah, the question of broom is definitely a hard and important one to consider. I don’t want to speak for him, but when we talked I think @jonah was leaning away from adopting broom terminology when it does not match up well with the Bayesian world, a viewpoint I sympathize with. Even though tidybayes has adopted some of that terminology, I can see potential confusion in it and actually regret adopting it a little bit.

Currently tidybayes output is compatible with several of the names in broom, specifically: term / estimate, conf.low, and conf.high. One option is to go completely broom-compatible and stick to all of that — although I am a little resistant to changing the prediction output to .fitted, as that feels like a step backwards (and that one is easy to change in output anyhow).

I see on the issue you linked that tidy.stanreg and tidy.brmsfit use lower and upper instead of conf.low and conf.high — I would actually consider switching to those (unless those functions are switching away from that for greater consistency with the rest of broom?).

I guess it is a question of what use cases one is optimizing for: if the use case of comparing estimates against frequentist models is common, that makes a case for broom name consistency. If it is not, it is easier to argue for abandoning the broom terminology where that seems to be the best decision and keeping terminology more consistent with other Bayesian packages in order to reduce friction there (if one might expect more users to be inhabiting that intersection).

That said, even if broom terminology is not fully adopted, one way of helping reduce friction would be a “translator” function that can be dropped into a tidy pipeline to convert to broom-like names. That way compatibility is maintained for folks that want it, in a bit more of an opt-in manner.

The other option is to just admit that trying to name things to satisfy both worlds is a fool's errand, abandon all hope, and keep things mostly as they are with some small tweaks around the edges (like adding a draw column that at least uniquely indexes each draw).

broom isn’t totally inflexible at the moment either. It hasn’t been maintained for about two years and we’re redoing the column names for the next release to be more consistent. So there’s potential to rename within broom.

I agree the lower and upper are the most generic and would be ideal for intervals. If y’all end up going with those I’ll definitely use that as part of a pitch to get broom to transition to those as well.

Currently we've identified the tidiers for Bayesian models as being incompatible with the nomenclature in the rest of broom. We haven't decided what to do with them other than to eventually get it all into alignment.

This is somewhat complicated by these methods being moved to the broom.mixed package in an upcoming release. @mjskay it might be worthwhile to open an issue there to see if there’s any duplication of effort between broom.mixed and tidybayes.

Lastly, if the Stan packages are unifying their nomenclature, I think it may be worth thinking about a broader community guide for naming arguments and output columns in modelling functions. The current state of modelling in R is somewhat terrifying, with small blissful oases like Stan and a few others. RStudio in particular is starting to think more about “tidy modelling” (see the very recent tidymodels Github organization for example).

If Stan and RStudio came together to issue a jointly approved glossary of argument and output column names, I think it could go a long way toward making modelling in R generally nicer. If you're open to the idea, let me know and I'll pitch it to the RStudio team working on tidymodels.

Apologies for pulling this conversation so far off track. I really appreciate what you're doing and enjoy keeping an eye on the Stan ecosystem!


@alexpghayes I think complicating this is probably okay at this stage :).

In fact, because this has gotten more complicated, I’ve created a Google spreadsheet summarizing what I have gathered from the current state of the conversation. I’ve attempted to provide more precise definitions for the terms at play (thanks @betanalpha, your comment helped a lot!), as well as what I was able to determine are the names used in a few packages.

It is publicly editable, and I have some asks from anyone who is willing:

  1. Please double-check for correctness. I may have missed some names from some packages (@jonah @tjmahr @alexpghayes), or used imprecise definitions for some concepts (@betanalpha).

  2. If you have other proposed names, or examples of names from other packages please add them.

  3. If you have preferences, please add a column with your name (like the one I added) with your preferences and/or comments.

Hopefully this will help us move towards more consistency, or at least help us identify (and be aware of) where we can be consistent and where we might diverge.


Thanks for the input on the spreadsheet @alexpghayes and @betanalpha.

I've added another column there that attempts to take both of your feedback into account and come up with a proposed set of names based on the following criteria:

  1. Ignore compatibility with broom names on the assumption an adapter function can be created.
  2. Use names that could nevertheless be compatible with frequentist approaches.
  3. Always precede with “.” to avoid collisions with variable names in models.
  4. No abbreviations (remembering if something is abbreviated or not can be a pain).
  5. No two-word names (multi-word names can always be standardized on and used in documentation, but I think data frame output should be succinct).
  6. Names should be nouns (I made an exception for lower/upper because they are common)

Combining those criteria with the feedback provided fairly straightforwardly led me to the proposal below (I’ve included definitions of some terms below where it might be ambiguous; precise definitions for all terms are in the spreadsheet):

  • .chain
  • .iteration - ordered, MCMC-specific, within-chain index
  • .draw - not necessarily ordered, unique index
  • .variable
  • .value - for pushforward realizations of parameters
  • .value - outputs of posterior_linpred, which Michael points out are also pushforward realizations of parameters
  • .prediction - outputs of posterior_predict
  • .lower
  • .upper
  • .width (amended from .level at @tjmahr's suggestion) - probabilities associated with intervals
  • .point - taking values like “mean”, “median”, or “mode”
  • .interval - taking values like “qi” (quantile interval) or “hdi” (highest-density interval)
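A minimal sketch of how one summary row might look under this proposal, assuming the point summary is stored in .value (the list above does not say which column holds it) and using a quantile interval computed by linear interpolation (the same rule as R's default quantile() type 7). `point_interval_row` and `quantile` are hypothetical Python helpers, not a port of any tidybayes function.

```python
import math


def quantile(xs, p):
    """Linear-interpolation quantile (same rule as R's default, type 7)."""
    s = sorted(xs)
    h = (len(s) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])


def point_interval_row(variable, xs, width=0.95):
    """Median point estimate plus a central quantile interval of the given width."""
    tail = (1.0 - width) / 2.0
    return {
        ".variable": variable,
        ".value": quantile(xs, 0.5),
        ".lower": quantile(xs, tail),
        ".upper": quantile(xs, 1.0 - tail),
        ".width": width,
        ".point": "median",
        ".interval": "qi",
    }


print(point_interval_row("mu", list(range(101)), width=0.9))
```

Keeping .width, .point, and .interval as ordinary columns means several summaries (say, 50% and 90% intervals) can be stacked in one data frame without ambiguity.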

While this departs more aggressively from other packages' names than my previous proposal did, I don't think that is very problematic. A translation function onto broom terminology would be one-to-one:

  • .variable -> term
  • .value -> estimate
  • .prediction -> .fitted
  • .lower -> conf.low
  • .upper -> conf.high

So a drop-in adapter for pipelines would work.
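Because the mapping is one-to-one, such an adapter reduces to a column rename. A hedged Python sketch follows (in R this would presumably be a rename step in the pipeline; `rename_columns`, `BROOM_NAMES`, and `FROM_BROOM` are hypothetical names). Columns outside the mapping, such as .width, pass through unchanged, and inverting the dictionary gives the reverse translation.

```python
BROOM_NAMES = {
    ".variable": "term",
    ".value": "estimate",
    ".prediction": ".fitted",
    ".lower": "conf.low",
    ".upper": "conf.high",
}
# One-to-one, so the reverse translation is just the inverted mapping.
FROM_BROOM = {v: k for k, v in BROOM_NAMES.items()}
assert len(FROM_BROOM) == len(BROOM_NAMES)


def rename_columns(rows, mapping):
    """Rename columns per mapping; columns not in the mapping pass through unchanged."""
    return [{mapping.get(col, col): val for col, val in row.items()} for row in rows]


tidy = [{".variable": "mu", ".value": 1.0, ".lower": 0.5, ".upper": 1.5, ".width": 0.95}]
broom_like = rename_columns(tidy, BROOM_NAMES)
print(broom_like)
```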

Thoughts?


This looks good!

I prefer .width over .level because level makes me think of alpha/significance level.


Nicely done.


Fair point; I amended the proposal.

I very much like your latest proposal @mjskay!