@jonah, @tjmahr, @paul.buerkner, and I just had an excellent Skype conversation about unifying various aspects of Bayesian interfaces in the R ecosystem (particularly making sure the rstan ecosystem and tidybayes work well together). The easiest first step towards this is to unify naming conventions, and we agreed it would help to get more input on those.
While there are a number of things to name (functions, function arguments, output variable names / columns, etc.), we decided to focus in the short term on output column names because they are the hardest thing to change in the future (functions and function arguments are easier to deprecate with aliases, whereas changing output formats breaks user code). Since I am going to submit tidybayes to CRAN soon, it would be best to make breaking name changes before that happens, so that they affect the fewest users.
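To make that asymmetry concrete, here is a minimal base R sketch of deprecating a function through an alias. The names `tidy_output` and `old_tidy_output` are hypothetical and do not belong to tidybayes or any other package; only `.Deprecated()` is a real base R helper.

```r
# Hypothetical new function name (illustration only, not a real API).
tidy_output <- function(x) {
  data.frame(value = x)
}

# Hypothetical old name kept as a thin alias: it warns, then forwards
# all arguments to the new function, so existing user code keeps working.
old_tidy_output <- function(...) {
  .Deprecated("tidy_output")
  tidy_output(...)
}
```

Calling `old_tidy_output(1:3)` still returns the same data frame, just with a deprecation warning. There is no comparable shim for a renamed output column, which is why column names are worth settling first.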
To open up the conversation, here are some possible synonyms for different columns that tend to show up in function output (I’m ignoring meta-data conventions like capitalization or dot-names for the purposes of this conversation):
- `chain`
- `iteration`, `draw`, `sample`
- `parameter`, `term`, `variable`
- `value`, `estimate` (as the value of a `parameter`, `term`, or `variable`)
- `value`, `estimate`, `prediction`, `pred` (as the name for a column in a data frame output by `tidybayes::add_fitted_samples`, which derives from `posterior_linpred`)
- `value`, `estimate`, `prediction`, `pred` (as the name for a column in a data frame output by `tidybayes::add_predicted_samples`, which derives from `posterior_predict`)
I think `chain` / `iteration` is the dominant terminology in most packages for those terms.
`parameter` / `value` is also, I think, the most common, while `term` / `estimate` is used analogously in broom, but that is perhaps not terminology we want to emulate in the Bayesian world. Personally I think `value` is a little vague (everything is a value), but `parameter` is probably better than `term`, so maybe `parameter` / `estimate`?
For `posterior_linpred` / `tidybayes::add_fitted_samples` type output, I lean toward `estimate`, `prediction`, or `pred` (again because `value` is vague). Jonah also suggested on the call that we could come up with a new term for this.
For `posterior_predict` type output, `prediction` or `pred` seems the most obvious…
For reference, the current terminology in tidybayes is (in order): `chain`, `iteration`, `term`, `estimate`, `estimate`, `pred`. I admit that these choices (particularly `term`, which was motivated by compatibility with broom) are not all ideal, and some should likely be changed.
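As a rough illustration of what is at stake, here is a toy long-format data frame built in base R with the current parameter-level column names, followed by one of the renames under discussion. The parameter name `b_Intercept` and all numbers are made up for the example; this is not actual tidybayes output.

```r
# Toy long-format draws using the current column names
# (chain, iteration, term, estimate). Values are invented.
draws <- data.frame(
  chain     = c(1, 1, 2, 2),
  iteration = c(1, 2, 1, 2),
  term      = "b_Intercept",
  estimate  = c(0.12, 0.08, 0.15, 0.11)
)

# One candidate change: rename `term` to `parameter`, keep `estimate`.
names(draws)[names(draws) == "term"] <- "parameter"

# Inspect the resulting column names.
names(draws)
```

Any user code that refers to the `term` column by name would need to be updated after a change like this, which is the kind of breakage it would be good to get out of the way before the CRAN release.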
Thoughts?