"draw" vs. "sample" and warmup vs. sampling iterations

Andrew and Michael:

You two should decide on the terminology we’re going to
use for what Andrew’s been encouraging me to call a “draw”.

We have parameters that show up in each warmup iteration
and each sampling iteration.

Andrew corrects me every time I use “sample” to
describe a single draw. I know the literature does
this everywhere and that “sample” is, in practice, ambiguous
between a draw and a sequence of draws.

I don’t care what the answer is, but I want to be consistent
across our writing and our interfacing if we can compromise
on something agreeable to everyone.

  • Bob

P.S. Andrew — I cc-ed you this, but you probably want to turn
on Discourse email notification. No, I don’t recall how to do it.

+1 It would be great to come to a consensus on this. Currently I try to use Andrew’s convention in the doc for our R packages when I remember, but I can change that if necessary.

Hi, Bob. I do have Discourse email notification now!

Here’s my suggestion:
Each iteration of HMC has some number of steps. Thus we can talk about the number of steps in an HMC iteration, and also the number of iterations, and also the number of chains.

I used to prefer the term sequence but I guess now I’m ok with chains, since everyone uses the term.

Warm-up is the right term for the iterations we do while adapting. I don’t have a favored term for post-warm-up iterations. We could call them saved iterations unless someone has a pithier term. In the previous email it sounds like Mike wants Stan runs to be characterized by # warm-up iterations and # saved iterations, rather than # warm-up iterations and # total iterations which is how rstan currently does it. I’m happy to do it either way. But, thinking about it, I see the appeal of Mike’s recommendation, in that it’s convenient if I run 4 chains with 1000 saved iterations each, that I now have 4000 saved iterations. That’s more convenient than the current version where we have to remember to divide by 2.
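To make the bookkeeping concrete, here is a minimal sketch of the two conventions. The function names are hypothetical, and the "warmup defaults to half of the total" rule is an assumption taken from the "divide by 2" remark above, not a statement about any particular interface.

```python
def saved_iterations(chains, saved_per_chain):
    """Total saved iterations when the saved count is given directly per chain."""
    return chains * saved_per_chain


def saved_iterations_from_total(chains, total_per_chain, warmup_per_chain=None):
    """Total saved iterations when the user gives total iterations per chain.

    Assumption for illustration: if no warmup count is given, half of the
    total is treated as warmup.
    """
    if warmup_per_chain is None:
        warmup_per_chain = total_per_chain // 2
    return chains * (total_per_chain - warmup_per_chain)


# 4 chains with 1000 saved iterations each.
direct = saved_iterations(chains=4, saved_per_chain=1000)

# The same run described by total iterations: 2000 per chain, half of them warmup.
from_total = saved_iterations_from_total(chains=4, total_per_chain=2000)
```

Under the first convention the count is just chains times saved iterations; under the second, the warmup has to be subtracted before multiplying.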

I consider the set of saved iterations to be a sample. A single one of these saved iterations is a draw, but I’d be happy to just call it a saved iteration.

Does this help?
A

I also prefer number of warmup iterations and number
of post-warmup iterations as parameters.

I’ve been sticking to your terminology, but the current
argument names in CmdStan are

  • “num_warmup” for the number of warmup iterations, and

  • “num_samples” for the number of post warmup iterations

All of this is for a single chain.

We might be able to add an alternative to and then
deprecate “num_samples”.

  • Bob

rstan has iter and chains, but I’ve always preferred something like n_iter and n_chains. But is “num_” more of a standard in the CS world?
For cmdstan we could do:
num_iter_warmup
num_iter_postwarmup
num_chains
(If it were up to me, I think I’d prefer “n_” to “num_”, but I will defer to whatever is standard.)

I don’t think we need to indicate the type on the variable
name with an “n_” or “num_” prefix. They just suck up
space and aren’t used conventionally in R itself (at least
as far as I know).

To me, it looks like writing “sd_sigma” or “std_dev_sigma”.

  • Bob

Okay, technically there are a few different concepts being thrown around here.

Samples are any sequence of states from the target sample space, {q_0, q_1, …, q_N}, such that the empirical average \hat{f} = (1 / (N + 1)) \sum_{n = 0}^{N} f(q_n) converges to the true expectation, E_\pi[f]. We can have independent samples, where each element of the sequence is independent of the others, in which case we can think about the samples as an unordered collection. Alternatively, we can have correlated samples, where the sequence order matters. Note that any given element of the sequence need not have any nice properties; it’s just the sequence itself that matters.

A Markov chain uses a Markov transition to generate just such a sequence – N transitions of the chain yield a sequence of size N. If the Markov transition preserves a probability distribution then this sequence can be interpreted as correlated samples from that distribution.

If we are considering implementing the Markov chain algorithmically, then each transition defines one iteration of the algorithm.

So samples refer to a sequence of states, transitions refer to what generates that sequence, and iterations refer to the steps of the algorithm that applies multiple transitions to generate an entire Markov chain.
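As a rough illustration of how these three terms line up in code, here is a sketch using a random-walk Metropolis transition. This is a stand-in chosen for brevity, not HMC and not what Stan does; the function names, the step size, and the standard normal target are all assumptions made for the example.

```python
import math
import random


def metropolis_chain(log_density, q0, n_transitions, step=1.0, seed=0):
    """Apply n_transitions random-walk Metropolis transitions starting from q0.

    Each pass through the loop is one iteration of the algorithm and applies
    one Markov transition. The returned list includes q0, so it holds
    n_transitions + 1 states.
    """
    rng = random.Random(seed)
    q = q0
    states = [q]
    for _ in range(n_transitions):
        proposal = q + rng.gauss(0.0, step)
        log_ratio = log_density(proposal) - log_density(q)
        # Accept with probability min(1, exp(log_ratio)).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            q = proposal
        states.append(q)
    return states


def empirical_average(f, states):
    """Average of f over the whole sequence of states."""
    return sum(f(q) for q in states) / len(states)


def std_normal_log_density(q):
    """Log density of a standard normal, up to an additive constant."""
    return -0.5 * q * q


states = metropolis_chain(std_normal_log_density, q0=0.0, n_transitions=5000)
warmup, kept = states[:1000], states[1000:]

# Estimate E[q^2], which is 1 for a standard normal, from the kept states.
estimate = empirical_average(lambda q: q * q, kept)
```

The estimate is a property of the kept sequence as a whole; no single entry of `kept` is claimed to be an exact draw from the target.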

So all of the naming conventions being thrown around are “correct” in their own way. I would be most happy using “samples” or “transitions” as what we keep, with “warmup” referring to what we throw away to improve the convergence of the sequences. We can also try to come up with a useful adjective for the samples that we keep. “Active”? “Functional”? “Target”? “Equilibrium”?

The really big issue here, however, is what the user sets. In CmdStan the number of warmup transitions and the number of kept transitions are specified independently, as those two phases have very different properties. In RStan the total number of transitions is specified; my guess is that this was motivated by wanting to specify the overall run time.

After warmup the samples should be “warm samples” (and smooth but not fuzzy) ;-)

And then the real comment: “target samples” sounds good as they are samples from the target distribution.

Aki

  1. Please don’t use “sample” to refer to an individual iteration. That’s just confusing. We can say warm-up iterations and post-warmup iterations. The set of post-warmup iterations is the sample that we use for posterior inference.

  2. I too was thinking of “target sample” but I don’t like this because, if convergence is poor, the post-warmup iterations are not necessarily from anything close to the target distribution. For similar reasons I don’t like “equilibrium.”

Again, it’s tricky, because unless the samples are independent then it doesn’t really make sense to talk about which of them “come from the target distribution”. It’s the entire sequence that targets the distribution. The visual analogy of converging to and then exploring the typical set requires geometric ergodicity and even then is a metaphor for the abstract mathematics.

But maybe it’s easier to not get too pedantic and pick something that sounds reasonable.

@betanalpha, I think @andrewgelman wants you to replace “samples” with “draws.”

But “draw” is just an immediate synonym for “sample” that doesn’t add any clarification. All of the confusion about sample carries over to draw. Really the only benefit is that draw isn’t used as much so “it can be defined how we want it to be defined”, an argument that I absolutely hate in mathematics/statistics/whatever.

I actually said “close to the target distribution”!

Andrew is distinguishing between “draw” and “sample”; for him they aren’t
synonyms. He’s defining “sample” as the collection and “draw” as the item.

Yes. I think “sample” is the collection. I’m happy using “iteration” as the item. But the “iterations” become “draws” once they’re taken out of sequence and are used to summarize the posterior.
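One way to picture the iteration/draw distinction, as a sketch only: within a chain the iterations are ordered, but once the post-warmup iterations from all chains are pooled, a posterior summary no longer depends on that order. The function name `pool_draws` and the toy numbers are hypothetical.

```python
import random
import statistics


def pool_draws(chains, n_warmup):
    """Drop the first n_warmup iterations of each chain and pool the rest."""
    draws = []
    for chain in chains:
        draws.extend(chain[n_warmup:])
    return draws


# Two toy chains of four iterations each; the first iteration is warmup.
chains = [[0.0, 0.9, 1.1, 1.3], [5.0, 1.2, 0.8, 1.0]]
draws = pool_draws(chains, n_warmup=1)

# Shuffling the pooled draws leaves the posterior summary unchanged.
shuffled = draws[:]
random.Random(0).shuffle(shuffled)
same_summary = statistics.median(draws) == statistics.median(shuffled)
```

Order still matters for diagnostics that look at autocorrelation within a chain, which is where “iteration” remains the more natural word.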