Yet, putting integers (such as 0 and 1) on the left-hand side of a sampling statement is not shown earlier in the manual. What does this mean? The description says:

“The Bernoulli statements are just shorthand for adding log θ and log(1 - θ) to the log density.” I’m not sure how this follows from the syntax: it seems to me to be saying “0 is distributed Bernoulli with a probability of theta.”
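The arithmetic behind that sentence can be checked directly. Below is a minimal plain-Python sketch (not Stan's implementation; the function name simply mirrors Stan's bernoulli_lpmf) showing that evaluating the Bernoulli log mass at 1 gives log θ and at 0 gives log(1 - θ). The restriction to 0 < θ < 1 is an assumption of this sketch, made to avoid log(0).

```python
import math

def bernoulli_lpmf(y, theta):
    """Log probability of y (0 or 1) under Bernoulli(theta).

    A plain-Python sketch of the math behind Stan's bernoulli_lpmf(y | theta),
    not Stan's actual implementation.
    """
    if y not in (0, 1):
        raise ValueError("y must be 0 or 1")
    if not 0.0 < theta < 1.0:
        raise ValueError("this sketch requires 0 < theta < 1")
    # Equivalent to y * log(theta) + (1 - y) * log(1 - theta)
    return math.log(theta) if y == 1 else math.log(1.0 - theta)

theta = 0.3
# "1 ~ bernoulli(theta)" contributes log(theta) ...
print(bernoulli_lpmf(1, theta) == math.log(theta))        # True
# ... and "0 ~ bernoulli(theta)" contributes log(1 - theta)
print(bernoulli_lpmf(0, theta) == math.log(1.0 - theta))  # True
```

So the literal on the left of ~ only selects which of the two terms gets added.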

You’re falling into a conceptual trap that a lot of new Stan users encounter, myself included! The tilde is not really meant to express “is distributed as”; instead, it is shorthand for “increment the log probability by the log density of the distribution on the right-hand side, evaluated at the value on the left-hand side”. There is an alternative syntax, target += with an explicit _lpmf or _lpdf call, that arguably expresses the latter more clearly.

I slightly disagree. It is common to write generative models as

y ~ bernoulli(theta)

which is shorthand notation for

p(y|theta) = bernoulli(y|theta)

With this latter notation it’s clearer what happens when we condition on the observed value y = 1:

p(y=1|theta) = bernoulli(y=1|theta).
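Once y is fixed at its observed value, bernoulli(y=1|theta) is no longer a distribution over y but a function of theta alone (the likelihood). A minimal Python sketch, assuming a flat prior over a hypothetical grid of theta values chosen only for illustration, shows that the resulting posterior is just the normalized likelihood, which here is proportional to theta:

```python
def bernoulli_pmf(y, theta):
    # Probability of y (0 or 1) under Bernoulli(theta)
    return theta if y == 1 else 1.0 - theta

# Hypothetical grid of candidate theta values: 0.1, 0.2, ..., 0.9
grid = [i / 10 for i in range(1, 10)]

# With y observed to be 1, bernoulli(y=1 | theta) is a function of theta only
likelihood = [bernoulli_pmf(1, t) for t in grid]

# Under a flat prior on the grid, the posterior is the normalized likelihood
total = sum(likelihood)
posterior = [lik / total for lik in likelihood]

for t, p in zip(grid, posterior):
    print(f"theta={t:.1f}  posterior={p:.3f}")
```

Stan does not use a grid; the grid is only a way to see the shape of the conditioned density.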

This can be read as the probability of Bernoulli-distributed y given theta
when y is observed to have value 1. Usually in Stan, observed values
are given in the data block, but Stan also allows giving the observed
value on the left-hand side of ~:

1 ~ bernoulli(theta),

which could be read as the probability of a Bernoulli-distributed anonymous
variable given theta when this anonymous variable is observed to have
value 1. Although Stan allows this format, it would probably be
clearer to write all of these directly as

target += bernoulli_lpmf(1|theta);

which also happens to move away from the more abstract presentation by revealing implementation details of inference in Stan.
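The “implementation detail” is an accumulator: every ~ or target += statement adds a term to a running log density. A minimal Python sketch of that idea follows; the TargetAccumulator class is hypothetical, and its value() method only loosely corresponds to what, as far as I know, Stan's target() function reports (the current accumulated total).

```python
import math

class TargetAccumulator:
    """Hypothetical stand-in for Stan's log-density accumulator."""

    def __init__(self):
        self.total = 0.0

    def increment(self, term):
        # Plays the role of `target += term;`
        self.total += term

    def value(self):
        # Running total, loosely analogous to Stan's target()
        return self.total

def bernoulli_lpmf(y, theta):
    return math.log(theta) if y == 1 else math.log(1.0 - theta)

def log_density(theta):
    target = TargetAccumulator()
    # `1 ~ bernoulli(theta);` and `target += bernoulli_lpmf(1 | theta);`
    # both add log(theta); the Bernoulli log mass has no constant to drop
    target.increment(bernoulli_lpmf(1, theta))
    # likewise for the statement with 0 on the left-hand side
    target.increment(bernoulli_lpmf(0, theta))
    return target.value()

print(log_density(0.3))  # log(0.3) + log(0.7)
```

The sampler only ever sees the final total as a function of the parameters, which is why the two spellings describe the same model.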

Is there a tutorial that will help me understand this target += *_lpmf() notation better? The ~ notation has always made much more sense to me, but I feel like I’m missing out on some of the joys of Stan by not writing my models by incrementing the log probability. Something conceptual, with a walk-through of how it works, would be much appreciated. Section 5.2 of the manual gets me part of the way there, but not all the way. I’m familiar with how += works (the same way as in Python, and I lament that base R doesn’t have an equivalent), but I hate using things without fully understanding the concepts behind them.

I’m also unsure about the target() function: is there a place in the reference manual that explains what it does? I learned Stan by taking a Bayes class that used BUGS and translating everything to Stan, so I’m used to doing things more the way BUGS would.

“which could be read as probability of Bernoulli distributed anonymous variable given theta when this anonymous variable is observed to have value 1” (emphasis mine)

Here’s what stumps me about that: we have already, in the previous line, said to run this only if y[n] == 0, so it seems to me that the probability is always going to be zero, since the only observations are those with the value 0?

Is it? Not if probability is really about knowledge. Instead, I think we’re using this notion of probability counterfactually, saying something like given the information we have, here’s what we know about distributions. Then when we observe something, that value’s fixed and known, but we counterfactually continue to reason as if it might have been different. And this isn’t even the Bayesian part!
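To make the y[n] == 0 question concrete, here is a minimal Python sketch. It assumes, as the question describes, that the branch for y[n] == 0 executes 1 ~ bernoulli(theta), and it assumes the other branch executes 0 ~ bernoulli(theta); any further terms for the nonzero counts are deliberately omitted, and the data are made up. The point is that the literal on the left of ~ is the value of an implicit indicator (“is this count zero?”), not the value of y[n] itself, so a zero count contributes log(theta), not log(0).

```python
import math

def bernoulli_lpmf(y, theta):
    return math.log(theta) if y == 1 else math.log(1.0 - theta)

def indicator_log_density(counts, theta):
    """Sum of the Bernoulli terms only, for the assumed branch structure:
    y[n] == 0 runs `1 ~ bernoulli(theta)`, otherwise `0 ~ bernoulli(theta)`.
    Terms for the nonzero counts themselves are omitted in this sketch."""
    total = 0.0
    for y_n in counts:
        # Implicit indicator: 1 when the count is zero, 0 otherwise
        z_n = 1 if y_n == 0 else 0
        total += bernoulli_lpmf(z_n, theta)
    return total

counts = [0, 3, 0, 1, 0]  # made-up data for illustration
theta = 0.4
print(indicator_log_density(counts, theta))  # 3*log(0.4) + 2*log(0.6)
```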

y ~ foo(theta);

is equivalent (roughly speaking) to:

target += foo_lpdf(y | theta);

All the sampling statement does is increment the log density. The slight subtlety is that the ~ form also drops constants that only depend on data.

This is described in the manual, in the JSS paper, etc.
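To illustrate why dropping constants is harmless, here is a minimal Python sketch that uses a normal density as a stand-in for foo (an assumption for illustration; the Bernoulli log mass has no such constant term). With sigma held fixed at 1, log(sigma) is 0, so the only dropped term is -0.5 * log(2 * pi) per observation. The data are made up.

```python
import math

def normal_lpdf(y, mu, sigma):
    # Full normal log density, including the constant -0.5 * log(2 * pi)
    z = (y - mu) / sigma
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * z * z

def normal_lpdf_no_const(y, mu, sigma):
    # Same log density with the constant term dropped
    z = (y - mu) / sigma
    return -math.log(sigma) - 0.5 * z * z

y = [1.2, -0.4, 0.7]  # made-up observations
sigma = 1.0            # held fixed here for simplicity

def gap(mu):
    full = sum(normal_lpdf(v, mu, sigma) for v in y)
    dropped = sum(normal_lpdf_no_const(v, mu, sigma) for v in y)
    return full - dropped

for mu in (-1.0, 0.0, 2.5):
    # Same gap for every mu: -len(y) * 0.5 * log(2 * pi), about -2.757
    print(mu, gap(mu))
```

Because the gap does not depend on mu, dropping the term shifts the whole log density by a constant and leaves the posterior over mu unchanged.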

If I get tails from a coin, the probability of heads is still nonzero.

Is it?

Specifically, Pr[coin could’ve fallen heads | coin fell tails], so yeah,
counterfactual. Although we’re often thinking of ongoing data collection
in a time-constant system, so that implicitly turns into Pr[will fall heads | fell tails].
So… yes it is, conditional probability all the way down.