If one removes the “prior” on `c` the model becomes
```stan
data {
  // prior hyperparameters, taken here as data
  real a_mu;
  real<lower=0> a_sigma;
  real b_mu;
  real<lower=0> b_sigma;
}
parameters {
  real a;
  real b;
}
transformed parameters {
  real c = a + b;
}
model {
  a ~ normal(a_mu, a_sigma);
  b ~ normal(b_mu, b_sigma);
}
```
In this case `c` is just a function that maps a point `(a, b)` to the point `a + b`.
Now `a` and `b` both live in spaces equipped with probability distributions, specified by the probability density functions `normal(a_mu, a_sigma)` and `normal(b_mu, b_sigma)` respectively. Consequently we can push forward the distributions over the points `a` and `b` through the function `c` to define a probability distribution on the output values. This distribution is completely determined by the distributions on `a` and `b` and the function `c`, and so nothing else needs to be specified. If this is a bit too abstract, think of it in terms of sampling: Stan will sample values of the parameters,

$$
\tilde{a}, \tilde{b} \sim \pi(a, b) = \pi(a) \cdot \pi(b),
$$

and then generate samples from the output space by evaluating `c` at those samples,

$$
\tilde{c} = c(\tilde{a}, \tilde{b}).
$$
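For intuition, here is a minimal R sketch of this exact-sampling picture; the hyperparameter values are arbitrary placeholders, not anything from the original model:

```r
# Arbitrary placeholder hyperparameters (substitute your own)
a_mu <- 0; a_sigma <- 1
b_mu <- 0; b_sigma <- 2

# Exact samples from pi(a, b) = pi(a) * pi(b)
a_tilde <- rnorm(1e4, a_mu, a_sigma)
b_tilde <- rnorm(1e4, b_mu, b_sigma)

# Push each sample forward through the function c
c_tilde <- a_tilde + b_tilde

# The histogram approximates the pushforward distribution for c
hist(c_tilde, breaks = 50)
```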
Importantly, Stan does not generate the samples $\tilde{a}, \tilde{b}$ by using exact random number generators. Instead the lines

```stan
a ~ normal(a_mu, a_sigma);
b ~ normal(b_mu, b_sigma);
```

define the joint probability density function

$$
\pi(a, b) = \mathcal{N}(a \mid a_{\mu}, a_{\sigma}) \cdot \mathcal{N}(b \mid b_{\mu}, b_{\sigma})
$$

from which correlated samples are generated with a Markov chain. This is important because it clarifies what happens when you try to add a prior density over `c`.
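As an aside, the `~` notation is just a convenience here: up to an additive constant, the model block above is equivalent to incrementing the log density directly,

```stan
model {
  // Each statement adds a term to the log of the joint density;
  // equivalent, up to an additive constant, to the ~ statements above
  target += normal_lpdf(a | a_mu, a_sigma);
  target += normal_lpdf(b | b_mu, b_sigma);
}
```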
When you add the line `c ~ normal(c_mu, c_sigma)` you’re not sampling values for `a`, `b`, and `c` independently and then somehow throwing out values that are inconsistent with `c = a + b`. Instead the line modifies the joint distribution $\pi(a, b)$ in a weird way, trying to compromise between the marginal normal behaviors for `a` and `b` and the summed behavior through `c = a + b`. You will only see the consequences of this change when generating samples through Stan, not when trying to generate them in R with `rnorm`.
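To see exactly what that modification is: because `c = a + b` is a deterministic function of the parameters, the added line simply multiplies the joint density by one more normal density evaluated at `a + b`,

$$
\pi(a, b) \propto \mathcal{N}(a \mid a_{\mu}, a_{\sigma}) \cdot \mathcal{N}(b \mid b_{\mu}, b_{\sigma}) \cdot \mathcal{N}(a + b \mid c_{\mu}, c_{\sigma}),
$$

which no longer factorizes into independent marginals; `a` and `b` are now coupled through the extra term.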
To summarize, a Stan program just defines a joint density over the parameters. Adding sampling statements for transformed parameters modifies this joint density, often in a weird and counterintuitive way.
That said, examining the consequences of a prior model is a critical part of any robust Bayesian workflow. In this case you could fit the model without the line `c ~ normal(c_mu, c_sigma)` and plot the distribution for `c`. If this distribution is inconsistent with domain expertise then the prior model for `a` and `b` is insufficient, and you can play around with changes to that prior model that ensure better behavior for the distribution of `c`. In this case that means playing around with `a_mu`, `a_sigma`, `b_mu`, and `b_sigma` until you get something compatible with your domain expertise for `c`.
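In this simple example the pushforward distribution even has a closed form, since the sum of two independent normal variates is again normal,

$$
\pi(c) = \mathcal{N}\!\left(c \mid a_{\mu} + b_{\mu}, \sqrt{a_{\sigma}^{2} + b_{\sigma}^{2}}\right),
$$

so the four hyperparameters can be tuned against your domain expertise for `c` directly.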