If one removes the “prior” on `c`, the model becomes

```
parameters {
real a;
real b;
}
transformed parameters {
real c = a + b;
}
model {
a ~ normal(a_mu, a_sigma);
b ~ normal(b_mu, b_sigma);
}
```

In this case `c` is just a function that maps a point `(a, b)` to the point `a + b`.

Now `a` and `b` both live in spaces equipped with probability distributions, specified by the probability density functions `normal(a_mu, a_sigma)` and `normal(b_mu, b_sigma)` respectively. Consequently we can *push forward* the distributions over the points `a` and `b` through the function `c` to define a probability distribution on the output values. This distribution is completely determined by the distributions on `a` and `b` and the function `c`, and so nothing else needs to be specified. If this is a bit too abstract: Stan will sample values of the parameters,

$$\tilde{a}, \tilde{b} \sim \pi(a, b) = \pi(a) \cdot \pi(b),$$

and then generate samples from the output space by evaluating `c` at those samples,

$$\tilde{c} = c(\tilde{a}, \tilde{b}).$$
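The pushforward is easy to sketch outside of Stan with plain Monte Carlo. A minimal Python sketch, where the hyperparameter values are arbitrary placeholders chosen just for illustration:

```python
import math
import random
import statistics

random.seed(0)

# Placeholder hyperparameters (assumed values, not from the original model).
a_mu, a_sigma = 1.0, 2.0
b_mu, b_sigma = -0.5, 1.5

# Exact draws from the joint pi(a, b) = pi(a) * pi(b).
draws = 100_000
a = [random.gauss(a_mu, a_sigma) for _ in range(draws)]
b = [random.gauss(b_mu, b_sigma) for _ in range(draws)]

# Push each sample forward through the function c = a + b.
c = [ai + bi for ai, bi in zip(a, b)]

# The pushforward of two independent normals is again normal, with
# mean a_mu + b_mu = 0.5 and sd sqrt(a_sigma**2 + b_sigma**2) = 2.5.
print(statistics.mean(c))   # close to 0.5
print(statistics.stdev(c))  # close to 2.5
```

Because the pushforward of independent normals has a known closed form here, the empirical moments of the `c` samples can be checked directly against it.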

Importantly, Stan does not generate the samples $\tilde{a}, \tilde{b}$ using exact random number generators. Instead the lines

```
a ~ normal(a_mu, a_sigma);
b ~ normal(b_mu, b_sigma);
```

define the joint probability density function

$$\pi(a, b) = \mathcal{N}(a \mid a_{\mu}, a_{\sigma}) \cdot \mathcal{N}(b \mid b_{\mu}, b_{\sigma}),$$

from which correlated samples are generated with a Markov chain. This is important because it clarifies what happens when you try to add a prior density over `c`.

When you add the line

```
c ~ normal(c_mu, c_sigma);
```

you’re not sampling values for `a, b, c` independently and then somehow throwing out values that are inconsistent with `c = a + b`. Instead the line modifies the joint distribution $\pi(a, b)$ in a weird way, trying to compromise between the marginal normal behaviors for `a` and `b` and the summed behavior through `c = a + b`. You will only see the consequences of this change when generating samples through Stan, not when trying to generate them in R with `rnorm`.
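To see the modification concretely: the extra sampling statement multiplies the joint density by an additional factor $\mathcal{N}(a + b \mid c_{\mu}, c_{\sigma})$. We can mimic the resulting distribution by importance-reweighting exact samples from the original joint; in this Python sketch all hyperparameter values are made-up placeholders, with the prior on `c` deliberately chosen to conflict with the distribution of `a + b`:

```python
import math
import random

random.seed(1)

# Placeholder hyperparameters (assumed values for illustration).
a_mu, a_sigma = 1.0, 2.0
b_mu, b_sigma = -0.5, 1.5
c_mu, c_sigma = 5.0, 0.5   # a prior on c that conflicts with a + b

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

draws = 200_000
a = [random.gauss(a_mu, a_sigma) for _ in range(draws)]
b = [random.gauss(b_mu, b_sigma) for _ in range(draws)]

# The line `c ~ normal(c_mu, c_sigma)` multiplies the joint density by
# normal_pdf(a + b, c_mu, c_sigma); mimic that with importance weights.
w = [normal_pdf(ai + bi, c_mu, c_sigma) for ai, bi in zip(a, b)]
total = sum(w)

# Marginal means under the modified joint: both are dragged away from
# a_mu = 1.0 and b_mu = -0.5 to compromise with the prior on c = a + b.
a_post = sum(wi * ai for wi, ai in zip(w, a)) / total
b_post = sum(wi * bi for wi, bi in zip(w, b)) / total
print(a_post, b_post)
```

Neither marginal stays where its own sampling statement put it: the single line on `c` reshapes the behavior of both `a` and `b` at once, which is exactly the counter-intuitive compromise described above.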

To summarize, a Stan program just defines a joint density over the parameters. Adding sampling statements for transformed parameters just changes this joint density, and often in a weird and counter-intuitive way.

That said, examining the *consequences* of a prior model is a critical part of any robust Bayesian workflow. In this case you could fit the model without the line `c ~ normal(c_mu, c_sigma)` and plot the distribution for `c`. If this distribution is inconsistent with domain expertise then the prior model for `a` and `b` is insufficient, and you can play around with changes to that prior model to ensure better behavior for the distribution of `c`. In this case that means playing around with `a_mu, a_sigma, b_mu, b_sigma` until you get something compatible with your domain expertise for `c`.
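A crude version of this prior pushforward check can even be done without Stan: draw from the prior, push forward through `c = a + b`, and compare the induced quantiles against domain expertise. A Python sketch, where the hyperparameters and the "domain expertise" interval are made-up placeholders:

```python
import random

random.seed(2)

# Candidate prior hyperparameters (placeholder values to be tuned).
a_mu, a_sigma = 1.0, 2.0
b_mu, b_sigma = -0.5, 1.5

# Draw from the prior and push forward through c = a + b.
draws = 100_000
c = sorted(random.gauss(a_mu, a_sigma) + random.gauss(b_mu, b_sigma)
           for _ in range(draws))

# Empirical 5% and 95% quantiles of the induced distribution of c.
q05 = c[int(0.05 * draws)]
q95 = c[int(0.95 * draws)]
print(q05, q95)

# Suppose domain expertise says c should almost always lie in (-10, 10)
# (a made-up interval): check the central interval against it, and if it
# clashes, go back and adjust a_mu, a_sigma, b_mu, b_sigma.
print(-10 < q05 and q95 < 10)
```

If the check fails, the fix happens upstream, in the priors on `a` and `b`, rather than by bolting a sampling statement onto the transformed parameter.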