Stan’s sampling algorithms assume support on all of \mathbb{R}^N. If you declare `a` as `real` and then apply `a ~ uniform(0, 1)`, there will be problems when steps go outside the support; in the worst case, it devolves into an inefficient form of rejection sampling.
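As a minimal sketch of the difference (the variable name is just for illustration):

```stan
parameters {
  // Declaring the bounds keeps every proposal inside the support of
  // uniform(0, 1); a bare `real a;` would let the sampler step outside it.
  real<lower=0, upper=1> a;
}
model {
  a ~ uniform(0, 1);  // with the bounds declared, this statement is redundant
}
```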
A constrained variable in Stan is transformed to something on \mathbb{R}^N, then when Stan runs, it’s inverse transformed back to the constrained space and the log Jacobian determinant correction is implicitly added to the log density. For instance, we log transform positive-constrained variables, which means when we map them back, we add \log \exp(x^{\textrm{unc}}) = x^{\textrm{unc}} to the log density, where x^{\textrm{unc}} is the unconstrained parameter. This ensures the distribution of x = \exp(x^{\textrm{unc}}) is uniform (improperly) on (0, \infty).
The same holds in general: the Jacobian correction ensures the distribution is uniform over the space of values satisfying the constraint.
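To make the positive-constrained case concrete, here’s a hand-written sketch (my variable names) of roughly what declaring `real<lower=0> x;` does implicitly:

```stan
parameters {
  real x_unc;  // unconstrained version, lives on all of R
}
transformed parameters {
  real<lower=0> x = exp(x_unc);  // inverse transform back to (0, infinity)
}
model {
  // log Jacobian of the inverse transform:
  // log |d/dx_unc exp(x_unc)| = log exp(x_unc) = x_unc
  target += x_unc;
  // ... any priors or likelihood terms involving x go here as usual ...
}
```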
I’m not sure what you mean by “accumulating log likelihood”. When there are constraints, we add the Jacobian correction to the target density, but it’s not a log-likelihood per se (the term “likelihood” refers to the data sampling distribution viewed as a function of the parameters, i.e., \mathcal{L}(\theta) = p(y \mid \theta) for fixed data y).
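Roughly, in my notation, for a typical model with data y, a prior p(\theta), and parameters \theta = f(\theta^{\textrm{unc}}) obtained from a constraining transform f with Jacobian J_f, the density Stan actually works with on the unconstrained scale decomposes as

\log p_{\textrm{target}}(\theta^{\textrm{unc}}) = \log p(y \mid \theta) + \log p(\theta) + \log \left| \det J_f(\theta^{\textrm{unc}}) \right|,

where only the first term is the log likelihood; the Jacobian correction is part of the change of variables, not part of the likelihood.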
The discontinuous derivatives here shouldn’t be a big deal, because (a) the value won’t differ by much on either side of the discontinuity, since the derivative is either 0 or a small constant, and (b) locality in constrained space is similar to locality in unconstrained space, by which I just mean that if points are near each other in constrained space, they’ll be near each other in unconstrained space.
There’s no continuity issue from declaring the limits of y to depend on x; the mapping uses a scaled logit and inverse logit, both of which are smooth.
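As a sketch (my names), a bound that depends on another parameter is declared directly, and under the hood it composes smooth functions:

```stan
parameters {
  real<lower=0> x;
  // Internally y is represented by an unconstrained y_unc and mapped back as
  // y = 0 + (x - 0) * inv_logit(y_unc), which is smooth in both y_unc and x.
  real<lower=0, upper=x> y;
}
```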