“Random variable” is perhaps the most overloaded and unhelpful term in statistics and in my humble opinion should be avoided at all costs. The confusion in the documentation is largely due to implicit assumptions and overloading of meanings, but there’s not much that can be done without changing the block names and/or requiring more probability theory background from users.
At its most abstract level a Stan program defines a joint probability density function over the product of two spaces, which I will denote Y and \Theta. In general Y is itself a product space composed of many component spaces, each represented by a variable defined in the `data` or `transformed data` block but, confusingly, not necessarily all of the variables declared in those blocks. In other words we have
y = (y_1, \ldots, y_N)
where the variables on the right-hand side include some but not necessarily all of the variables defined in the `data` and `transformed data` blocks. On the other hand \Theta is also generally a product space, but one composed of component spaces represented by all of the variables defined in the `parameters` block,
\theta = (\theta_1, \ldots, \theta_I).
The algorithms library then takes this joint density function and partially evaluates it by binding y_1, \ldots, y_N to values specified by the external interface, yielding something proportional to a conditional density function,
\pi( \theta_1, \ldots, \theta_I \mid \tilde{y}_1, \ldots, \tilde{y}_N) \propto \pi(\tilde{y}_1, \ldots, \tilde{y}_N, \theta_1, \ldots, \theta_I ).
This unnormalized conditional density function is then used to inform estimates of conditional expectation values.
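To make the partial evaluation concrete, here is a minimal sketch in plain Python rather than Stan. The toy model (a unit normal density on a single \theta and unit-scale normal densities for each y_n), the values in `y_tilde`, and the grid-based expectation are all assumptions chosen for illustration; this is not how Stan's algorithms actually estimate expectations.

```python
import math
from functools import partial


def normal_lpdf(x, mu, sigma):
    """Log density of a normal distribution evaluated at x."""
    return (-0.5 * ((x - mu) / sigma) ** 2
            - math.log(sigma) - 0.5 * math.log(2 * math.pi))


def joint_lpdf(y, theta):
    """log pi(y_1, ..., y_N, theta) for an assumed toy model:
    theta ~ normal(0, 1) and y_n ~ normal(theta, 1)."""
    lp = normal_lpdf(theta, 0.0, 1.0)
    for y_n in y:
        lp += normal_lpdf(y_n, theta, 1.0)
    return lp


# Values that the "external interface" would supply (made up here).
y_tilde = [0.8, 1.1, 1.4]

# Partial evaluation: bind y and leave a function of theta alone.
# The result is an unnormalized conditional log density.
unnorm_cond_lpdf = partial(joint_lpdf, y_tilde)

# Crude grid estimate of the conditional expectation of theta.
# The unknown normalizing constant cancels in the ratio.
grid = [-5.0 + 0.001 * i for i in range(10001)]
weights = [math.exp(unnorm_cond_lpdf(t)) for t in grid]
cond_mean = sum(t * w for t, w in zip(grid, weights)) / sum(weights)

# For this conjugate toy model the exact answer is sum(y) / (N + 1).
exact_mean = sum(y_tilde) / (len(y_tilde) + 1)
print(cond_mean, exact_mean)
```

Nothing in `joint_lpdf` knows which arguments are "data" and which are "parameters"; that distinction only appears when `partial` binds the first argument.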
The problem with transformations is that if f : x\mapsto z then for any density function
\pi(z) \ne \pi( f(x) );
instead we need a Jacobian determinant correction. In the context of a Stan program transformations of any of the variables in Y or \Theta technically require a correction. If f : y_1 \mapsto z_1 then
\pi(z_1, \ldots, y_N, \theta_1, \ldots, \theta_I ) \ne \pi(f(y_1), \ldots, y_N, \theta_1, \ldots, \theta_I )
and if g : \theta_1 \mapsto \eta_1 then
\pi(y_1, \ldots, y_N, \eta_1, \ldots, \theta_I ) \ne \pi(y_1, \ldots, y_N, g(\theta_1), \ldots, \theta_I ).
All of this is pretty basic probability theory, but it does require understanding the subtleties of probability density functions and product spaces.
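A small numerical check can make the need for the correction less abstract. The sketch below assumes a one-dimensional example of my own choosing, a standard normal density on x with z = exp(x), and uses a simple midpoint rule for the integrals. Substituting the inverse map into the density of x without a correction gives a function of z that does not integrate to one, while multiplying by the Jacobian factor 1/z does.

```python
import math


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def naive_pdf(z):
    """Density of x evaluated at x = log(z), with no correction."""
    return normal_pdf(math.log(z))


def corrected_pdf(z):
    """Same substitution times the Jacobian |d log(z) / dz| = 1 / z."""
    return normal_pdf(math.log(z)) / z


def integrate(f, lo, hi, n=200_000):
    """Midpoint rule on [lo, hi] with n equal cells."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


# z = exp(x) is positive; the mass beyond 1000 is negligible here.
naive_total = integrate(naive_pdf, 0.0, 1000.0)
corrected_total = integrate(corrected_pdf, 0.0, 1000.0)

# Analytically the naive integral is exp(1/2), roughly 1.65, not 1.
print(naive_total, corrected_total)
```

Only `corrected_pdf` is a probability density function over z; `naive_pdf` is just a function that happens to reuse the same formula.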
These corrections, however, are not so symmetric once we partially evaluate on the y_n variables. In this case the Jacobian corrections become constants – ignoring them does not change the behavior of the resulting unnormalized conditional density function. In other words the reason that one has to correct for “parameters” but not “data variables” is not a generic property of Stan programs but rather an accident of how Stan programs are used.
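The asymmetry can also be checked numerically. The following sketch assumes a toy joint density (a unit normal on \theta and a single y distributed normally around \theta), a made-up observation `y_tilde`, and the transformations z = exp(y) and \eta = exp(\theta); all of these are illustrative choices rather than anything specific to Stan. Dropping the Jacobian for the bound variable leaves the normalized conditional density unchanged, whereas dropping it for the free variable changes the expectations.

```python
import math


def normal_lpdf(x, mu, sigma):
    """Log density of a normal distribution evaluated at x."""
    return (-0.5 * ((x - mu) / sigma) ** 2
            - math.log(sigma) - 0.5 * math.log(2 * math.pi))


def joint_lpdf(y, theta):
    """Assumed toy joint: theta ~ normal(0, 1), y ~ normal(theta, 1)."""
    return normal_lpdf(theta, 0.0, 1.0) + normal_lpdf(y, theta, 1.0)


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


y_tilde = 1.0  # made-up observed value
theta_grid = [-6.0 + 0.01 * i for i in range(1201)]

# (a) Transform the bound variable: z = exp(y), so y = log(z) and the
# Jacobian factor is 1 / z_tilde, which does not depend on theta.
z_tilde = math.exp(y_tilde)
with_jac = normalize([
    math.exp(joint_lpdf(math.log(z_tilde), t) - math.log(z_tilde))
    for t in theta_grid])
without_jac = normalize([
    math.exp(joint_lpdf(math.log(z_tilde), t)) for t in theta_grid])
max_data_diff = max(abs(a - b) for a, b in zip(with_jac, without_jac))

# Reference value: E[exp(theta)] computed on the theta grid.
reference = sum(math.exp(t) * w for t, w in zip(theta_grid, with_jac))

# (b) Transform the free variable: eta = exp(theta), so
# theta = log(eta) and the Jacobian factor 1 / eta varies with eta.
h = 0.001
eta_grid = [(i + 0.5) * h for i in range(200_000)]


def mean_eta(include_jacobian):
    weights = []
    for eta in eta_grid:
        lp = joint_lpdf(y_tilde, math.log(eta))
        if include_jacobian:
            lp -= math.log(eta)
        weights.append(math.exp(lp))
    total = sum(weights)
    return sum(e * w for e, w in zip(eta_grid, weights)) / total


mean_correct = mean_eta(True)   # agrees with the reference
mean_naive = mean_eta(False)    # noticeably different
print(max_data_diff, reference, mean_correct, mean_naive)
```

In case (a) the factor 1 / z_tilde cancels as soon as the weights are normalized, which is exactly the sense in which the correction "becomes a constant" after partial evaluation.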
The second source of confusion is in the names of the blocks. Generally the spaces Y and \Theta can be used to implement all kinds of useful behavior. For prior predictive analyses where we don’t want to condition on anything Y would be empty and \Theta would correspond to the product of the observational and model configuration spaces. For posterior analyses we could take Y to be the observational space and \Theta to be the model configuration space, in which case the automatic partial evaluation gives an unnormalized posterior density function. Of course the blocks take their names from this last particular application, which makes it very confusing when trying to build a Stan program with any other interpretation.
In other words the block naming convention obscures the full potential of the language in an attempt to prioritize the one particular application on which most users focus. More generic names could make the general probabilistic structure clearer, but would also require more probability theory understanding from users. There are definitely design constraints on both sides here, which is not to say that the current names are the best compromise.
For example, to @vasishth’s point, the term `parameter` is sometimes used to describe variables taking values in a component of any product space; i.e. the variables “parameterizing” the product space. This particular interpretation would indeed be applicable equally well to the “Y” and “\Theta” spaces, both generally and when those spaces implicitly represent an observational space and model configuration space. The Stan block names assume that the term `parameter` refers to only variables that parameterize the model configuration space, which to be fair is sloppy terminology at best.