Hi, sorry for not getting to you earlier, your question is relevant and well written.
There are IMHO two misunderstandings here:
-
Stan doesn’t “draw” from distributions when sampling. A Stan program is (if we ignore the generated quantities
block for now) fundamentally a function that takes data and unconstrained parameters as input and returns the posterior log density (up to a constant) as a result and computes all the derivatives of the density with respect to the parameters. Nothing more, nothing less. The Stan sampler (or optimize/ADVI) than calls this function and do their thing with the results. But there are no “draws”, just computing a function.
-
When transforming from unconstrained to constrained space, Stan automatically adds the Jacobian of the transform which ensures that the model behaves as if you actually computed the density starting with already constrained parameters.
EDIT: Previous version contained an error in the interpretation.
So in some sense, the code computes the density of a joint distribution over unconstrained vectors of size K that has the peculiar property that after transforming to ordered vectors, it gives the same distribution as if we drawn i.i.d vectors from normal(0, 10)
and kept only those that are ordered. Alternatively, the same distribution could be in this case created by drawing i.i.d vectors from normal(0, 10)
and sorting them.
A quick example showing the distributions are the same in cmdstanr:
sc = "
data {
int<lower=1> K;
}
parameters {
ordered[K] x; // locations of mixture components
}
model {
x ~ normal(0, 10);
}
"
m <- cmdstan_model(write_stan_file(sc))
K = 5
res <- m$sample(data = list(K = K), refresh = 0)
res$cmdstan_summary()
# Mean MCSE StdDev 5% 50% 95% N_Eff N_Eff/s R_hat
# x[1] -1.2e+01 0.15 6.4 -22 -1.1e+01 -1.6 1765 4193 1.0
# x[2] -5.0e+00 0.097 5.5 -14 -5.0e+00 3.8 3253 7726 1.0
# x[3] -6.2e-02 0.085 5.3 -9.0 -1.6e-03 8.5 3936 9348 1.0
# x[4] 4.9e+00 0.085 5.6 -4.3 4.8e+00 14 4405 10464 1.00
# x[5] 1.1e+01 0.10 6.8 0.84 1.1e+01 23 4462 10598 1.00
```r
test <- matrix(rnorm(4000*K, sd = 10), nrow = 4000, ncol = K)
test_ord <- matrix(apply(test, MARGIN = 1, FUN = sort), nrow = 4000, ncol = K, byrow = TRUE)
for(k in 1:K) {
print(quantile(test_ord[,k], probs = c(0.05,0.5, 0.95)))
}
# 5% 50% 95%
# -23.359041 -11.378643 -1.339266
# 5% 50% 95%
# -14.423358 -4.898484 3.855805
# 5% 50% 95%
# -8.99006986 0.03509366 8.96703966
# 5% 50% 95%
# -4.145248 4.783120 14.412439
# 5% 50% 95%
# 1.386047 11.293186 23.222455
test2 <- matrix(rnorm(1e6*K, sd = 10), nrow = 1e6, ncol = K)
ordered_indices <- !apply(test2, MARGIN = 1, FUN = is.unsorted)
if(sum(ordered_indices) < 1000) {
message("Too few orderd left")
}
test_ord_reject <- test2[ordered_indices,]
for(k in 1:K) {
print(quantile(test_ord_reject[,k], probs = c(0.05,0.5, 0.95)))
}
# 5% 50% 95%
# -23.35957 -11.21584 -1.14037
# 5% 50% 95%
# -14.294742 -4.698611 4.024895
# 5% 50% 95%
# -8.8831785 0.1206282 8.8272902
# 5% 50% 95%
# -3.876497 4.842877 14.229799
# 5% 50% 95%
# 1.238684 11.276357 23.249097
Hope that clarifies more than confuses! :-)