Finite mixture ordered vectors

I’ve a fundamental question regarding the finite mixture chapter of the Stan user guide:

In this section, we have an ordered ‘x’ vector as follows to allow identification of the group parameters:

parameters {
...
  ordered[K] x;             // locations of mixture components
 ...
}
model {
...
  x ~ normal(0, 10);
...
}

However, in another section of the Stan manual, it states that unconstrained vectors y are transformed into ordered sequences x by defining each element y_k as the log of the difference between successive elements of x.

My questions are:

  1. Wouldn’t doing the above transform imply that x is no longer drawn from the normal distribution that we specified? Rather, it is the log differences that are drawn from the normal distribution? Or is there something that I’m missing here?
  2. Would Stan be able to discard draws that are not ordered and only retain draws that are ordered, thereby circumventing the above transformations?

Thanks in advance!

1 Like

Hi, sorry for not getting to you earlier, your question is relevant and well written.

There are IMHO two misunderstandings here:

  1. Stan doesn’t “draw” from distributions when sampling. A Stan program is (if we ignore the generated quantities block for now) fundamentally a function that takes data and unconstrained parameters as input and returns the posterior log density (up to a constant) as a result and computes all the derivatives of the density with respect to the parameters. Nothing more, nothing less. The Stan sampler (or optimize/ADVI) than calls this function and do their thing with the results. But there are no “draws”, just computing a function.

  2. When transforming from unconstrained to constrained space, Stan automatically adds the Jacobian of the transform which ensures that the model behaves as if you actually computed the density starting with already constrained parameters.

EDIT: Previous version contained an error in the interpretation.

So in some sense, the code computes the density of a joint distribution over unconstrained vectors of size K that has the peculiar property that after transforming to ordered vectors, it gives the same distribution as if we drawn i.i.d vectors from normal(0, 10) and kept only those that are ordered. Alternatively, the same distribution could be in this case created by drawing i.i.d vectors from normal(0, 10) and sorting them.

A quick example showing the distributions are the same in cmdstanr:

sc = "
data {
  int<lower=1> K;
}

parameters {
  ordered[K] x;             // locations of mixture components
}
model {
  x ~ normal(0, 10);
}
"

m <- cmdstan_model(write_stan_file(sc))

K = 5
res <- m$sample(data = list(K = K), refresh = 0)

res$cmdstan_summary()

#                     Mean     MCSE  StdDev    5%       50%   95%    N_Eff  N_Eff/s    R_hat
# x[1]            -1.2e+01     0.15     6.4   -22  -1.1e+01  -1.6     1765     4193      1.0
# x[2]            -5.0e+00    0.097     5.5   -14  -5.0e+00   3.8     3253     7726      1.0
# x[3]            -6.2e-02    0.085     5.3  -9.0  -1.6e-03   8.5     3936     9348      1.0
# x[4]             4.9e+00    0.085     5.6  -4.3   4.8e+00    14     4405    10464     1.00
# x[5]             1.1e+01     0.10     6.8  0.84   1.1e+01    23     4462    10598     1.00

```r
test <- matrix(rnorm(4000*K, sd = 10), nrow = 4000, ncol = K)
test_ord <- matrix(apply(test, MARGIN = 1, FUN = sort), nrow = 4000, ncol = K, byrow = TRUE)
for(k in 1:K) {
  print(quantile(test_ord[,k], probs = c(0.05,0.5, 0.95)))
}
#         5%        50%        95% 
# -23.359041 -11.378643  -1.339266 
#         5%        50%        95% 
# -14.423358  -4.898484   3.855805 
#          5%         50%         95% 
# -8.99006986  0.03509366  8.96703966 
#        5%       50%       95% 
# -4.145248  4.783120 14.412439 
#        5%       50%       95% 
#  1.386047 11.293186 23.222455 
test2 <- matrix(rnorm(1e6*K, sd = 10), nrow = 1e6, ncol = K)

ordered_indices <- !apply(test2, MARGIN = 1, FUN = is.unsorted)
if(sum(ordered_indices) < 1000) {
  message("Too few orderd left")
}

test_ord_reject <- test2[ordered_indices,]
for(k in 1:K) {
  print(quantile(test_ord_reject[,k], probs = c(0.05,0.5, 0.95)))
}
#        5%       50%       95% 
# -23.35957 -11.21584  -1.14037 
#         5%        50%        95% 
# -14.294742  -4.698611   4.024895 
#         5%        50%        95% 
# -8.8831785  0.1206282  8.8272902 
#        5%       50%       95% 
# -3.876497  4.842877 14.229799 
#        5%       50%       95% 
#  1.238684 11.276357 23.249097 

Hope that clarifies more than confuses! :-)