Ordered simplex constraint transform

There are a bunch of posts on ordering a simplex; see, e.g., the links below.

There’s some more advice in Ragged array of simplexes - #2 by bgoodri.

This post is about composing transforms to get an ordered simplex. I've only tested it on some toy problems. I'm curious whether anyone finds that this works better than the previous suggestions, or whether it's just the same thing as before dressed up in new clothes (adding @betanalpha, who may have insight here). People who may be interested: @WardBrian @stevebronder @Bob_Carpenter

The idea is to create a negative ordered vector and pass it through the simplex stick-breaking transform. Instead of assigning the leftover stick length to the last element, it is distributed across all the elements in proportion to their size.
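For reference, Stan's built-in stick-breaking transform for a simplex of size K (see the reference manual) is

z_k = \operatorname{logit}^{-1}\bigl(y_k - \log(K - k)\bigr), \qquad x_k = z_k \Bigl(1 - \sum_{k' < k} x_{k'}\Bigr), \qquad k = 1, \dots, K - 1,

with the leftover stick length going to x_K. The code below applies an analogous recursion to the negated, reversed y and then redistributes the leftover stick length instead.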

functions {
    vector ordered_simplex_constrain_lp(vector y) {
    int Km1 = rows(y);
    vector[Km1] x;
    real stick_len = 1.0;
    
    // use only negative y's 
    // start from the smallest -y, which is the last value in the vector 
    for (k in 1:Km1) {
      real adj_y_k = -y[Km1 - k + 1] - log(Km1 - k + 1);
      real z_k = inv_logit(adj_y_k);
      x[k] = stick_len * z_k;
      target += log(stick_len) - log1p_exp(-adj_y_k) - log1p_exp(adj_y_k);
      stick_len -= x[k];
    }
    // this is new, 
    // instead of adding the stick_len to the last element
    // distribute it to all the K - 1 values
    // could also just distribute evenly
    // comment out the x *= 1 + stick_len and just
    // return x + stick_len / Km1;
    x *= 1 + stick_len;
    return x + (1 - sum(x) ) / Km1;
  }
}
data {
  int K;
}
parameters {
  positive_ordered[K] y;
}
transformed parameters {
  simplex[K] x = ordered_simplex_constrain_lp(y);
}
model {
  y ~ exponential(1);
}
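To try it out, here's a minimal fitting sketch using cmdstanr (the file name ordered_simplex.stan is an assumption):

library(cmdstanr)

# Compile the model above and sample an ordered 10-simplex
m <- cmdstan_model("ordered_simplex.stan")
fit <- m$sample(data = list(K = 10), chains = 4)
fit$summary("x")  # posterior summaries of the ordered simplex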

Thanks, @spinkney. Is the target += all that’s needed for the Jacobian?

Another way to generate ordered simplexes is to start with an ordered vector and then apply softmax. I'm not sure about the Jacobian implications of that. In Stan, without the Jacobian, it could look like this (pinning the first element to 0 for identifiability):

parameters {
  positive_ordered[K - 1] theta;
}
transformed parameters {
  simplex[K] theta_simplex = softmax(append_row(0, theta));
}

You can see the Jacobian for softmax in matrixcalculus.org by evaluating exp(x) / sum(exp(x)) for x a vector and keeping in mind that we have to pin x[1] = 0.
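For reference, the softmax Jacobian has the closed form

\frac{\partial x_i}{\partial y_j} = x_i (\delta_{ij} - x_j), \qquad \text{i.e.,} \qquad J = \operatorname{diag}(x) - x x^\top,

where x = \operatorname{softmax}(y) and \delta_{ij} is the Kronecker delta.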


What you've said above seems to contradict what you said a while ago (and I don't mean to grill you! just want to make sure I understand).

Just curious if there’s something that’s changed your mind about this?

On another note, I made the positive_ordered thing negative because it allows big simplexes. If I keep it positive, I get log(0) errors with simplexes of size > 30.


Sorry for the confusion. What I didn't realize the first time is that if you take positive_ordered parameters, then you can drop in 0 as the first element and it will preserve the ordering.

And yes, negating will prevent overflow. But you might have to be careful about underflow/rounding.
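A quick R illustration of the saturation (plogis is R's inv_logit):

# Both tails of inv_logit lose information in double precision
plogis(40)    # 1 exactly: 1 - 4.2e-18 rounds to 1, so log1m(inv_logit(40)) is -Inf
plogis(-40)   # 4.2e-18: tiny but representable, so log() of it is still finite
plogis(-800)  # 0 exactly: log(inv_logit(-800)) is -Inf, hence the log(0) errors above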


Doing the softmax thing is slow (like 100x) and results in divergences. Maybe I've implemented the Jacobian wrong?

data {
  int K;
}

transformed data{
  vector[K] alpha = rep_vector(1.0, K);
}

parameters{
 positive_ordered[K - 1] prop;
}

transformed parameters {
  simplex[K] prop_soft = softmax(append_row(0, prop));
}

model{
  prop_soft ~ dirichlet(alpha);
  {
    vector[K] t0 = exp(append_row(0, prop));  // full K-vector including the pinned 0
    real t1 = sum(t0);
    target += log_determinant(diag_matrix(t0) / t1 - t0 * t0' / square(t1));
  }
}

I think I need to add target += log1p(stick_len);


Hi,
I tried putting your implementation to a test via SBC, and it appears that both the original version you posted and the one produced by adding target += log1p(stick_len); are somewhat biased, so maybe the Jacobian is off?

Here’s my code:

ordered_simplex.stan

functions {
 vector ordered_simplex_constrain_lp(vector y) {
    int Km1 = rows(y);
    vector[Km1] x;
    real stick_len = 1.0;
    
    // use only negative y's 
    // start from the smallest -y, which is the last value in the vector 
    for (k in 1:Km1) {
      real adj_y_k = -y[Km1 - k + 1] - log(Km1 - k + 1);
      real z_k = inv_logit(adj_y_k);
      x[k] = stick_len * z_k;
      target += log(stick_len) - log1p_exp(-adj_y_k) - log1p_exp(adj_y_k);
      stick_len -= x[k];
    }
    // this is new, 
    // instead of adding the stick_len to the last element
    // distribute it to all the K - 1 values
    // could also just distribute evenly
    // comment out the x *= 1 + stick_len and just
    // return x + stick_len / Km1;
    x *= 1 + stick_len;
    return x + (1 - sum(x) ) / Km1;
 }
  
 vector ordered_simplex_constrain_v2_lp(vector y) {
    int Km1 = rows(y);
    vector[Km1] x;
    real stick_len = 1.0;
    
    // use only negative y's 
    // start from the smallest -y, which is the last value in the vector 
    for (k in 1:Km1) {
      real adj_y_k = -y[Km1 - k + 1] - log(Km1 - k + 1);
      real z_k = inv_logit(adj_y_k);
      x[k] = stick_len * z_k;
      target += log(stick_len) - log1p_exp(-adj_y_k) - log1p_exp(adj_y_k);
      stick_len -= x[k];
    }
    // this is new, 
    // instead of adding the stick_len to the last element
    // distribute it to all the K - 1 values
    // could also just distribute evenly
    // comment out the x *= 1 + stick_len and just
    // return x + stick_len / Km1;
    x *= 1 + stick_len;
    target += log1p(stick_len);
    return x + (1 - sum(x) ) / Km1;
  }  
}
data {
  int K;
  int<lower=1,upper=2> version;
  int<lower=0> observed[K];
}
parameters {
  positive_ordered[K] y;
}
transformed parameters {
  simplex[K] x;
  if(version == 1 ) x = ordered_simplex_constrain_lp(y);
  else x = ordered_simplex_constrain_v2_lp(y);
}

model {
  x ~ dirichlet(rep_vector(2, K));
  observed ~ multinomial(x);
}

and the SBC code:

library(SBC) # remotes::install_github("hyunjimoon/SBC")
library(cmdstanr)
library(MCMCpack)
library(ggplot2)

library(future)
plan(multisession)
options(SBC.min_chunk_size = 5)

m <- cmdstan_model("ordered_simplex.stan")
backend <- SBC_backend_cmdstan_sample(m, chains = 2)

generate_one_dataset <- function(N, K, version) {
  x_raw <- rdirichlet(1, alpha = rep(2, K))
  x <- sort(x_raw)
  observed <- as.integer(rmultinom(1, size = N, prob = x))
  
  list(
    parameters = list(x = x),
    generated = list(K = K, observed = observed, version = version)
  )
}

datasets_v1 <- generate_datasets(
  SBC_generator_function(generate_one_dataset, N = 30, K = 6, version = 1),
  n_datasets = 500)

res_v1 <- compute_results(datasets_v1, backend)

plot_ecdf_diff(res_v1) + ggtitle("V1")
plot_rank_hist(res_v1)  + ggtitle("V1")
plot_sim_estimated(res_v1, alpha = 0.3)  + ggtitle("V1")

datasets_v2 <- generate_datasets(
  SBC_generator_function(generate_one_dataset, N = 30, K = 6, version = 2),
  n_datasets = 500)

res_v2 <- compute_results(datasets_v2, backend)

plot_ecdf_diff(res_v2) + ggtitle("V2")
plot_rank_hist(res_v2) + ggtitle("V2")
plot_sim_estimated(res_v2) + ggtitle("V2")

The ranks for "v1": [rank histogram omitted]

The ranks for "v2": [rank histogram omitted]

So either I misunderstood how to use the code, or how to simulate an ordered simplex, or there is some problem with my code, or there is a problem with your code :-)

Hope that’s at least slightly helpful :-)

Cool, could you try with the positive_ordered constraint only? I just want to know if that looks OK.

Can you then do the softmax ordered suggestion? And then the gamma prior suggestion, i.e., each positive_ordered element divided by sum(positive_ordered)?

Lastly, can you post results using

return x + stick_len / Km1

Instead of

x *= 1 + stick_len;
return x + (1 - sum(x) ) / Km1;

(Not at a computer, feel free to reformat)

Don’t really have much time for testing in the near future, but thought that this is a nice place to do some promotion of the SBC package :-D Feel free to adapt the code as you need, but I probably will not test a lot of stuff for you (at least not in the upcoming week or so) :-)

All the ordered simplex transforms I mentioned above show the same behavior, though v2 seems to be the ‘best’ of the bunch.

Maybe @hyunji.moon can take a look.

Here's the updated code to perform the three additional ordered simplex transforms. I modified the simulation to make the simplex vector smaller, draw more values from the multinomial, and run fewer SBC iterations (for time). The plots are consistent across all the transforms, so I don't think this is pointing to issues with the Jacobian (unless all the Jacobians are off):

functions {
 vector ordered_simplex_constrain_lp(vector y) {
    int Km1 = rows(y);
    vector[Km1] x;
    real stick_len = 1.0;
    
    // use only negative y's 
    // start from the smallest -y, which is the last value in the vector 
    for (k in 1:Km1) {
      real adj_y_k = -y[Km1 - k + 1] - log(Km1 - k + 1);
      real z_k = inv_logit(adj_y_k);
      x[k] = stick_len * z_k;
      target += log(stick_len) - log1p_exp(-adj_y_k) - log1p_exp(adj_y_k);
      stick_len -= x[k];
    }
    // this is new, 
    // instead of adding the stick_len to the last element
    // distribute it to all the K - 1 values
    // could also just distribute evenly
    // comment out the x *= 1 + stick_len and just
    // return x + stick_len / Km1;
    x *= 1 + stick_len;
    return x + (1 - sum(x) ) / Km1;
 }
  
 vector ordered_simplex_constrain_v2_lp(vector y) {
    int Km1 = rows(y);
    vector[Km1] x;
    real stick_len = 1.0;
    
    // use only negative y's 
    // start from the smallest -y, which is the last value in the vector 
    for (k in 1:Km1) {
      real adj_y_k = -y[Km1 - k + 1] - log(Km1 - k + 1);
      real z_k = inv_logit(adj_y_k);
      x[k] = stick_len * z_k;
      target += log(stick_len) - log1p_exp(-adj_y_k) - log1p_exp(adj_y_k);
      stick_len -= x[k];
    }
    // this is new, 
    // instead of adding the stick_len to the last element
    // distribute it to all the K - 1 values
    // could also just distribute evenly
    // comment out the x *= 1 + stick_len and just
    // return x + stick_len / Km1;
    x *= 1 + stick_len;
    target += log1p(stick_len);
    return x + (1 - sum(x) ) / Km1;
  }  
  
   vector ordered_simplex_constrain_v3_lp(vector y) {
    int Km1 = rows(y);
    vector[Km1] x;
    real stick_len = 1.0;
    
    // use only negative y's 
    // start from the smallest -y, which is the last value in the vector 
    for (k in 1:Km1) {
      real adj_y_k = -y[Km1 - k + 1] - log(Km1 - k + 1);
      real z_k = inv_logit(adj_y_k);
      x[k] = stick_len * z_k;
      target += log(stick_len) - log1p_exp(-adj_y_k) - log1p_exp(adj_y_k);
      stick_len -= x[k];
    }
    return x + stick_len / Km1;
  } 
  
  vector ordered_simplex_constrain_v4_lp(vector y) {
    int Km1 = rows(y) - 1;
    vector[Km1] t0 = exp(y[1:Km1]);
    real t1 = sum(t0);
    target += log_determinant( add_diag(-(t0 * t0') / square(t1), t0 / t1 ) );

    return softmax(append_row(0, y[1:Km1]));
  } 
  
  vector ordered_simplex_constrain_v5_lp(vector y) {
    int K = rows(y);
    real y_sum = sum(y);
    target += log_determinant( add_diag(rep_matrix(-y / square(y_sum), K), 1 / y_sum) );

    return y / sum(y);
  } 
}
data {
  int K;
  int<lower=1, upper=5> version;
  int<lower=0> observed[K];
}
parameters {
  positive_ordered[K] y;
}
transformed parameters {
  simplex[K] x;
  if(version == 1 ) x = ordered_simplex_constrain_lp(y);
  else if (version == 2) x = ordered_simplex_constrain_v2_lp(y);
  else if (version == 3) x = ordered_simplex_constrain_v3_lp(y);
  else if (version == 4) x = ordered_simplex_constrain_v4_lp(y);
  else x = ordered_simplex_constrain_v5_lp(y);
}

model {
  x ~ dirichlet(rep_vector(1, K));
  observed ~ multinomial(x);
}

The SBC code:

library(SBC) # remotes::install_github("hyunjimoon/SBC")
library(cmdstanr)
library(MCMCpack)
library(ggplot2)

library(future)
plan(multisession)
options(SBC.min_chunk_size = 5)

m <- cmdstan_model("ordered_simplex.stan", force_recompile = T)
backend <- SBC_backend_cmdstan_sample(m, chains = 2)

generate_one_dataset <- function(N, K, version) {
  x_raw <- rdirichlet(1, alpha = rep(1, K))
  x <- sort(x_raw)
  observed <- as.integer(rmultinom(1, size = N, prob = x))
  
  list(
    parameters = list(x = x),
    generated = list(K = K, observed = observed, version = version)
  )
}

datasets <- list()
res <- list()
for (i in 1:5) {
  datasets[[i]] <- generate_datasets(
    SBC_generator_function(generate_one_dataset, N = 1000, K = 6, version = i),
    n_datasets = 200)
  
  res[[i]] <- compute_results(datasets[[i]], backend)
  
  plot_rank_hist(res[[i]])  + ggtitle(paste0("V", i))
}

The ranks for "v1": [rank histogram omitted]

The ranks for "v2": [rank histogram omitted]

The ranks for "v3": [rank histogram omitted]

The ranks for "v4": [rank histogram omitted]

The ranks for "v5": [rank histogram omitted]


So I tried looking into this today a bit, and one thing that I find quite weird is that all of the parametrizations of an ordered K-simplex above use K parameters, when in fact we know that the ordered simplex has only K - 1 degrees of freedom. I wouldn't be surprised if that's what's causing trouble, as the transformation is no longer 1:1 and the Jacobian is AFAIK undefined in this case.
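(Concretely, for K = 2 the simplex x_1 + x_2 = 1 is a segment with one degree of freedom; the ordering x_1 \le x_2 just restricts it to half of that segment, still one degree of freedom, yet the transforms above spend two parameters on it.)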

I had an idea for a transformation that would respect this, but I couldn't make it work in the time I had, nor was I able to modify the transformations you mentioned to be 1:1.

The idea I had starts from this MO answer: pr.probability - What is the probability distribution of the $k$th largest coordinate chosen over a simplex? - MathOverflow. From it, we know that the smallest element x_K of a uniformly distributed K-simplex is distributed as x_K = \frac{a_1}{K (a_1 + a_2 + \dots + a_K)} where a_i \sim \text{Exponential}(1), or equivalently x_K = \frac{a_1}{K (a_1 + g)} where g \sim \text{Gamma}(K - 1, 1) and a_1 and g are independent. Using my friend Wolfram Alpha, I can integrate over a_1 and g to get a PDF for x_K, the smallest element:

f_{x_K}(x) = K(K - 1) (1 - Kx)^{K - 2}, \qquad 0 < x < \frac{1}{K}

and the CDF is

F_{x_K}(x) = \frac{Kx - 1 + (1 - Kx)^K}{Kx - 1} = 1 - (1 - Kx)^{K - 1}
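A quick Monte Carlo sanity check of this density (a sketch; rdirichlet from MCMCpack, as used elsewhere in this thread):

library(MCMCpack)

# Compare simulated minima of a uniform 5-simplex against the derived PDF
K <- 5
x_min <- replicate(1e5, min(rdirichlet(1, rep(1, K))))
hist(x_min, breaks = 50, freq = FALSE)
curve(K * (K - 1) * (1 - K * x)^(K - 2),
      from = 0, to = 1 / K, col = "red", add = TRUE)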

And the idea was that I would start with K - 1 values uniformly distributed on [0,1]. I would take the first and use the inverse CDF to transform it into the minimum value x_K. This leaves 1 - K x_K to be distributed among x_1, \dots, x_{K-1}, which I can do recursively (i.e., I need to multiply this value by an ordered (K-1)-simplex). The problem is that my friend Wolfram Alpha couldn't find the inverse CDF…

So sharing in case it stimulates some further progress…

OK, I was overcomplicating things - the PDF I derived actually can be used directly without any worries about inverse CDFs!

Here’s ordered_simplex_min.stan:

functions {
 vector ordered_simplex_constrain_min_lp(vector y) {
    int Km1 = rows(y);
    vector[Km1 + 1] x;
    real remaining = 1; // Remaining amount to be distributed
    real base = 0; // The minimum for the next element
    for(i in 1:Km1) {
      int K_prime = Km1 + 2 - i; // Number of remaining elements
      //First constrain to [0; 1 / K_prime]
      real invlogity = inv_logit(y[i]);
      real x_cons = inv(K_prime) * invlogity;
      // Jacobian for the constraint
      target += -log(K_prime) + log(invlogity) + log1m(invlogity);

      // Add the lowest element log density
      target += log(K_prime - 1) +  log(K_prime) + (K_prime - 2) * log1m(K_prime*x_cons);

      x[i] = base + remaining * x_cons;
      base = x[i];
      //We added  remaining * x_cons to each of the K_prime elements yet to be processed
      remaining -= remaining * x_cons * K_prime; 
    }
    x[Km1 + 1] = base + remaining;

    return x;
 }
}
data {
  int K;
  int<lower=0> observed[K];
  real<lower=0> prior_alpha;
}


parameters {
  vector[K - 1] y;
}

transformed parameters {
  simplex[K] x = ordered_simplex_constrain_min_lp(y);
}

model {
  x ~ dirichlet(rep_vector(prior_alpha, K));
  observed ~ multinomial(x);
}

EDIT: For the record, I probably don't understand why the code works. I would swear the code is missing a target += log(remaining) correction (for the x[i] = base + remaining * x_cons; line), but adding it makes the SBC fail… Also, I can see why it works for the uniform simplex distribution, but I'm much less clear on whether it should work with the added Dirichlet prior… At some point I will learn enough to understand it, but today is not the day.

Simulator and testing code


First, we’ll test whether this works well as an implicit flat prior over ordered simplices:

library(SBC) # remotes::install_github("hyunjimoon/SBC")
library(cmdstanr)
library(MCMCpack)
library(ggplot2)

library(future)
plan(multisession)
options(SBC.min_chunk_size = 5)

m <- cmdstan_model("ordered_simplex_min.stan")
backend <- SBC_backend_cmdstan_sample(m, chains = 2)

generate_one_dataset <- function(N, K, prior_alpha = 1) {
  x_raw <- rdirichlet(1, alpha = rep(prior_alpha, K))
  x <- sort(x_raw)
  observed <- as.integer(rmultinom(1, size = N, prob = x))
  
  list(
    parameters = list(x = x),
    generated = list(K = K, observed = observed, prior_alpha = prior_alpha)
  )
}

datasets_flat <- generate_datasets(
    SBC_generator_function(generate_one_dataset, N = 20, K = 4),
    n_datasets = 1000)
  
res_flat <- compute_results(datasets_flat, backend)
  
plot_rank_hist(res_flat)
plot_ecdf_diff(res_flat)

plot_sim_estimated(res_flat, alpha = 0.2)


Looking good!

And now let’s try with a concentrated prior:

datasets_6 <- generate_datasets(
    SBC_generator_function(generate_one_dataset, N = 10, K = 4, prior_alpha = 6),
    n_datasets = 1000)
  
res_6 <- compute_results(datasets_6, backend)
  
plot_rank_hist(res_6)
plot_ecdf_diff(res_6)

plot_sim_estimated(res_6)

Also looking good, although we apparently can't learn much about the parameters: [sim vs. estimated plot omitted]


Note that for this kind of investigation, it is actually useful to have a weak likelihood (i.e., low N in the simulator), as we are interested in whether the implied prior is correct, and the stronger the likelihood, the lower the effect of the prior on the posterior and the harder it is to find a discrepancy with SBC. At the same time, I don't think we would want to have no likelihood at all, as I am also interested in whether problems arise from the interaction between the prior and the likelihood.

There is, however, likely room for improvement, as the implied geometry on y can look a bit weird; here's one of the worse-looking pairs plots: [pairs plot omitted]


Thanks, this is great!

We can follow the derivation another way, from https://arxiv.org/pdf/0708.0176.pdf, where the intensities are distributed uniformly on an N - 1 simplex. They derive the distributions of the maximum and minimum order statistics, and the minimum corresponds to exactly what you have!

The CDF of the minimum is given by

F(s, N) = 1 - \frac{(N-1)!}{(2 \pi)^N} \prod_{i=1}^N \bigg [\int_0^{2 \pi} d \theta_i \int_s^1 dr_i^2 \bigg ] \delta \bigg(\sum_{j=1}^N r^2_j - 1 \bigg )

The evaluation of which gives

F(s, N) = 1 - (1 - Ns)^{N - 1} \; \; \; \; \text{ for } 0 \le s \le 1/N

The pdf then is

f(s, N) = N (N - 1) (1 - Ns)^{N - 2}

I think there's no Jacobian for remaining because it's part of x and not the unconstrained y. It gets subsumed into the lower bound for the next iterate of x.

This is the exact same thing you have, just rearranged to show that it's an upper- and lower-bound transform.

I’ll add this as a stan-math issue to get this transform in.

 vector ordered_simplex_constrain_min_lp(vector y) {
    int Km1 = rows(y);
    vector[Km1 + 1] x;
    real remaining = 1; // Remaining amount to be distributed
    real lb = 0; // The minimum for the next element
    real ub;
    real xcons;
    real invlogity;
    
    for(i in 1:Km1) {
      int K_prime = Km1 + 2 - i; // Number of remaining elements
      //First constrain to [0; 1 / K_prime]
      ub = 1.0 / K_prime;
      invlogity = inv_logit(y[i]);
      xcons = ub * invlogity;
      target += log(ub - lb) + log(invlogity) + log1m(invlogity);

      // Add the lowest element log density
      target += log(K_prime - 1) +  log(K_prime) + (K_prime - 2) * log1m(K_prime * xcons);
      
      x[i] = lb + remaining * xcons;
      lb = x[i];
      //We added  remaining * x_cons to each of the K_prime elements yet to be processed
      remaining -= remaining * xcons * K_prime; 
    }
    x[Km1 + 1] = lb + remaining;

    return x;
 }

It looks different… And it doesn't pass SBC (here's with 1000 simulations, flat prior): [rank histogram omitted]


There's some difference; maybe someone can find the mistake. Anyway, not enough time right now.

edit: Needed 10 mins.

vector ordered_simplex_constrain_min_lp(vector y) {
    int Km1 = rows(y);
    vector[Km1 + 1] x;
    real remaining = 1; // Remaining amount to be distributed
    real lb = 0; // The minimum for the next element
    real ub = 1;
    real xcons;
 
    for(i in 1:Km1) {
      int K_prime = Km1 + 2 - i; // Number of remaining elements
      //First constrain to [0; 1 / K_prime]
      ub = inv(K_prime);
      xcons = ub * inv_logit(y[i]);
      target += log(ub) + log_inv_logit(y[i]) + log1m_inv_logit(y[i]);

      // Add the lowest element log density
      target += log(K_prime - 1) +  log(K_prime) + (K_prime - 2) * log1m(K_prime * xcons);
      
      x[i] = lb + remaining * xcons;
      lb = x[i];
      //We added  remaining * x_cons to each of the K_prime elements yet to be processed
      remaining -= remaining * xcons * K_prime; 
    }
    x[Km1 + 1] = lb + remaining;

    return x;
 }
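(Relative to the previous version, the Jacobian term uses log(ub) in place of log(ub - lb), and log_inv_logit / log1m_inv_logit avoid the log(0) saturation discussed earlier in the thread.)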

edit 2: previous plots were not with the same priors as above


Hi,
just returning to this topic, as I was looking at it for another reason. I noticed that the way to make the softmax approach suggested by @Bob_Carpenter work is to note that I actually want to compute the Jacobian for transforming the positive_ordered vector to the first/last K - 1 elements of the simplex; the remaining element of the simplex is then simply 1 - sum(rest), which is a transformation with constant Jacobian that can be dropped.

The log determinant of the (K - 1) \times (K - 1) Jacobian matrix then has an analytical form (I used Mathematica to help me with it). The full code is then:

data {
  int K;
  array[K] int<lower=0> observed;
  vector<lower=0>[K] prior_alpha;
}


parameters {
  positive_ordered[K - 1] y;
}

transformed parameters {
  simplex[K] x = softmax(append_row(0, y));
}

model {
  x ~ dirichlet(prior_alpha);
  // The required log Jacobian determinant
  target += sum(y) - K * log_sum_exp(append_row(0, y));
  // Likelihood, because why not
  observed ~ multinomial(x);
}
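One way to see where the log Jacobian determinant comes from: with x = \operatorname{softmax}((0, y)) and S = 1 + \sum_i e^{y_i}, the Jacobian of (x_2, \dots, x_K) with respect to y is \operatorname{diag}(x_{2:K}) - x_{2:K} x_{2:K}^\top, so by the matrix determinant lemma

\det J = \Bigl(\prod_{i=2}^{K} x_i\Bigr)\Bigl(1 - \sum_{i=2}^{K} x_i\Bigr) = \Bigl(\prod_{i=2}^{K} x_i\Bigr) x_1 = \frac{e^{\sum_i y_i}}{S^K},

which gives \log \det J = \operatorname{sum}(y) - K \log S, exactly the target += line above.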

I verified that it works with SBC (code almost identical to the above cases).
In quick tests, it appears to be slightly slower, less stable, and to induce worse geometry than the variant I proposed in a previous post (Ordered simplex constraint transform - #14 by martinmodrak), but I thought someone might find this interesting.

The problem in the code tested by @spinkney is that the full K \times K Jacobian is not full rank and has determinant 0, which explains the numerical issues.
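Indeed, since \sum_i x_i = 1,

\bigl(\operatorname{diag}(x) - x x^\top\bigr)\mathbf{1} = x - x\,(x^\top \mathbf{1}) = 0,

so the full K \times K softmax Jacobian always has a zero eigenvalue and is singular.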


Hi @martinmodrak, thanks for adding this. I'm curious: what application are you using this for?

As an example for SBC :-D