Ragged array of simplexes

thanks @bgoodri - it’s really helpful to see when these things work and how they break down (i.e. statistically valid but slow, or introducing biased).

Out of curiosity, I coded up the identifiable model

data {
  int<lower=2> K;
  int<lower=0,upper=2147483647> x[K];
}
parameters {
  vector<lower=0>[K-1] gamma;
}
transformed parameters {
  real sum_gamma = 1 + sum(gamma);
  vector[K] pi = append_row(1,gamma) / sum_gamma;
}
model {
  target += -K *log(sum_gamma);
  target += dirichlet_lpdf(pi | rep_vector(1,K));
  target += multinomial_lpmf(x | pi);
}

which has comparable performance:

        mean se_mean   sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
gamma[1]    2.00    0.00 0.00   2.00   2.00   2.00   2.00   2.00  1234    1
gamma[2]    3.00    0.00 0.00   3.00   3.00   3.00   3.00   3.00  1378    1
gamma[3]    4.00    0.00 0.00   4.00   4.00   4.00   4.00   4.00  1350    1
sum_gamma  10.00    0.00 0.00  10.00  10.00  10.00  10.00  10.00  1171    1
pi[1]       0.10    0.00 0.00   0.10   0.10   0.10   0.10   0.10  1171    1
pi[2]       0.20    0.00 0.00   0.20   0.20   0.20   0.20   0.20  2797    1
pi[3]       0.30    0.00 0.00   0.30   0.30   0.30   0.30   0.30  4000    1
pi[4]       0.40    0.00 0.00   0.40   0.40   0.40   0.40   0.40  4000    1
lp__      -38.69    0.03 1.16 -41.81 -39.26 -38.39 -37.84 -37.36  1851    1

but is nearly 1000 times faster:

sum(sapply(post.2@sim$samples,function(z) attr(z,‘elapsed_time’)))
[1] 0.33933
sum(sapply(post@sim$samples,function(z) attr(z,‘elapsed_time’)))
[1] 275.2609

I wonder if there would be situations where the underdefined solution would perform better…

3 Likes