I am trying to code a model that requires defining ragged arrays of simplexes. But for simplicity, I want to ask how to code a vector transformation that applies the same constraints as a simplex (without using an explicit simplex) and can then be used for a prior distribution parameter?
Background: There have been several previous discussions that I can find on the topic (here, here, here, and here), but I believe that none of these have a clear and complete answer to the problem which is motivating my question. I think part of the confusion might be that there are multiple ways to go about this and the differences and similarities are not obvious. I am relatively new to Stan, and most of the details on this topic, such as the stickbreaking algorithm, are going over my head. It is possible that some of the previous discussions do actually have an answer to my question, but they seem to expect a level of familiarity with simplex transformations that I don’t have. My personal goal here is primarily to create a working model, and then maybe later I will dig into the weeds if necessary — but hopefully a thread combining previous discussions of this topic could also be useful to other users with many of the same questions as myself.
My use case: For the sake of simplicity, I have a working Stan model (based on this paper and available as an R package here) that uses simplex
’s in the prior and as part of a parameter transformation. It could be a little cleaner but anyway this is a good starting point for my use case and I think a useful edge case for this problem of coding vector simplex adjustments.
Stan model using simplexes
// Note that the Metropolis proposals may be rejected during warmup,
// this can be avoided by increasing the step size to 0.1
data {
int<lower=1> n; // number of election units i
int<lower=2> R; // number of candidates in election 1
int<lower=2> C; // number of candidates in election 2
// (this doesn't have to be a simplex but the condition should be met)
array[n] simplex[R] p_r; // marginal proportion for row r in unit i
array[n, C] int<lower=0> N_c; // marginal count for column c in unit i
real<lower=0> lambda_1; // gamma hyperprior shape parameter
real<lower=0> lambda_2; // gamma hyperprior scale parameter
}
parameters {
matrix<lower=0>[R, C] alpha_r;
array[n, R] simplex[C] beta_r; // proportion of row r in column c for unit i
}
transformed parameters {
array[n] simplex[C] theta_c; // estimated marginal proportion for column c in unit i
array[n] matrix[R, C] beta_mat; // matrix of vectors beta_r to be used for transformation
for (i in 1:n) {
for (r in 1:R) {
beta_mat[i, r] = to_row_vector(beta_r[i, r]);
}
theta_c[i] = beta_mat[i]' * p_r[i];
}
}
model {
for (i in 1:n) {
for (r in 1:R) {
alpha_r[r,] ~ gamma(lambda_1, lambda_2);
beta_r[i, r] ~ dirichlet(alpha_r[r,]);
}
N_c[i] ~ multinomial(theta_c[i]);
}
}
Example data:
p_r <- matrix(
c(0.3259669, 0.6740331,
0.3667954, 0.6332046,
0.3578947, 0.6421053,
0.3595166, 0.6404834,
0.3571429, 0.6428571,
0.3814103, 0.6185897,
0.4117647, 0.5882353,
0.3826087, 0.6173913,
0.3631436, 0.6368564,
0.4102564, 0.5897436),
nrow = 10,
ncol = 2
)
N_c <- matrix(
c(42, 139,
63, 196,
51, 44,
131, 200,
226, 236,
74, 238,
10, 7,
163, 297,
190, 179,
18, 60),
nrow = 10,
ncol = 2
)
data <- list(n = 10, R = 2, C = 2, p_r = p_r, N_c = N_c, lambda_1 = 4, lambda_2 = 2)
Main question: How would you replace all of the explicit simplex
’s in this model with vector parameter transformations that apply the same constraints?
Related questions based on previous discussion threads on this topic (and a potential solution):
-
The proposed
_lp
function from @Christopher-Peterson here although promising does not seem to work for my use case because it loses the adjusted simplex parameters. From some tests I’ve performed, it does not seem so simple as usingsimplex_constrain_softmax_lp
in thetransformed parameters
block. There have not been any updates to that thread for a while, although it is implied that one could modify the function and define the adjustment in thetransformed parameters
block to be able to then use the adjusted simplex parameter in themodel
block. I just have not been able to do that successfully. Maybe I am missing something obvious and there is someone who can explain the correct way to do this? -
I have seen the suggestion in two places to use code that was put together by @mjhajharia and @Bob_Carpenter (here and here) but like with (1) I don’t know how I would actually implement these functions in the model. (I am also not quite sure which version is applicable for this use case?) Maybe these functions are in fact the obvious answer to the motivating question and the implementation is trivial, but so far I have not figured it out. How might you go about using these functions? And which function is preferable for my use case?
-
The other option I found was this proposal from @aaronjg which is actually very similar to the proposal I mention in (1), so I thought that it could be a demonstration of how to perform the simplex adjustment in the
transformed parameters
block. I have included the implementation below. The new model does compile and samples successfully, although it consistently throws this exception during warmup, even after increasing the step size:Exception: dirichlet_lpdf: probabilities is not a valid simplex. sum(probabilities) = nan, but should be 1
. I believe that this exception is similar to what you will see from the original model, except that the exception handling is different since I don’t actually usesimplex
in this new version — meaning that it should be safe to ignore I believe. How would you handle this exception? And to what extent do the differences between this approach to simplex adjustment and that suggested in (1) (apart from the question of implementation) matter for my use case?
Similar Stan model but with parameter adjustments instead of explicit simplexes
data {
int<lower=1> n; // number of election units i
int<lower=2> R; // number of candidates in election 1
int<lower=2> C; // number of candidates in election 2
array[n] simplex[R] p_r; // marginal proportion for row r in unit i
array[n, C] int<lower=0> N_c; // marginal count for column c in unit i
real<lower=0> lambda_1; // gamma hyperprior shape parameter
real<lower=0> lambda_2; // gamma hyperprior scale parameter
}
parameters {
matrix<lower=0>[R, C] alpha_r;
array[n, R] vector<lower=0>[C - 1] gamma_r;
array[n] vector<lower=0>[C - 1] gamma_c;
}
transformed parameters {
array[n, R] vector[C] beta_r; // proportion of row r in column c for unit i
array[n, R] real sum_gamma_r;
for (i in 1:n) {
for (r in 1:R) {
sum_gamma_r[i, r] = 1 + sum(gamma_r[i, r]);
beta_r[i, r] = append_row(1, gamma_r[i, r]) / sum_gamma_r[i, r];
}
}
array[n] vector[C] theta_c;
array[n] real sum_gamma_c;
for (i in 1:n) {
sum_gamma_c[i] = 1 + sum(gamma_c[i]);
theta_c[i] = append_row(1, gamma_c[i]) / sum_gamma_c[i];
}
array[n] matrix[R, C] beta_mat;
for (i in 1:n) {
for (r in 1:R) {
beta_mat[i, r] = to_row_vector(beta_r[i, r]);
}
theta_c[i] = beta_mat[i]' * p_r[i];
}
}
model {
for (i in 1:n) {
for (r in 1:R) {
target += -C * log(sum_gamma_r[i, r]);
}
target += -C * log(sum_gamma_c[i]);
}
for (i in 1:n) {
for (r in 1:R) {
alpha_r[r,] ~ gamma(lambda_1, lambda_2);
beta_r[i, r] ~ dirichlet(alpha_r[r,]);
}
N_c[i] ~ multinomial(theta_c[i]);
}
}