I need to define M simplexes of different lengths.
As far as I understand, the suggested approach for ragged data is to use a single flat vector of values, with a separate data structure holding the size of each subarray.
I tried the model below, defining pi as a vector rather than a simplex since its entries should sum to M (one per group) rather than 1, but it does not work.
What would be the best approach?
Thanks in advance!
data {
  int<lower=0> N;       // number of states
  int<lower=0> M;       // number of groups
  array[M] int size;    // size of groups
  vector[N] alpha0;     // hyperparameters for state probabilities
}
parameters {
  vector[N] pi;
}
model {
  {
    int pos = 1;
    for (k in 1:M) {
      segment(pi, pos, size[k]) ~ dirichlet(segment(alpha0, pos, size[k]));
      pos = pos + size[k];
    }
  }
}
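For reference, the ragged packing that the loop relies on can be sketched in plain Python (illustrative only; the values and sizes below are made up):

```python
# Illustrative sketch of Stan's segment()-based ragged indexing:
# a single flat list holds all values, and `sizes` records each group's length.
values = [0.2, 0.8,  0.1, 0.3, 0.6,  0.5, 0.5]  # three groups packed together
sizes = [2, 3, 2]

groups = []
pos = 0  # Python is 0-based; the Stan loop starts pos at 1
for s in sizes:
    groups.append(values[pos:pos + s])  # analogue of segment(values, pos, s)
    pos += s

print(groups)  # [[0.2, 0.8], [0.1, 0.3, 0.6], [0.5, 0.5]]
```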
size is a Stan function, but you’re using it as a variable.
Also, given that segment(pi,pos,size[k]) ~ dirichlet(segment(alpha0,pos,size[k])); is the same as dirichlet_lupdf(segment(pi,pos,size[k]) | segment(alpha0,pos,size[k])); aren’t you putting the data where the parameters of interest should be and vice versa?
I replaced size with s, but I get the same error.
I don’t understand your comment about data and parameters. alpha0 should be the concentration parameters for the Dirichlet distributions, and I want \pi \sim {\rm Dir}(\alpha_0).
I get
Exception: dirichlet_lpdf: probabilities is not a valid simplex. sum(probabilities) = 1.203633714, but should be 1 (in 'test.stan', line 21, column 6 to column 65)
Sorry, I understand now. I had confused myself about the Dirichlet while trying to make sense of your code. The problem is that your pi values do not add up to 1 within each group, i.e. they are not constrained to be simplexes. Stan samples pi as an unconstrained real vector, so the probability of any segment summing to 1 is essentially zero. The underlying difficulty is that the slice of pi has a different size on each iteration, which prevents you from declaring it as a simplex in the parameters block.
You can try this:
parameters {
  vector[N] pi;
}
transformed parameters {
  int pos = 1;
  for (k in 1:M) {
    array[M] simplex[size[k]] pi_slice = segment(pi, pos, size[k]);
    pos = pos + size[k];
  }
}
model {
  int pos = 1;
  for (k in 1:M) {
    pi_slice[k] ~ dirichlet(segment(alpha0, pos, size[k]));
    pos = pos + size[k];
  }
}
model {
int pos;
pos = 1;
for(k in 1:M){
pi_slice[k] ~ dirichlet(segment(alpha0,pos,size[k]));
pos = pos + size[k];
}
}
I think this way it will validate that pi_slice is a simplex before feeding it down to your dirichlet, preventing the error.
The above doesn’t work. The parameters need to be constrained in such a way that they respect the simplex constraint. In this case, I’d declare a tuple of simplexes. See the tuple types section here: https://mc-stan.org/docs/stan-users-guide/matrices-arrays.html
This will unfortunately not work either: arrays in Stan must contain elements of the same type and size.
The only workaround would be to perform the parameter transformation from an unconstrained scale to a simplex for each group. The details of the simplex transform can be found in the Constraint Transforms chapter of the Stan reference manual.
Tuples require that the number of groups and the dimensions of the simplexes in each group be known beforehand, which is not the case in this code. If we had 3 groups with fixed dimensions N1, N2, N3, we would be able to define the tuple
tuple(simplex[N1], simplex[N2], simplex[N3]) pi;
which would work. But since the number of groups and the simplex dimensions can vary across models (depending on the input data), tuples are not a valid approach.
That's only an obstacle if you want to write the model flexibly, though.
It seems that @d_provasi knows the sizes beforehand, so they could work around it by declaring simplexes of known sizes for pi in the model itself, handling alpha as they already have been. They would, in essence, be doing by hand what their original loop over pi was meant to handle. Say the array of sizes is 2, 3, 8, 10, and 12 for the groups; they could hardcode it as the declaration tuple(simplex[2], simplex[3], simplex[8], simplex[10], simplex[12]) pi;
One option (for the left hand side) is to use a softmax transform for pi_slice. The last post in this thread has a function that does the transformation and automatically performs the Jacobian adjustment.
In theory, you could then do something like:
functions {
  vector simplex_constrain_softmax_lp(vector v) {
    int K = size(v) + 1;
    vector[K] v0 = append_row(0, v);
    // Jacobian adjustment for the softmax transform
    target += sum(v) - K * log_sum_exp(v0);
    return softmax(v0);
  }
}
data {
  int<lower=0> N;       // number of states
  int<lower=0> M;       // number of groups
  array[M] int sizes;   // size of groups
  vector[N] alpha0;     // hyperparameters for state probabilities
}
parameters {
  vector[N - M] pi_unconstrained;
}
model {
  int pos_pi = 1;
  int pos_alpha = 1;
  for (k in 1:M) {
    simplex_constrain_softmax_lp(segment(pi_unconstrained, pos_pi, sizes[k] - 1))
      ~ dirichlet(segment(alpha0, pos_alpha, sizes[k]));
    pos_alpha += sizes[k];
    pos_pi += sizes[k] - 1;
  }
}
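As a sanity check on the math (outside Stan, not part of the model), here is a pure-Python sketch of the same transform for the smallest case K = 2, verifying that the output is a simplex and that the analytic Jacobian term sum(v) - K * log_sum_exp(v0) matches a finite-difference derivative of the free coordinate:

```python
import math

def simplex_constrain_softmax(v):
    """Python analogue of the Stan function for a single free value v (K = 2).
    Returns (simplex, log_abs_det_jacobian)."""
    K = 2
    v0 = [0.0, v]
    lse = math.log(sum(math.exp(x) for x in v0))  # log_sum_exp(v0)
    theta = [math.exp(x - lse) for x in v0]       # softmax(v0)
    log_jac = v - K * lse                         # sum(v) - K * log_sum_exp(v0)
    return theta, log_jac

v = 0.7
theta, log_jac = simplex_constrain_softmax(v)

# The result is a valid 2-simplex.
assert abs(sum(theta) - 1.0) < 1e-12

# For K = 2 the single free coordinate is theta[1]; the analytic Jacobian
# should satisfy log|d theta[1] / dv| == log_jac.
eps = 1e-6
theta_hi, _ = simplex_constrain_softmax(v + eps)
theta_lo, _ = simplex_constrain_softmax(v - eps)
fd = (theta_hi[1] - theta_lo[1]) / (2 * eps)  # central difference
assert abs(math.log(fd) - log_jac) < 1e-6
```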
It may require some slight adjustments if you want to actually use your adjusted simplex parameters for something (perhaps defining them in the transformed parameters and re-using them elsewhere).
Thanks @Christopher-Peterson this works.
I’m not quite sure what the most efficient way is to calculate the constrained values only once. Right now I’m using simplex_constrain_softmax_lp in the model block for the sampling, and a different function that does not update target in transformed parameters to compute the simplex values for reuse.
Thanks @Corey.Plate, I have not tested this, but I imagine it could work.
As @Garren_Hermanus suggested, I think it’s preferable to keep the flexibility of defining the group sizes in the input and not having to modify the code for different inputs.
I’m fairly certain you can define your simplices in the transformed parameters block with the _lp function and it will still increment the target; that should let you avoid defining them multiple times.
For @mjhajharia’s 2023 summer project with @Bob_Carpenter she started experiments on a bunch of different simplex parameterizations that I helped with. @Seth_Axen added a few new transforms, checked the math twice (by hand and with Jax), and put together a pipeline to evaluate the efficiency of the different transforms. The forthcoming preprint will have the derivations.
We included log versions (the return vector is a log simplex) because you can typically stay on the log scale. In fact, the log_dirichlet should be more numerically stable and faster (assuming we write the derivatives in stan-math).
The transforms are all in _lp functions that you can test and should be able to reuse in the form that @Christopher-Peterson suggested.
All the transforms live at:
If you want to stay on the log scale but still use the Dirichlet, you can use this lpdf:
real log_dirichlet_lpdf(vector log_theta, vector alpha) {
  int N = rows(log_theta);
  if (N != rows(alpha))
    reject("Input must contain same number of elements as alpha");
  return dot_product(alpha, log_theta) - log_theta[N]
         + lgamma(sum(alpha)) - sum(lgamma(alpha));
}
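To sanity-check this density (again outside Stan), a pure-Python transcription should equal the ordinary Dirichlet log density evaluated at theta = exp(log_theta), plus the log-Jacobian of the log transform, which is the sum of the first N-1 entries of log_theta:

```python
import math

def log_dirichlet_lpdf(log_theta, alpha):
    # Transcription of the Stan function above.
    N = len(log_theta)
    assert N == len(alpha)
    return (sum(a * lt for a, lt in zip(alpha, log_theta)) - log_theta[N - 1]
            + math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha))

def dirichlet_lpdf(theta, alpha):
    # Ordinary Dirichlet log density on the simplex.
    return (sum((a - 1) * math.log(t) for a, t in zip(alpha, theta))
            + math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha))

alpha = [0.5, 1.0, 2.5]
theta = [0.2, 0.3, 0.5]
log_theta = [math.log(t) for t in theta]

# Density over (log_theta_1, ..., log_theta_{N-1}):
# Dirichlet density plus log|Jacobian| = sum of the first N-1 log thetas.
lhs = log_dirichlet_lpdf(log_theta, alpha)
rhs = dirichlet_lpdf(theta, alpha) + sum(log_theta[:-1])
assert abs(lhs - rhs) < 1e-10
```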
@d_provasi I am working on a similar problem and wanted to follow up and ask whether you were able to implement this solution successfully. I don’t know if I’m fully understanding your suggestion here, but naively implementing it in @Christopher-Peterson’s proposed model is not working for me. What exactly does the model look like in which you were able to define the adjusted simplex parameters in the transformed parameters block?