How would I go about putting additional constraints on a unit simplex? For instance, if a simplex is w>0, sum(w)=1, then I want Aw=b instead of sum(w)=1.
Below I have my original, simple model. It’s just doing a regression with a constant vector of ones on the left with simplex coefficients.
I want to additionally constrain the the coefficients to such that w’*mu=mu_target. The second model is my failed attempt. I say it fails because while the equality constraints hold, the inequality constraint does not (specifically w_1 can be any real value, not just between 0 and 1).
I spent some time looking at Section 35.6 on Unit Simplexes in the documentation, but I’m not sure what I would need to do precisely to convert that into Stan code (maybe then it would be clearer how to adjust that to take into account the extra constraint).
Also, if there’s some other way to do accomplish this, I’m all ears.
//original model
data {
int<lower=0> N;
int<lower=0> T;
matrix[T, N] X;
}
transformed data {
vector[T] ones_T;
ones_T = rep_vector(1, T);
}
parameters {
simplex[N] w;
real<lower=0> sigma;
}
model {
ones_T ~ normal(X * w, sigma);
}
I’m surprised that Stan isn’t giving you a warning that a Jacobian adjustment may be necessary (see section 22.3 of the Stan language manual) – or is it?
Given that you are still using the simplex type I presume that you mean you want w > 1, sum(w) = 1, and A * w = b. The problem with this is that the additional constraint implies that the solution space for w is on a lower dimensional space. Just how low dimensional depends on the specific structure of A and b. You can work out analytic results in some cases, but you have to be careful about boundary conditions.
The way this will manifest in your model is that the w will be non identified in some directions. Because the w are constrained by the simplex type this won’t lead to the sampler trying to drift off to infinity, but it will lead to narrow posteriors, small step sizes, and probably low E-BFMI warnings.
On your first point, I was thinking in terms of optimization. Aw=b is a common way to write constraints in quadratic optimization and sum(w)=1 could be included as one such constraint. Strictly speaking, I had only wanted one additional constraint (at the moment), so it would be w >1, sum(w) =1, and w’*mu=mu_target.
On your second point, my way to avoid having to identify to derive the solution (from my second post) was to add an additional modelling statement that dot_product(mu, w) ~ normal(mu_target, 0.001);. I was been playing around with this a little since I had posted it and occasionally I will get a warning about a divergence or two, but I’ve been happy with the number of effective samples. The constraint doesn’t always hold though, it doesn’t force w’*mu=mu_target. This hasn’t been a significant enough issue for me to get concerned yet.
Also, I was able to make the modification below where zeroes_T is a vector of T 0s and X_minus_mu is a centered version of X. The effect of this is like I am choosing w that minimizes the variance of X*w and also tries to set w’*mu=mu_target. Pretty cool, IMO. The results look good so far.
I had to replace dot_product(mu, w) ~ normal(mu_target, 0.001);
with target += T * normal_lpdf(dot_product(mu, w) | mu_target, 0.001);
because when $T$ goes to infinity the dot_product gets overwhelmed by the other modelling statement.
Now the constraint holds to within 0.001, though with fewer effective samples (though still reasonable).