# How to achieve a weighted simplex (simplex generalization)?

The problem is simple.

I want to have a parameter x with the constraint that

x * w' = 1 (i.e. the dot product of x and w equals 1) for some fixed, constant, strictly positive vector w, i.e. w[i] > 0 for each i.

Stan’s existing simplex type would be the special case with w[i] = 1 for all i.

How can I achieve this in Stan?

(I’ve asked the same question on Stack Overflow, which supports Mathjax. I will keep both places in sync).

Could you please clarify: is x a vector as well? And is the sum of the data vector w naturally constrained in some way (e.g. to 1, because the elements are normalised weights)?

Most likely you will have to work with a simplex vector and use it as a power of some function of w and an auxiliary variable x_aux. Happy to help and think more about this, but please clarify first.

Wait, maybe it’s just this easy:

Define an auxiliary simplex vector x_aux, sampled however you like, and then just set x_i = x_{aux,i} / w_i. Would that serve your needs?
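If that works for you, a minimal Stan sketch of the construction could look like the following (the names N, w and x_aux are mine, I'm assuming w comes in as data, and the Dirichlet prior is just one example of "sample however you like"):

```stan
data {
  int<lower=1> N;
  vector<lower=0>[N] w;   // fixed, strictly positive weights
}
parameters {
  simplex[N] x_aux;       // ordinary simplex
}
transformed parameters {
  // element-wise division; dot_product(x, w) = sum(x_aux) = 1 by construction
  vector<lower=0>[N] x = x_aux ./ w;
}
model {
  x_aux ~ dirichlet(rep_vector(1, N));  // e.g. a uniform prior on x_aux
}
```

One thing to keep in mind: a prior that is flat on x_aux is not flat on x (the rescaling by w is fixed and linear, so the Jacobian is constant, but the geometry of the constraint set changes), so you may want to think about which space your prior should really live in.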


> is x a vector as well?

Yes; x is the weighted simplex I am looking for.

> Is the sum of data vector w in some way naturally constrained?

No. The only constraint on w in my problem is that each element is positive.

Does @LucC’s second post not give you what you want for some reason?

I suspect it is not good enough, but I have to think about it. I will return to the problem first thing tomorrow and give you a definite answer.

And thank you, LucC for the help.

It is 1 AM here, and at this time my brain ceases working. I will definitely return to the problem in about 10 hours. :-)

Any developments? Does x_i = \frac{x_{aux,i}}{w_i} work for you?

Thank you LucC for your attention, you are very kind.

After another working session I’ve discovered an error in my judgment: I actually don’t need a simplex, just a centered (sum-to-zero) vector that is uniformly (or beta) distributed, with each element bounded from both ends by two fixed vectors (element-wise lower and upper bounds). And that is a very tricky thing to model, even if uniformly distributed centered vectors were an option (most likely they aren’t, at least not here: 1.7 Parameterizing Centered Vectors | Stan User’s Guide ).

I am starting to think that MC sampling in this situation would only be possible with a Metropolis–Hastings algorithm, not Stan. I believe I would be able to write the acceptance rules manually.

BTW, the root mathematical problem I want to solve is a nonparametric model for two densities, with the constraint that when they are mixed with a constant factor, the two densities yield a fixed, externally supplied density.
Another consideration is that I want to assume an a priori shape for those two distributions and model perturbations around that shape, rather than model the two distributions from scratch.

That model will be used in the context of a binary classification problem: I am given an ML model trained on a labelled training set, which outputs a one-dimensional feature that will eventually be used for classification with some threshold.

The training set is not a random sample of the production set, because for various reasons it is much cheaper for us to generate the training set by other means. We try hard to make the training set as similar to the production set as possible, but ultimately we have to assume that the two sets differ significantly, in both the class balance and the shape of the feature’s distribution for each ground-truth label.

My task is to reconstruct the shape of the feature distributions in the unlabelled production set, one for each ground-truth label. The model I write will ultimately constrain both densities in such a way that their mixture matches the known marginal (production) distribution. It will also use the shapes of the distributions in the training set as a priori knowledge. The shapes will be fitted to a (relatively small) manually labelled random sample from the production set, and hopefully I will be able to extract the density shapes of the individual groups in production.

It’s impossible to get a centered vector with a multivariate independent uniform (or beta) prior (the sum-to-zero constraint violates independence), but to get one with uniform margins you can apply the solution for standard normal margins from the Stan User’s Guide section that you linked above, and then transform each element by f(g(x)), where g is the standard normal CDF and f is the inverse CDF of the distribution that you want for the margins.
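For concreteness, here is an untested sketch of that construction in Stan (the names N, lb, ub and z_raw are mine), specialised to element-wise uniform(lb, ub) margins, which you get with g = Phi and f the uniform quantile function:

```stan
data {
  int<lower=2> N;
  vector[N] lb;   // element-wise lower bounds
  vector[N] ub;   // element-wise upper bounds, ub[i] > lb[i]
}
parameters {
  vector[N] z_raw;
}
transformed parameters {
  vector[N] z = z_raw - mean(z_raw);   // sums to zero, marginally standard normal
  // g = Phi gives uniform(0, 1) margins; f^-1 then maps them onto (lb, ub)
  vector[N] x = lb + (ub - lb) .* Phi(z);
}
model {
  // scale chosen so that each centered z[i] is marginally standard normal
  z_raw ~ normal(0, inv_sqrt(1 - inv(N)));
}
```

Note the caveat: only z is exactly sum-to-zero; after the nonlinear margin transform, x itself is centered only in the loose sense that it is a deterministic function of a centered vector, which may or may not be what you need.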

Again, this prior will involve a somewhat complicated and non-obvious dependency structure between the elements, and you’ll want to be very careful that you aren’t accidentally putting negligible prior density over a priori reasonable configurations.