# Applying Linear constraints to transformed parameters?

Hello all. This is a subsequent post regarding a Bayesian subset model (but this is a more general question).

So, I want to do independent beta regression on a number of parameters, and have a specific linear combination of them form a simplex.

For example, assume
v = (v_1,v_2,v_3,...,v_7) and A is a matrix in \mathbb{R}^{8x7}, then I want Av to form a simplex.

I am putting beta priors on the individual elements of v, but are there other constraints I can impose so that the model has an easier time to initialize?

Thanks

They key to these multivariate constraints that are a function of multiple variables is to reparameterize things such that the whole constraint falls on a scalar and can be satisfied for any value of the parameters. This usually requires normalizing the scale of the parameters. For example, you could do something like

parameters {
simplex[7] v_; // provisional v
}
transformed parameters {
simplex[7] Av = A * v_;
Av /= inv(sum(Av)); // recalculate to satisfy simplex constraint
}


However, if A is not such that Av is always positive whenever v_ is a simplex, then NUTS will likely not sample efficiently enough for your inferences to be reliable at all.

1 Like

Thanks Ben, that is really helpful. Usually when I post on these forums, I try to generalize my problem to learn more about how specific things are done (which I did in this case). I think now I am going to pose my actual problem and ask for help in figuring out how to better parameterize my model.

I am solving Bayesian subset selection. I have three items: A, B and C. Now, there are 8 possible selections that may be made from these three items:
-A, -B, -C none are selected,
-A, -B C, only C is selected
-A, B, -C
-A, B, C
A, -B, -C
A, -B, C
A, B, -C
A, B, C, all three are selected.

Now, I want to understand the selection process in terms of the marginal distributions of each item (or pairs):
p(A), p(B), p(C), p(AB), p(AC), p(BC), p(ABC).

So I can model the outcome above as a multinomial with probability \theta \in \mathbb{R}^8 where \mathrm{sum}(\theta)=1.

Now, each of the above selections can be rewritten in terms of the marginals:

theta[1] = (1-pC)-(pB-pBC)-(pA-pAB-(pAC-pABC)); // p(-A,-B,-C)
theta[2] = pC-pAC-(pBC-pABC);                   // p(-A,-B,C)
theta[3] = pB-pBC-(pAB-pABC);                   // p(-A,B,-C)
theta[4] = pBC-pABC;                            // p(-A,B,C)
theta[5] = pA-AB-(pAC-pABC);                    // p(A,-B,-C)
theta[6] = pAC-pABC;                            // p(A,-B,C)
theta[7] = pAB-pABC;                            // p(A,B,-C)
theta[8] = pABC;                                // p(A,B,C)


so note that algebraically, the elements of \theta sum to 1. Now, to constrain each element of \theta to be positive, I can constrain the marginals:

real<lower=0,upper=1> pA;
real<lower=0,upper=1> pB;
real<lower=0,upper=1> pC;
real<lower=0,upper=fmin(pA,pB)> pAB;
real<lower=0,upper=fmin(pA,pC)> pAC;
real<lower=0,upper=fmin(pB,pC)> pBC;
real<lower=0,upper=fmin(pAB,fmin(pAC,pBC))> pABC;


but these constraints

1. wont work with vector valued marginals
2. only satisfy theta[4], theta[6], theta[7], theta[8] being positive.

Eventually, I would like to do beta regression for the marginals, so pA ~ beta(muA*phiA,(1-muA)*phiA), for muA = inv_logit (betaA^T x) etc. An I would like to learn the betaA, betaB, ..., betaABC,
so I need constraints on the marginals to ensure Stan will initialize and run inference.

So looking back at the question you answered, I would like to know how to constrain
A v + b = theta where theta is a simplex, and v = [pA,pB,pC,pAB,pAC,pAB,pABC], and
A = [[-1,-1,-1,1,1,1,-1], [0,0,1,0,0,-1,-1,1], [0,1,0,-1,0,-1,1], [0,0,0,0,0,1,-1], [1,0,0,-1,-1,0,1], [0,0,0,0,1,0,-1], [0,0,0,1,0,0,-1], [0,0,0,0,0,0,1], ] and b = [1,0,0,0,0,0,0]

I would parameterize it with a combination of marginal and conditional probabilities, like

parameters {
// marginals
real<lower = 0, upper = 1> pA;
real<lower = 0, upper = 1> pB;
real<lower = 0, upper = 1> pC;
// conditionals, where _ means "given"
real<lower = 0, upper = 1> pB_A;
real<lower = 0, upper = 1> pC_A;
real<lower = 0, upper = 1> pC_B;
real<lower = 0, upper = 1> pC_AB;
}
transformed parameters {
simplex[8] theta;
real pAB = pA * pB_A;
real pAC = pA * pC_A;
real pBC = pB * pC_B;
real pABC = pAB * pC_AB;
theta[8] = pABC;
theta[7] = pAB - pABC;
theta[6] = pAC - pABC;
theta[5] = pA - pAB - theta[6];
theta[4] = pBC - pABC;
theta[3] = pB - pBC - theta[7];
theta[2] = pC - pAC - theta[4];
theta[1] = 1 - pC - (pB - pBC) - (pA - pAB - theta[6]);
}
}


I didnâ€™t check your math, but I think this parameterization is consistent with what you wrote. Remember that if you want to put priors on pAB, pAC, pBC, and / or pABC, then you need to handle your own Jacobian determinant.

2 Likes

Thanks Ben! I like this formalism better! I think putting priors on the conditionals will be â€śeasierâ€ť anyway (from domain knowledge).

And this solves my vectorization problem as well!

Boom!

Hey Ben,