How to add an inequality constraint on a sum of parameters

Hi everyone,

My advisor Dr. Adam Fleischhacker and I are working on elaborating the example models in the Stan reference manual (following a full Bayesian workflow) while learning Stan. We have a question about the ARCH model. The example in the manual is an ARCH(1) model; we would like to work on the more general ARCH(K) model:
\sigma_t^2 = \alpha_0 + \sum_{i=1}^{K} \alpha_i a_{t-i}^2

with the conditions \alpha_0 > 0, \alpha_i \geq 0 for i > 0, and \sum_{i=1}^{K} \alpha_i < 1.

We wonder how to add this inequality constraint on the sum of the \alpha_i. Can I do something like:

parameters {
  vector<lower=0, upper=1>[K] alpha;
}
transformed parameters {
  real<lower=0, upper=1> sum_alpha;
  sum_alpha = sum(alpha);
}

Thank you!

In cases like these, it is best to downscale a simplex. So,

parameters {
  real<lower=0,upper=1> sum_alpha;
  simplex[K] alpha_unscaled; // sums to 1
  real<lower=0> alpha_0;
}
transformed parameters {
  vector[K] alpha = sum_alpha * alpha_unscaled; // sums to sum_alpha
}
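
For concreteness, here is a minimal sketch of how this parameterization could slot into a full ARCH(K) model; the data block (T observations of a series r) and all of the specific priors below are illustrative assumptions, not recommendations:

data {
  int<lower=0> T;        // number of observations
  int<lower=1> K;        // ARCH order
  vector[T] r;           // observed series
}
parameters {
  real mu;                            // mean
  real<lower=0> alpha_0;              // volatility intercept
  real<lower=0, upper=1> sum_alpha;   // sum of the ARCH coefficients
  simplex[K] alpha_unscaled;          // relative weights, sums to 1
}
transformed parameters {
  vector[K] alpha = sum_alpha * alpha_unscaled;  // sums to sum_alpha < 1
}
model {
  // priors go on the quantities declared in the parameters block
  mu ~ normal(0, 10);
  alpha_0 ~ normal(0, 10);
  sum_alpha ~ beta(2, 2);
  // alpha_unscaled keeps its implicit uniform prior on the simplex
  for (t in (K + 1):T) {
    real sigma2 = alpha_0;
    for (i in 1:K)
      sigma2 += alpha[i] * square(r[t - i] - mu);
    r[t] ~ normal(mu, sqrt(sigma2));
  }
}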

This is brilliant! Thank you very much!

Hi,
I did exactly this in one of my models.
However, Stan is giving me a warning that I may have to add a Jacobian adjustment because of the change of variables. To me it seems that everything we are doing is linear, so the Jacobian adjustment should not be necessary. Am I missing something?
Thanks.

If you are attempting to put a prior on alpha in the transformed parameter block from the example above, then it is non-linear. It is best to put the priors on the things in the parameters block unless you know how to do the Jacobian adjustment correctly.

Thanks for your answer. I am using alpha as the left-hand side of a sampling statement in the model block; I guess that is what you mean by putting a prior on it?
What do you mean by "setting a prior in the parameters block"?
And could you point me towards a reference for the correct derivation of the Jacobians of such transforms? I think I have the necessary math background, but I don't know enough about Bayesian inference with NUTS and why it needs Jacobian corrections.

I mean putting a prior, in the model block, on the things declared in the parameters block, so on sum_alpha and alpha_unscaled. If you do go the Jacobian route, then watch [the video embedded in the original post], starting at about 7:30.
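
Concretely, with the parameterization above, that means something like the following in the model block (the beta and Dirichlet choices here are just placeholders):

model {
  sum_alpha ~ beta(2, 2);                        // prior on a parameters-block quantity
  alpha_unscaled ~ dirichlet(rep_vector(1, K));  // likewise; no Jacobian needed
  // ... likelihood ...
}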

Thanks for this.
I have derived the adjustment for the above example and would be glad if somebody could tell me whether what I did is correct. I have written out all the steps in detail. The only thing I am unsure about is whether I can just leave the redundant component of the simplex aside.

Define \alpha as the K-simplex and \beta as its scale (a scalar). \gamma is given by

\gamma = \beta \alpha.

Thus, the transform \gamma = f(\alpha,\beta) can be defined by leaving the redundant \alpha_K aside:

\gamma_k = \beta \alpha_k \; \mathrm{for} \; k \in (1, K-1)
\gamma_K = \beta \left(1 - \sum_{i=1}^{K-1} \alpha_i\right)

The adjustment is the determinant of the Jacobian of the inverse transform:

\mathrm{det}(J_{f^{-1}}) = (\mathrm{det}(J_f))^{-1}

J_f can be divided into a diagonal part for which

A = J_f(i,i) = \beta \; \mathrm{for} \; i \in (1,K-1),

and

B = J_f(i,K) = \alpha_i \; \mathrm{for} \; i \in (1,K-1)
C = J_f(K,i) = -\beta \; \mathrm{for} \; i \in (1,K-1)
D = J_f(K,K) = 1-\sum_{i=1}^{K-1}\alpha_i.

Thus the determinant is

\mathrm{det}(J_f) = (D - C A^{-1} B) \; \mathrm{det}(A) = \beta^{K-1},

where D - C A^{-1} B = \left(1 - \sum_{i=1}^{K-1} \alpha_i\right) + \sum_{i=1}^{K-1} \alpha_i = 1,

and

\mathrm{det}(J_{f^{-1}}) = \beta^{1-K}.
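
As a quick sanity check, take K = 2. Then

J_f = \begin{pmatrix} \beta & \alpha_1 \\ -\beta & 1 - \alpha_1 \end{pmatrix}, \qquad \mathrm{det}(J_f) = \beta (1 - \alpha_1) - (-\beta) \alpha_1 = \beta = \beta^{K-1},

which matches the general formula.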

We have to apply the log of the absolute value of this determinant (\mathrm{log}(|\beta^{1-K}|)) as the Jacobian correction in our model block:

target += (1-K)*log(beta).

EDIT: the correction should be target += (K-1)*log(beta), see below for details!

I'm lousy at linear algebra of this complexity, but I can say that it will depend on how the parameters are declared in Stan. Do you start with \alpha being a simplex? If so, its Jacobian is already included; otherwise, you need to include the simplex transform as you seem to be doing for \gamma_K.

Yes, \alpha is defined as a simplex. So in this case I don't have to apply any Jacobian correction, even if I multiply the components of \alpha by another parameter \beta?

I have to count Jacobians on my fingers.

Overall, what needs to happen is the K free unconstrained parameters map to the scaled simplex \beta \alpha, where \alpha is a simplex and \beta is positive. I just meant that the Jacobians for the K - 1 unconstrained parameters to the simplex \alpha and the 1 unconstrained variable to the positive \beta are already included.

But now that I write it out, I see you took this into account in the mapping from the K - 1 outputs of the simplex and \beta to \beta \alpha.

Now if I lay out another transform as

(\alpha_1, \ldots, \alpha_{K - 1}, \beta) \mapsto (\beta \alpha_1, \ldots, \beta \alpha_K)

then we need to add the log Jacobian determinant of this map to the implicit log Jacobian applied to the simplex and positive transforms. The Jacobian is lower triangular [edit: not, see below!] with the first K - 1 diagonal elements being, for k \in 1:(K - 1),

J_{k, k} = \frac{\partial}{\partial \alpha_k} \beta \alpha_k = \beta

and the K-th element being

J_{K, K} = \frac{\partial}{\partial \beta} \, \beta \cdot \alpha_K \ = \ \frac{\partial}{\partial \beta} \, \beta \cdot \left( 1 - \sum_{k=1}^{K-1} \alpha_k \right) \ = \ \alpha_K

[edit: this is right as far as it goes, but it misses the non-zero elements in the last row and column.]

The log Jacobian determinant of a lower-triangular matrix is just the sum of the logs of its diagonal elements [edit: correct], so it looks like it should be [edit: but is not]:

\log \mathrm{det}(J) = (K - 1) \log \beta + \log \alpha_K

So it looks to me like the derivation above is missing a \log \alpha_K term in the final log Jacobian determinant [edit: it's not].

But as I said, I'm really bad at this. These problems are like homework to me and I keep hoping to get better at it. So please correct my homework if it's wrong. [edit: Thanks!]


That's about where I am with those things, never used too much linalg since my undergrad...

Yes, this is the transform I was looking at. I think you are missing some elements:

J_{K,k} = \frac{\partial}{\partial \beta} \beta \alpha_k = \alpha_k for k \in (1, K-1)
J_{k,K} = \frac{\partial}{\partial \alpha_k} \beta \alpha_K = \frac{\partial}{\partial \alpha_k} \beta \left(1 - \sum_{i=1}^{K-1} \alpha_i\right) = -\beta for k \in (1, K-1)

Thus the matrix is not triangular, but the determinant can be found by dividing it into blocks A to D (see above), where D is a 1\times1 matrix:

\mathrm{det}(J) = (D - C A^{-1} B) \; \mathrm{det}(A) = \beta^{K-1}

Also, I thought that the Jacobian correction is the one of the inverse transform, which in this case is \beta^{1-K}:

\mathrm{det}(J_{f^{-1}}) = \beta^{1-K}

Meaning that the correction is

target += (1-K)*log(beta)

Let me know if this doesn't make sense; as I said, I haven't done much of this at all in the last 10 years or so...

I wrote that part of the manual. And yes, it's the inverse transform, which for our constrained parameters is the mapping from the unconstrained space to the constrained space. Or in this case, your change of variables,

(\alpha_1, \ldots, \alpha_{K - 1}, \beta) \mapsto (\beta \alpha_1, \ldots, \beta \alpha_K)

As you point out, the Jacobian's not lower triangular. I should've drawn it all out rather than rushing, and then you can see the structure:

J = \begin{array}{c|ccccc}
 & \alpha_1 & \alpha_2 & \cdots & \alpha_{K-1} & \beta \\
\hline
\beta \alpha_1 & \beta & 0 & \cdots & 0 & \alpha_1 \\
\beta \alpha_2 & 0 & \beta & \cdots & 0 & \alpha_2 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
\beta \alpha_{K-1} & 0 & 0 & \cdots & \beta & \alpha_{K-1} \\
\beta \left(1 - \sum_{k=1}^{K-1} \alpha_k\right) & -\beta & -\beta & \cdots & -\beta & 1 - \sum_{k=1}^{K-1} \alpha_k
\end{array}

Now I'm stuck on the determinant (I was a pure math major who did abstract linear algebra up to Galois theory; determinants were for engineers, or so my advisor said). If the determinant's \beta^{K-1}, then yes, the appropriate Jacobian adjustment for the change of variables in Stan would be:

target += (K - 1) * log(beta);
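
In a model that places a prior directly on the transformed alpha, this would sit roughly as follows; sum_alpha plays the role of \beta here, and the lognormal prior is purely a placeholder:

model {
  // hypothetical prior written directly on the transformed alpha
  alpha ~ lognormal(-1, 1);
  // change-of-variables adjustment for alpha = sum_alpha * alpha_unscaled
  target += (K - 1) * log(sum_alpha);
  // ... likelihood ...
}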

Thanks! Our forums remain a great place to get my homework corrected :-)


I had to look it up too. :-)

Thanks to you, and thanks for pointing out that I was taking the inverse of the inverse :)

Cheers.

Use Schur's theorem

The top left corner of \mathbf{J} is a diagonal submatrix of size K - 1, so we know its determinant (\beta^{K-1}) and its inverse (diagonal with entries \frac{1}{\beta}). So, all we have to do is multiply by the determinant of the Schur complement, but that is just a 1\times1 matrix, which works out to 1 - \sum_{k=1}^{K-1}\alpha_k + \beta \mathbf{1}^\top \mathbf{I} \frac{1}{\beta} \left[\alpha_1, \alpha_2, \dots, \alpha_{K-1}\right]^\top = 1. Hence \mathrm{det}(\mathbf{J}) = \beta^{K-1} \cdot 1 = \beta^{K-1}.


Thanks, Ben. I really need to learn matrices.

Sorry, @bgoodri, to bring this up after a long time.

Do you mean that if we put a prior, in the model block, on things declared in the transformed parameters block, so on alpha, we wouldn't require the Jacobian adjustment?

Sorry, the conversation is a bit hard for me to follow, so I wanted to seek clarification.

Thanks!

If you put priors on things declared in the parameters block, then you never need a Jacobian adjustment. If you put priors on things not declared in the parameters block, then you need a Jacobian adjustment unless the Jacobian matrix is a constant with respect to the things declared in the parameters block.
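
As a toy illustration of that rule (hypothetical, not part of the ARCH model above):

parameters {
  real theta;
}
transformed parameters {
  real a = 2 * theta + 1;  // linear in theta: the Jacobian is the constant 2
  real b = exp(theta);     // non-linear: the Jacobian depends on theta
}
model {
  a ~ normal(0, 1);   // fine without an adjustment; a constant Jacobian only
                      // shifts the log density by a constant
  b ~ lognormal(0, 1);
  target += theta;    // needed for the prior on b: log |db/dtheta| = theta
}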

Ah, I see. Would you know of a worked-out example that I could look up? Appreciate it. Thank you.