How to add inequality constraint on sum of parameters

Hi everyone,

My advisor Dr. Adam Fleischhacker and I are working on elaborating the example models in the Stan reference manual (following a full Bayesian workflow) while learning Stan. We have a question about the ARCH model. The example in the manual is an ARCH(1) model; we would like to work on a more general ARCH(K) model:

\sigma_t^2 = \alpha_0 + \sum_{i=1}^{K} \alpha_i a_{t-i}^2

with the conditions \sum_{i=1}^{K} \alpha_i < 1 and \alpha_i \ge 0 for i > 0.

We wonder how to add this inequality constraint on the sum of the \alpha_i. Can we do something like:

parameters {
  vector<lower=0, upper=1>[K] alpha;
}
transformed parameters {
  real<lower=0, upper=1> sum_alpha;
  sum_alpha = sum(alpha);
}

Thank you!

In cases like these, it is best to downscale a simplex. So,

parameters {
  real<lower=0,upper=1> sum_alpha;
  simplex[K] alpha_unscaled; // sums to 1
  real<lower=0> alpha_0;
}
transformed parameters {
  vector[K] alpha = sum_alpha * alpha_unscaled; // sums to sum_alpha
}
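
For context, here is a minimal sketch of how this parameterization might slot into the ARCH(K) model from the original question. The data layout, the normal likelihood, and conditioning on the first K observations are my assumptions, patterned after the manual's ARCH(1) example rather than anything stated in this thread:

data {
  int<lower=0> T;    // number of observations
  int<lower=1> K;    // ARCH order
  vector[T] r;       // observed series
}
parameters {
  real mu;                           // mean
  real<lower=0> alpha_0;             // variance intercept
  real<lower=0, upper=1> sum_alpha;  // total of the ARCH coefficients
  simplex[K] alpha_unscaled;         // relative weights, sums to 1
}
transformed parameters {
  vector[K] alpha = sum_alpha * alpha_unscaled;  // sums to sum_alpha < 1
}
model {
  for (t in (K + 1):T) {
    real sigma2 = alpha_0;
    for (i in 1:K) {
      sigma2 += alpha[i] * square(r[t - i] - mu);
    }
    r[t] ~ normal(mu, sqrt(sigma2));
  }
}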

This is brilliant! Thank you very much!

Hi,
I did exactly this in one of my models.
However, Stan is giving me a warning that I may have to add a Jacobian adjustment because of the change of variables. To me it seems that everything we are doing is linear, so the Jacobian adjustment should not be necessary. Am I missing something?
Thanks.

If you are attempting to put a prior on alpha, the transformed parameter from the example above, then the transform is non-linear (it multiplies two parameters together). It is best to put the priors on the things in the parameters block unless you know how to do the Jacobian adjustment correctly.

Thanks for your answer. I am using alpha as the left-hand side of a sampling statement in the model block; I guess that is what you mean by putting a prior on it?
What do you mean by “putting the priors on the things in the parameters block”?
Also, could you point me towards a reference for the correct derivation of the Jacobians of such transforms? I think I do have the necessary math background, but I don’t know enough about Bayesian inference with NUTS and why it needs Jacobian corrections.

I mean putting a prior — in the model block — on the things declared in the parameters block, so on sum_alpha and alpha_unscaled. If you do go the Jacobian route, then watch the linked video, starting at about 7:30.
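
For example, a model block along these lines might look like the following; the specific beta and Dirichlet priors here are placeholder choices of mine, not recommendations from this thread:

model {
  sum_alpha ~ beta(2, 2);                        // prior on a declared parameter: no Jacobian needed
  alpha_unscaled ~ dirichlet(rep_vector(1, K));  // uniform over the simplex
  alpha_0 ~ normal(0, 1);                        // half-normal via the <lower=0> declaration
  // ... likelihood written in terms of alpha = sum_alpha * alpha_unscaled ...
}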

Thanks for this.
I have derived the adjustment for the above example and would be glad if somebody could tell me whether what I did is correct. I have put down all the steps in detail. The only thing I am unsure about is whether I can just set the redundant component of the simplex aside.

Define \alpha as the K-simplex and \beta as its scale (a scalar). \gamma is given by

\gamma = \beta \alpha.

Thus, the transform \gamma = f(\alpha,\beta) can be defined by leaving the redundant \alpha_K aside:

\gamma_k = \beta \alpha_k \; \mathrm{for} \; k \in (1,K-1)
\gamma_K = \beta \left(1-\sum_{i=1}^{K-1} \alpha_i\right)

The adjustment is the determinant of the Jacobian of the inverse transform:

\mathrm{det}(J_{f^{-1}}) = (\mathrm{det}(J_f))^{-1}

J_f can be divided into a diagonal part for which

A = J_f(i,i) = \beta \; \mathrm{for} \; i \in (1,K-1),

and

B = J_f(i,K) = \alpha_i \; \mathrm{for} \; i \in (1,K-1)
C = J_f(K,i) = -\beta \; \mathrm{for} \; i \in (1,K-1)
D = J_f(K,K) = 1-\sum_{i=1}^{K-1}\alpha_i.

Thus the determinant is

\mathrm{det}(J_f) = (D-CA^{-1}B) \; \mathrm{det}(A) = \beta^{K-1}

and

\mathrm{det}(J_{f^{-1}}) = \beta^{1-K}.

We have to apply the log of the absolute value of this determinant (\mathrm{log}(|\beta^{1-K}|)) as the Jacobian correction in our model block:

target += (1-K)*log(beta).

EDIT: the correction should be target += (K-1)*log(beta), see below for details!
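
As a quick sanity check (my addition, for the case K = 2): the forward transform is (\alpha_1, \beta) \mapsto (\beta \alpha_1, \beta (1 - \alpha_1)), so

J_f = \begin{pmatrix} \beta & \alpha_1 \\ -\beta & 1 - \alpha_1 \end{pmatrix}, \qquad \mathrm{det}(J_f) = \beta (1 - \alpha_1) + \beta \alpha_1 = \beta = \beta^{K-1},

which agrees with the general result above.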

I’m lousy at linear algebra of this complexity, but I can say that it will depend on how the parameters are declared in Stan. Do you start with \alpha being a simplex? If so, its Jacobian is already included; otherwise, you need to include the simplex transform as you seem to be doing for \gamma_K.

Yes, \alpha is defined as a simplex. So in this case I don’t have to apply any Jacobian correction, even if I multiply the components of \alpha by another parameter \beta?

I have to count Jacobians on my fingers.

Overall, what needs to happen is that the K free unconstrained parameters map to the scaled simplex \beta \alpha, where \alpha is a simplex and \beta is positive. I just meant that the Jacobians for the map from the K - 1 unconstrained parameters to the simplex \alpha and from the 1 unconstrained variable to the positive \beta are already included.

But now that I write it out, I see you took this into account in the mapping from the K - 1 outputs of the simplex and \beta to \beta \alpha.

Now if I lay out another transform as

(\alpha_1, \ldots, \alpha_{K - 1}, \beta) \mapsto (\beta \alpha_1, \ldots, \beta \alpha_K)

then we need to add the log Jacobian determinant of this map to the implicit log Jacobians applied by the simplex and positivity transforms. The Jacobian is lower triangular [edit: it is not; see below!], with the first K - 1 elements of the diagonal being, for k \in 1:(K - 1),

J_{k, k} = \frac{\partial}{\partial \alpha_k} \beta \alpha_k = \beta

and the K-th element being

J_{K, K} = \frac{\partial}{\partial \beta} \, \beta \cdot \alpha_K \ = \ \frac{\partial}{\partial \beta} \, \beta \cdot \left( 1 - \sum_{k=1}^{K-1} \alpha_k \right) \ = \ \alpha_K

[edit: this is right as far as it goes, but it misses the non-zero elements in the last row and column.]

The log Jacobian determinant of a lower-triangular matrix is just the sum of the logs of the diagonal elements [edit: correct], so it looks like it should be [edit: but is not]:

\log \mathrm{det}(J) = (K - 1) \log \beta + \log \alpha_K

So it looks to me like the log Jacobian above is missing a \log \alpha_K term in the final log Jacobian determinant [edit: it’s not].

But as I said, I’m really bad at this. These problems are like homework to me and I keep hoping to get better at it. So please correct my homework if it’s wrong. [edit: Thanks!]


That’s about where I am with these things; I haven’t used much linear algebra since my undergrad…

Yes, this is the transform I was looking at. I think you are missing some elements:

J_{K,k} = \frac{\partial}{\partial \beta} \beta \alpha_k = \alpha_k \; \mathrm{for} \; k \in (1,K-1)
J_{k,K} = \frac{\partial}{\partial \alpha_k} \beta \alpha_K = \frac{\partial}{\partial \alpha_k} \beta \left(1-\sum_{i=1}^{K-1} \alpha_i\right) = -\beta \; \mathrm{for} \; k \in (1,K-1)

Thus the matrix is not triangular, but the determinant can be found by dividing it into blocks A to D (see above), where D is a 1x1 matrix:

\mathrm{det}(J_f) = (D-CA^{-1}B) \; \mathrm{det}(A) = \beta^{K-1}

Also, I thought that the Jacobian correction is the one of the inverse transform, which in this case is \beta^{1-K}, meaning that the correction would be

target += (1-K)*log(beta)

Let me know if this doesn’t make sense; as I said, I haven’t done much of this at all in the last 10 years or so…

I wrote that part of the manual. And yes, it’s the inverse transform, which for our constrained parameters is the mapping from the unconstrained space to the constrained space. Or in this case, your change of variables,

(\alpha_1, \ldots, \alpha_{K - 1}, \beta) \mapsto (\beta \alpha_1, \ldots, \beta \alpha_K)

As you point out, the Jacobian’s not lower triangular. I should’ve drawn it all out rather than rushing, and you can see the structure:

J = \begin{array}{c|ccccc} & \alpha_1 & \alpha_2 & \cdots & \alpha_{K-1} & \beta \\ \hline \beta \alpha_1 & \beta & 0 & \cdots & 0 & \alpha_1 \\ \beta \alpha_2 & 0 & \beta & \cdots & 0 & \alpha_2 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ \beta \alpha_{K-1} & 0 & 0 & \cdots & \beta & \alpha_{K-1} \\ \beta \left(1 - \sum_{k=1}^{K-1} \alpha_k\right) & -\beta & -\beta & \cdots & -\beta & 1 - \sum_{k=1}^{K-1} \alpha_k \end{array}

Now I’m stuck on the determinant (I was a pure math major who did abstract linear algebra up to Galois theory—determinants were for engineers, or so my advisor said). If the determinant’s \beta^{K-1}, then yes, the appropriate Jacobian adjustment for the change of variables in Stan would be:

target += (K - 1) * log(beta);

Thanks! Our forums remain a great place to get my homework corrected :-)


I had to look it up too. :-)

Thanks to you, and thanks for pointing out that I was taking the inverse of the inverse :)

Cheers.

Use Schur’s theorem: for a block matrix with an invertible top-left block A,

\mathrm{det} \begin{pmatrix} A & B \\ C & D \end{pmatrix} = \mathrm{det}(A) \, \mathrm{det}(D - C A^{-1} B).

The top left corner of \mathbf{J} is a diagonal submatrix of size K - 1, so we know its determinant (\beta^{K - 1}) and its inverse (diagonal with entries \frac{1}{\beta}). So, all we have to do is multiply by the determinant of the Schur complement, but that is just a 1\times1 matrix, which works out to 1 - \sum_{k=1}^{K-1}\alpha_k + \beta \mathbf{1}^\top \mathbf{I} \frac{1}{\beta} \left[\alpha_1, \alpha_2, \dots, \alpha_{K-1}\right]^\top = 1. Hence \mathrm{det}(\mathbf{J}) = \beta^{K-1} \cdot 1 = \beta^{K-1}.


Thanks, Ben. I really need to learn matrices.

Sorry, @bgoodri, to bring this up after a long time.

Do you mean that if we put a prior (in the model block) on things declared in the transformed parameters block, so on alpha, we wouldn’t require the Jacobian adjustment?

Sorry, the conversation is a bit hard for me to follow, so I wanted to seek clarification.

Thanks!

If you put priors on things declared in the parameters block, then you never need a Jacobian adjustment. If you put priors on things not declared in the parameters block, then you need a Jacobian adjustment unless the Jacobian matrix is a constant with respect to the things declared in the parameters block.

Ah, I see. Would you know of a worked-out example that I could look up? Appreciate it. Thank you.
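
For the record, a sketch that pulls this thread's pieces together: a prior placed directly on the transformed alpha, together with the adjustment derived above. The exponential prior is purely illustrative; the adjustment term is the (K - 1) \log \beta from this thread, with sum_alpha playing the role of \beta:

model {
  // alpha = sum_alpha * alpha_unscaled is declared in transformed parameters,
  // so a prior on it requires the change-of-variables adjustment from this thread.
  alpha ~ exponential(1);              // illustrative prior on the transformed parameter
  target += (K - 1) * log(sum_alpha);  // log |det J| = (K - 1) * log(beta), with beta = sum_alpha
  // ... likelihood as before ...
}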