 # How do I limit the sum of the elements in a vector to less than a constant

Hello Stan users and experts,

I have a vector whose observation model is a multinomial distribution, and part of this vector also feeds into the process model. I therefore want to create two vectors. One is the vector used in the process model (`log_A_3` in the Stan code), where each element of the vector is less than a constant (`log_max` in the Stan code) and the sum of the elements is less than `log_max`. The other is the vector used in the observation model (`log_A` in the Stan code), where the sum of the elements equals `log_max` (I know this can be done with a simplex).
My script is as follows:

``````
data {
  int I;                    // 3
  int T;                    // 18
  int J;                    // 43
  int<lower=0> y[T, J, I];
  real log_max;             // log(1000)
  //etc.....
}
parameters {
  vector<upper=log_max>[I - 1] log_A[T, J];
  //etc.....
}
transformed parameters {
  vector<upper=log_max>[I] log_A_3[T, J];
  simplex[I] theta[T, J];

  for (t in 1:T) {
    for (j in 1:J) {
      log_A_3[t, j, 1] = log_A[t, j, 1];
      log_A_3[t, j, 2] = log_A[t, j, 2];
      log_A_3[t, j, 3] = log(exp(log_max) - exp(log_A_3[t, j, 1]) - exp(log_A_3[t, j, 2]));
    }
  }

  for (t in 1:T) {
    for (j in 1:J) {
      theta[t, j] = softmax(log_A_3[t, j]); // equation for theta
    }
  }
  //etc.....
}
model {
  for (t in 1:T) {
    for (j in 1:J) {
      y[t, j] ~ multinomial(theta[t, j]);
    }
  }

  for (t in 2:T) {  // for t > 1
    for (j in 1:J) {
      target += multi_normal_lpdf(log_A[t, j] | U[j] + B[j] * log_A[t - 1, j], diag_matrix(sigma));
    }
  }
  for (j in 1:J) {  // for t = 1
    target += uniform_lpdf(log_A[1, j] | -1E6, log(exp(log_max) / 2));
  }
  //etc.....
}
``````

Now my problem is that the result has a lot of divergent transitions after warmup. I think the error is caused by `log_A_3[, 3]` becoming less than zero due to the random sampling of `log_A`. Hence, I want to find a way to make sure that `(log_A[, 1] + log_A[, 2]) < log_max` when `log_A[, 1] < log_max` and `log_A[, 2] < log_max` hold simultaneously.
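To see the failure mode numerically, here is a quick check outside Stan (a Python sketch; the values are made up): each element can individually be below `log_max` while the mass left over for the third element is still negative, so the `log()` in the transformed parameters block is undefined.

```python
import math

log_max = math.log(1000.0)
# Hypothetical draws: each one is below log_max individually...
a, b = math.log(600.0), math.log(700.0)

# ...but the mass left over for the third element is negative,
# so taking log() of it fails:
rest = math.exp(log_max) - math.exp(a) - math.exp(b)
print(rest)  # about -300, so math.log(rest) would raise a domain error
```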

Sorry for taking so long to respond. Just a few quick thoughts, correct me if this doesn’t address your problem:

You can verify this by adding `print` statements, e.g. something like `if (log_A_3[t, j, 3] < 0) { print(" < 0 "); }`.

You could have a parameter `real<upper=log_max> log_A_sum` and then only `N - 1` parameters for the elements, computing the last one so that the constraints are met. This can however imply a weird prior on the computed element. Check out the discussion in Test: Soft vs Hard sum-to-zero constrain + choosing the right prior for soft constrain for some related ideas.

Best of luck with your model!

I’m sorry to have put this analysis on hold for so long.

I have tried to create a log sum, but it causes a new problem:

" Log probability evaluates to log(0), i.e. negative infinity."

I attempted many fixes but couldn’t get rid of this error.
Do you have any ideas?

It looks like the log density is zero. I guess that could happen if a model has no sampling statements (things with a `~`) or `target +=` statements.

Your original model has plenty of those though. What’s your current model?

I did not change my model except for the “sum”. And if I change the value of `log_max` to 500, or `log_A_3[, 3]` to `log(25)` or some other smaller value, the error goes away.

Aaaa, ignore my previous statement, that was totally wrong, my bad.

So somewhere here is an evaluation of `log(0)`.

These are the two lines that take a log of something:
`log_A_3[t, j, 3] = log(exp(log_max) - exp(log_A_3[t, j, 1]) - exp(log_A_3[t, j, 2]));`
`target += uniform_lpdf(log_A[1, j] | -1E6, log(exp(log_max) / 2));`

The second can just be written:
`target += uniform_lpdf(log_A[1, j] | -1E6, log_max - log(2));`
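As a quick sanity check in Python, the two forms agree, and the rewritten one is safer because it never evaluates `exp(log_max)`, which overflows a double once `log_max` exceeds roughly 709:

```python
import math

log_max = math.log(1000.0)
lhs = math.log(math.exp(log_max) / 2)  # original form
rhs = log_max - math.log(2)            # rewritten form
print(lhs, rhs)  # both equal log(500)
# For large log_max (say 800), exp(log_max) overflows in the first form,
# while log_max - log(2) still evaluates fine.
```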

I think you could rewrite the first using something like the log-sum-exp trick (it’s slightly different here because it involves subtraction): https://en.wikipedia.org/wiki/LogSumExp#log-sum-exp_trick_for_log-domain_calculations

Expressions like `log(exp(a) + exp(b) + ...)` usually have to be rewritten as `log_sum_exp(a, b, ...)` or they end up breaking things.
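For reference, a minimal Python sketch of the max-shift trick behind `log_sum_exp` (the helper and the test values are just for illustration):

```python
import math

def log_sum_exp(*xs):
    # Stable log(exp(x1) + ... + exp(xn)): factor out the max, so every
    # remaining exp() argument is <= 0 and cannot overflow.
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))

# Naive log(exp(1000) + exp(1000)) overflows a double;
# the shifted version returns 1000 + log(2) as expected.
print(log_sum_exp(1000.0, 1000.0))
```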

Yes, I tried the LSE function like this:

``````
data {
  int I;
  int T;
  int J;
  real log_max;
}
parameters {
  vector[I - 1] log_A[T, J];
}
transformed parameters {
  real<upper=log_max> summ[T, J];  // the "sum" mentioned before
  vector summ2[T, J];
  vector<upper=log_max>[I] log_A_3[T, J];

  for (t in 1:T) {
    for (j in 1:J) {
      summ[t, j] = log_sum_exp(log_A[t, j]); // LSE is here
      summ2[t, j, 1] = summ[t, j];
    }
  }

  for (t in 1:T) {
    for (j in 1:J) {
      log_A_3[t, j, 1] = log_A[t, j, 1];
      log_A_3[t, j, 2] = log_A[t, j, 2];
      log_A_3[t, j, 3] = log_sum_exp(log_max - summ2[t, j]); // and here
    }
  }
}
``````

But “Log probability evaluates to log(0), i.e. negative infinity.” still occurs.
And I definitely believe the error is caused by `log_A_3[, 3]`, because if I change it to any real number, the error goes away.

Mathematically, say log_A_3 is the vector of real numbers A, B and C. I need the following constraints:

- A < log_max
- B < log_max
- C < log_max
- exp(A) + exp(B) + exp(C) = exp(log_max) [because log_A_3 follows a multinomial distribution]
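For what it’s worth, those constraints can be satisfied by construction: take any simplex `theta` and set `log_A_3 = log_max + log(theta)`. A quick Python check (the simplex values are made up):

```python
import math

log_max = math.log(1000.0)
theta = [0.2, 0.3, 0.5]  # any simplex (example values)

# Each element is log_max + log(theta_i) < log_max, since log(theta_i) < 0,
# and the exponentials sum to exp(log_max) exactly.
log_A_3 = [log_max + math.log(t) for t in theta]
total = sum(math.exp(x) for x in log_A_3)
print(total)  # 1000.0 up to floating point
```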

However, when I tried writing A, B, C as parameters directly, it could run, but all 1000 iterations were divergent transitions after warmup. I do not know whether this approach is feasible.

`log_sum_exp(log_max-summ2[t,j])` is equivalent to:

`log(exp(log_max-summ2[t,j]))` is equivalent to:

`log_max-summ2[t,j]` so I don’t think you want that.

There’d be a more numerically stable way to write `log(exp(a) - exp(b) - exp(c))`, but you’ll need to write that out yourself.
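One common way to write that out is via the subtraction analogue of log-sum-exp. Here is a Python sketch of what Stan’s `log_diff_exp(a, b)` computes, assuming `a > b` (the test values are made up):

```python
import math

def log_diff_exp(a, b):
    # Stable log(exp(a) - exp(b)) for a > b: factor out exp(a) and use
    # log1p for accuracy when exp(b - a) is close to 1.
    if b >= a:
        raise ValueError("log_diff_exp requires a > b")
    return a + math.log1p(-math.exp(b - a))

# Agrees with the naive form where the naive form still works:
print(math.exp(log_diff_exp(5.0, 3.0)))  # exp(5) - exp(3)

# And keeps working where exp() alone would overflow:
print(log_diff_exp(1000.0, 999.0))
```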

You can also check your math in R or something before you put it in Stan.

If you’re curious what values of things are causing problems, you can add print statements in Stan around where you think the code is blowing up.

Like:

``````print("log_max: ", log_max);
...
log_A_3[t, j, 3] = log(exp(log_max) - exp(log_A_3[t, j, 1]) - exp(log_A_3[t, j, 2]));
``````

Wouldn’t `log_diff_exp(a, log_sum_exp(b, c))` or `log_diff_exp(log_diff_exp(a,b), c)` work reasonably well?
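A quick numerical check in Python (with made-up values) that both nestings compute `log(exp(a) - exp(b) - exp(c))`:

```python
import math

def log_sum_exp(a, b):
    # Stable log(exp(a) + exp(b)) via the max-shift trick.
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def log_diff_exp(a, b):
    # Stable log(exp(a) - exp(b)); assumes a > b.
    return a + math.log1p(-math.exp(b - a))

a, b, c = math.log(1000.0), math.log(300.0), math.log(200.0)
f1 = log_diff_exp(a, log_sum_exp(b, c))
f2 = log_diff_exp(log_diff_exp(a, b), c)
print(f1, f2)  # both equal log(1000 - 300 - 200) = log(500)
```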
I think `log_diff_exp` is feasible; it solved the problem of making the elements of the vector follow a multinomial distribution. But it caused `simplex[I] theta[T, J]` to become NaN.

I then tried using `log_softmax(theta[t, j])` and `y[t, j] ~ multinomial(exp(theta[t, j]))` to fix the NaN, and that worked. But the error “Log probability evaluates to log(0), i.e. negative infinity.” still occurred, even though in the latest model there is no `log()` anywhere. I also tried to `print` all of the parameters, and there is no 0 value in `log_A_3` or `theta`. What should I do next to solve this problem?