How do I limit the sum of the elements in a vector to be less than a constant?

Hello Stan users and experts,

I have a vector that follows a multinomial observation model, and part of this vector needs to feed into the process model. Therefore, I want to create two vectors. One is used in the process model (log_A_3 in the Stan code below), where each element is less than a constant (log_max in the Stan code) and the sum of the elements is also less than log_max. The other is used in the observation model (log_A in the Stan code), where the sum of the elements equals log_max (I know this can be done with a simplex).
My script is as follows:


data {
  int I;                    //3
  int T;                    //18
  int J;                    //43
  int<lower=0> y[T, J, I]; 
  real log_max;             //log(1000)
  //etc.....
}
parameters {
  vector<upper = log_max>[I - 1] log_A[T, J];    
 //etc.....
}

transformed parameters {
  vector<upper = log_max>[I] log_A_3[T, J];   
  simplex[I] theta[T, J];            

  for (t in 1:T) { 
    for (j in 1:J) {
      log_A_3[t, j, 1] = log_A[t, j, 1];  
      log_A_3[t, j, 2] = log_A[t, j, 2];
      log_A_3[t, j, 3] = log(exp(log_max) - exp(log_A_3[t, j, 1]) - exp(log_A_3[t, j, 2]));
    }
  }

 for(t in 1:T) {
    for (j in 1:J) {
      theta[t, j] = softmax(log_A_3[t, j]); // formula for theta
    }
  }
//etc.....
}
model {
   for (t in 1:T) {
    for (j in 1:J) {
      y[t, j, ] ~ multinomial(theta[t, j, ]);
    }
  }

  for (t in 2:T) {  // For t > 1
    for (j in 1:J) {
      target += multi_normal_lpdf(log_A[t, j] | U[j] + B[j] * log_A[t - 1, j], diag_matrix(sigma));
    }
  }
  for (j in 1:J) { // For t = 1
    target += uniform_lpdf(log_A[1, j] | -1E6, log(exp(log_max) / 2));
  }
  //etc.....
}

Now my problem is that the result has a lot of divergent transitions after warmup. I think the error is caused by log_A_3[,3] taking the log of a negative number when the randomly sampled log_A values are too large. Hence, I want a way to ensure (log_A[,1] + log_A[,2]) < log_max while also having log_A[,1] < log_max and log_A[,2] < log_max simultaneously.
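To illustrate the failure mode, here is the same computation in plain Python (standing in for the Stan expression, with log_max = log(1000) as in my data): the argument of log() goes negative as soon as exp(log_A[,1]) + exp(log_A[,2]) exceeds exp(log_max), even though each element alone satisfies its upper bound.

```python
import math

log_max = math.log(1000.0)  # as in the data block

def log_A3_third(a1, a2):
    # Stan: log(exp(log_max) - exp(log_A_3[t,j,1]) - exp(log_A_3[t,j,2]))
    return math.log(math.exp(log_max) - math.exp(a1) - math.exp(a2))

# Fine: exp(a1) + exp(a2) = 400 + 300 < 1000
print(log_A3_third(math.log(400.0), math.log(300.0)))  # log(300)

# Fails: 600 + 700 > 1000 even though each element is < log_max,
# so the argument of log() is negative (NaN in Stan, ValueError here)
# log_A3_third(math.log(600.0), math.log(700.0))
```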

Thank you in advance

Sorry for taking so long to respond. Just a few quick thoughts, correct me if this doesn’t address your problem:

You can verify this by using print statements, e.g. something like: if (log_A_3[t, j, 3] < 0) { print("< 0"); }

You could have a parameter real<upper=log_max> log_A_sum and then only N-1 parameters for the elements, computing the last one so that the constraints are met. This can however imply a weird prior on the computed element. Check out the discussion in "Test: Soft vs Hard sum-to-zero constrain + choosing the right prior for soft constrain" for some related ideas.
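Concretely, here is a rough sketch of the arithmetic in plain Python (not Stan; the function name is made up): given a total log_A_sum and the first N-1 elements on the log scale, the last element is forced, and can be computed stably without exponentiating large values.

```python
import math

def last_element(log_A_sum, first_elements):
    # exp(last) = exp(log_A_sum) - sum(exp(first_elements)),
    # computed on the log scale (log-sum-exp / log-diff-exp style)
    m = max(first_elements)
    log_rest = m + math.log(sum(math.exp(x - m) for x in first_elements))
    return log_A_sum + math.log1p(-math.exp(log_rest - log_A_sum))

# exp of the first elements: 2 + 3 = 5; total = 9; so last must be log(4)
print(last_element(math.log(9.0), [math.log(2.0), math.log(3.0)]))
```

This only works when sum(exp(first_elements)) < exp(log_A_sum), which is exactly the constraint you would need to enforce on the free parameters.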

Best of luck with your model!

Thanks for your reply,
I’m sorry to have put this analysis on hold for so long.

I have tried to create a log_A_sum, but it causes a new problem:

" Log probability evaluates to log(0), i.e. negative infinity."

I attempted many approaches but couldn’t get rid of this error.
Do you have any ideas?

Thank you in advance.

It looks like the log density is zero. I guess that could happen if a model has no sampling statements (things with a ~) or target += statements.

Your original model has plenty of those though. What’s your current model?

I did not change my model except for adding the “sum” parameter. And if I change the value of “log_max” to 500, or set log_A_3[,3] to log(25) or some other smaller value, the error goes away.

Aaaa, ignore my previous statement, that was totally wrong, my bad.

So somewhere here is an evaluation of log(0).

These are the two lines that take a log of something:
log_A_3[t, j, 3] = log(exp(log_max) - exp(log_A_3[t, j, 1]) - exp(log_A_3[t, j, 2]));
target += uniform_lpdf(log_A[1, j] | -1E6, log(exp(log_max) / 2));

The second can just be written:
target += uniform_lpdf(log_A[1, j] | -1E6, log_max - log(2));
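(A quick sanity check of that identity in plain Python:)

```python
import math

log_max = math.log(1000.0)
a = math.log(math.exp(log_max) / 2)  # original form: exponentiates, then logs
b = log_max - math.log(2)            # rewritten form: never computes exp(log_max)
print(a, b)  # same value
```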

I think you could rewrite the first using something like the log-sum-exp trick (it’s slightly different because it involves subtraction): https://en.wikipedia.org/wiki/LogSumExp#log-sum-exp_trick_for_log-domain_calculations

Expressions like log(exp(a) + exp(b) + ...) usually have to be rewritten as log_sum_exp(a, b, ...) or they end up breaking things.
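For example, in plain Python (the same idea as Stan's built-in log_sum_exp): shift by the maximum before exponentiating, so exp() never overflows.

```python
import math

def log_sum_exp(xs):
    # log(sum(exp(x))) computed stably: factor out the max
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

xs = [1000.0, 1000.0]
# naive math.log(sum(math.exp(x) for x in xs)) raises OverflowError
print(log_sum_exp(xs))  # 1000 + log(2)
```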

Thanks for your advice.
Yes, I tried the log-sum-exp (LSE) function like this:

data {
  int I;
  int T;
  int J;
  real log_max;
}

parameters {
  vector[I - 1] log_A[T, J];
}

transformed parameters {
  real<upper=log_max> summ[T, J];  // the "sum" mentioned before
  vector[1] summ2[T, J];
  vector<upper=log_max>[I] log_A_3[T, J];

  for (t in 1:T) {
    for (j in 1:J) {
      summ[t, j] = log_sum_exp(log_A[t, j]);  // LSE is here
      summ2[t, j, 1] = summ[t, j];
    }
  }

  for (t in 1:T) {
    for (j in 1:J) {
      log_A_3[t, j, 1] = log_A[t, j, 1];
      log_A_3[t, j, 2] = log_A[t, j, 2];
      log_A_3[t, j, 3] = log_sum_exp(log_max - summ2[t, j]);  // and here
    }
  }
}

But “Log probability evaluates to log(0), i.e. negative infinity.” still occurs.
I am fairly sure the error is caused by log_A_3[,3], because if I change it to any real value, the error goes away.

Mathematically, say log_A_3 is the vector of real numbers A, B and C. I need the following constraints:
A < log_max
B < log_max
C < log_max
exp(A) + exp(B) + exp(C) = exp(log_max) [because log_A_3 feeds a multinomial distribution]

However, when I tried writing A, B and C as separate parameters, the model ran, but all 1000 post-warmup iterations were divergent transitions. I do not know whether this approach is feasible.

Looking forward to your advice.

log_sum_exp(log_max-summ2[t,j]) is equivalent to:

log(exp(log_max-summ2[t,j])) is equivalent to:

log_max-summ2[t,j] so I don’t think you want that.
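In other words, applied to a single value, log-sum-exp is just the identity, as a quick check in Python shows:

```python
import math

log_max, summ2 = math.log(1000.0), math.log(300.0)
x = log_max - summ2
# log_sum_exp of one term is log(exp(x)), i.e. x itself
print(math.log(math.exp(x)), x)
```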

There’d be a more numerically stable way to write log(exp(a) - exp(b) - exp(c)), but you’ll need to write that out yourself.
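For instance, here is one way to write it out, sketched in plain Python (the function name is made up; this is only valid when exp(b) + exp(c) < exp(a), and the same algebra carries over to Stan):

```python
import math

def log_diff3(a, b, c):
    # log(exp(a) - exp(b) - exp(c)): factor out exp(a) so nothing
    # overflows, and use log1p for accuracy near zero
    return a + math.log1p(-math.exp(b - a) - math.exp(c - a))

print(log_diff3(math.log(1000.0), math.log(400.0), math.log(300.0)))  # log(300)
```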

You can also check your math in R or something before you put it in Stan.

If you’re curious what values of things are causing problems, you can add print statements in Stan around where you think the code is blowing up.

Like:

print("log_max: ", log_max);
...
log_A_3[t, j, 3] = log(exp(log_max) - exp(log_A_3[t, j, 1]) - exp(log_A_3[t, j, 2]));

A very minor addition:

Wouldn’t log_diff_exp(a, log_sum_exp(b, c)) or log_diff_exp(log_diff_exp(a,b), c) work reasonably well?
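A quick numerical check of both forms, with Python stand-ins for Stan's log_diff_exp and log_sum_exp built-ins:

```python
import math

def log_sum_exp(x, y):
    # log(exp(x) + exp(y)), shifted by the max for stability
    m = max(x, y)
    return m + math.log(math.exp(x - m) + math.exp(y - m))

def log_diff_exp(x, y):
    # log(exp(x) - exp(y)), requires y < x
    return x + math.log1p(-math.exp(y - x))

a, b, c = math.log(1000.0), math.log(400.0), math.log(300.0)
v1 = log_diff_exp(a, log_sum_exp(b, c))
v2 = log_diff_exp(log_diff_exp(a, b), c)
print(v1, v2)  # both equal log(300)
```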


Ooh, yeah, thanks @martinmodrak

Thanks for your advice!
I think log_diff_exp is feasible; it solved the problem of keeping the vector elements consistent with the multinomial distribution. But it caused simplex[I] theta[T, J] to become NaN. I then tried using log_softmax(theta[t, j]) and y[t, j, ] ~ multinomial(exp(theta[t, j, ])) to fix the NaN, and that worked. But the error “Log probability evaluates to log(0), i.e. negative infinity.” still occurred, even though there is no log() anywhere in the latest model. I also printed all of the parameters, and there is no 0 value in log_A_3 or theta. What should I do next to solve this problem?