How do I limit the sum of the elements in a vector to less than a constant?

Hello Stan users and experts,

I have a vector that follows a multinomial distribution in the observation model, and part of this vector needs to go into the process model. Therefore, I want to create two vectors. One is the vector used in the process model (log_A in the Stan code), where each element is less than a constant (log_max in the Stan code) and the exponentiated elements sum to less than exp(log_max). The other is the vector used in the observation model (log_A_3 in the Stan code), where the exponentiated elements sum to exactly exp(log_max) (I know this can be done with a simplex).
My script is as follows:


data {
  int I;                    //3
  int T;                    //18
  int J;                    //43
  int<lower=0> y[T, J, I]; 
  real log_max;             //log(1000)
  //etc.....
}
parameters {
  vector<upper = log_max>[I - 1] log_A[T, J];    
 //etc.....
}

transformed parameters {
  vector<upper = log_max>[I] log_A_3[T, J];   
  simplex[I] theta[T, J];            

  for (t in 1:T) { 
    for (j in 1:J) {
      log_A_3[t, j, 1] = log_A[t, j, 1];  
      log_A_3[t, j, 2] = log_A[t, j, 2];
      log_A_3[t, j, 3] = log(exp(log_max) - exp(log_A_3[t, j, 1]) - exp(log_A_3[t, j, 2]));
    }
  }

  for (t in 1:T) {
    for (j in 1:J) {
      theta[t, j] = softmax(log_A_3[t, j]); // equation for theta
    }
  }
//etc.....
}
model {
   for (t in 1:T) {
    for (j in 1:J) {
      y[t, j] ~ multinomial(theta[t, j]);
    }
  }

  for (t in 2:T) {  // For t > 1
    for (j in 1:J) {
      target += multi_normal_lpdf(log_A[t, j] | U[j] + B[j] * log_A[t - 1, j], diag_matrix(sigma));
    }
  }
  for (j in 1:J) { // For t = 1
    target += uniform_lpdf(log_A[1, j] | -1E6, log(exp(log_max) / 2));
  }
  //etc.....
}

Now my problem is that the result has a lot of divergent transitions after warmup. I think the error is caused by log_A_3[, 3]: the argument of its log can become negative under random sampling of log_A. Hence, I want to find a way to ensure exp(log_A[, 1]) + exp(log_A[, 2]) < exp(log_max) while log_A[, 1] < log_max and log_A[, 2] < log_max hold simultaneously.

Thank you in advance

Sorry for taking so long to respond. Just a few quick thoughts, correct me if this doesn’t address your problem:

You can verify this by using print statements, e.g. something like if (log_A_3[t, j, 3] < 0) { print(" < 0 "); }

You could have a parameter real<upper=log_max> log_A_sum and then only N - 1 parameters for the elements, computing the last one so that the constraints are met. This can, however, imply a weird prior on the computed element. Check out the discussion on Test: Soft vs Hard sum-to-zero constrain + choosing the right prior for soft constrain for some related ideas.
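
For I = 3, a minimal sketch of that idea (untested, and the names log_A_sum, log_A1, log_A2, log_A3 are placeholders of mine): nested upper bounds keep exp(log_A1) + exp(log_A2) below exp(log_A_sum), so the computed element is well defined:

parameters {
  real<upper=log_max> log_A_sum;                       // log of the total
  real<upper=log_A_sum> log_A1;                        // first free element
  real<upper=log_diff_exp(log_A_sum, log_A1)> log_A2;  // second free element
}
transformed parameters {
  // the computed element; note that its implied prior is not flat
  real log_A3 = log_diff_exp(log_A_sum, log_sum_exp(log_A1, log_A2));
}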

Best of luck with your model!

Thanks for your reply,
and I’m sorry to have put this analysis on hold for so long.

I have tried creating a log_A_sum, but it causes a new problem:

" Log probability evaluates to log(0), i.e. negative infinity."

I attempted many ways but couldn’t get rid of this error.
Do you have any ideas?

Thank you in advance.

It looks like the log density is zero. I guess that could happen if a model has no sampling statements (things with a ~) or target += statements.

Your original model has plenty of those though. What’s your current model?

I did not change my model except for the “sum”. And if I change the value of “log_max” to 500, or set log_A_3[,3] to log(25) or some other smaller value, the error goes away.

Aaaa, ignore my previous statement, that was totally wrong, my bad.

So somewhere here is an evaluation of log(0).

These are the two lines that take a log of something:
log_A_3[t, j, 3] = log(exp(log_max) - exp(log_A_3[t, j, 1]) - exp(log_A_3[t, j, 2]));
target += uniform_lpdf(log_A[1, j] | -1E6, log(exp(log_max) / 2));

The second can just be written as follows, since log(exp(log_max) / 2) = log_max - log(2):
target += uniform_lpdf(log_A[1, j] | -1E6, log_max - log(2));

I think you could rewrite the first using something like the log_sum_exp trick (it’s slightly different because it involves subtraction); see LogSumExp - Wikipedia.

Expressions like log(exp(a) + exp(b) + ...) usually have to be rewritten as log_sum_exp(a, b, ...) or they end up breaking things.
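
To see why, here is a made-up illustration (e.g. in a generated quantities block): with a = 800, exp(a) overflows to infinity in double precision, while log_sum_exp works on the log scale and stays finite.

generated quantities {
  real a = 800;
  real b = 1;
  real bad = log(exp(a) + exp(b));  // inf: exp(800) overflows
  real good = log_sum_exp(a, b);    // 800: evaluated as a + log1p(exp(b - a))
}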

Thanks for your advice.
Yes, I tried the LSE function like this:

data {
  int I;                   
  int T;                   
  int J;                   
  real log_max;            
}

parameters {
  vector[I - 1] log_A[T, J];
}

transformed parameters {
  real<upper=log_max> summ[T, J];  // the "sum" mentioned before
  vector[1] summ2[T, J];
  vector<upper=log_max>[I] log_A_3[T, J];
  for (t in 1:T) {
    for (j in 1:J) {
      summ[t, j] = log_sum_exp(log_A[t, j]);  // LSE is here
      summ2[t, j, 1] = summ[t, j];
    }
  }

  for (t in 1:T) {
    for (j in 1:J) {
      log_A_3[t, j, 1] = log_A[t, j, 1];
      log_A_3[t, j, 2] = log_A[t, j, 2];
      log_A_3[t, j, 3] = log_sum_exp(log_max - summ2[t, j]);  // and here
    }
  }
}

But “Log probability evaluates to log(0), i.e. negative infinity.” still occurs.
And I definitely believe the error is caused by log_A_3[, 3], because if I change it to any fixed real value, the error goes away.

Mathematically, say log_A_3 is the vector of real numbers A, B, and C. I need the following constraints:
A < log_max
B < log_max
C < log_max
exp(A) + exp(B) + exp(C) = exp(log_max) [because log_A_3 drives a multinomial distribution]

However, when I tried to write A, B, and C as separate parameters, the model could run, but the result showed that all 1000 post-warmup iterations were divergent transitions. I do not know whether this approach is feasible.

Looking forward to your advice.

log_sum_exp(log_max - summ2[t, j]) is equivalent to log(exp(log_max - summ2[t, j])), which is just log_max - summ2[t, j], so I don’t think you want that.
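
A quick way to check that identity, e.g. in a generated quantities block:

real x = 3.7;
print(log_sum_exp([x]'));  // prints 3.7: log_sum_exp of a single element is that element
print(log(exp(x)));        // prints 3.7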

There’d be a more numerically stable way to write log(exp(a) - exp(b) - exp(c)), but you’ll need to write that out yourself.

You can also check your math in R or something before you put it in Stan.

If you’re curious what values of things are causing problems, you can add print statements in Stan around where you think the code is blowing up.

Like:

print("log_max: ", log_max);
...
log_A_3[t, j, 3] = log(exp(log_max) - exp(log_A_3[t, j, 1]) - exp(log_A_3[t, j, 2]));

A very minor addition:

Wouldn’t log_diff_exp(a, log_sum_exp(b, c)) or log_diff_exp(log_diff_exp(a, b), c) work reasonably well?
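
In the notation of the original model, the problematic line would then become something like (a sketch, untested):

log_A_3[t, j, 3] = log_diff_exp(log_max, log_sum_exp(log_A_3[t, j, 1], log_A_3[t, j, 2]));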


Ooh, yeah, thanks @martinmodrak

Thanks for your advice!
I think log_diff_exp is feasible: it solved the problem of making the elements of the vector satisfy the multinomial constraint. But it caused simplex[I] theta[T, J] to become NaN. I then tried theta[t, j] = log_softmax(log_A_3[t, j]) together with y[t, j] ~ multinomial(exp(theta[t, j])), and that solved the NaN problem. But the error “Log probability evaluates to log(0), i.e. negative infinity.” still occurred, even though there is no log() anywhere in the latest model. I also tried printing all of the parameters, and there is no 0 value in log_A_3 or theta. What should I do next to solve this problem?

If theta is unconstrained, then I’d expect the code to be:

y[t, j] ~ multinomial(softmax(theta[t, j]));

You can post your model again, but these constraints and everything feel very guess-and-check.

It feels like the motivation for each change comes from the computation (does Stan run?) rather than from a generative process in your model (how the parameters you’re trying to estimate generate the data you’re measuring).

Without some high level plan I suspect we’ll just address one error and move to the next, with no end in sight. My recommendation is to try to think of a generative process for your data.

That means expanding on the description of your data and process above, and trying to write down the model using statistics notation (and not code). It can seem more confusing and constraining than writing code, but try to work through the confusion and see if you figure out anything new about your problem from it.
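
For example, a skeleton in that spirit, using only the pieces you’ve already described (the link f from the latent states to the multinomial probabilities is the part you’d need to pin down):

N_t ~ MultiNormal(U + B * N_{t-1}, Sigma)
theta_t = f(N_t), a simplex (e.g. via softmax)
y_t ~ Multinomial(n, theta_t)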

Maybe this is a fun case study to look at: https://mc-stan.org/users/documentation/case-studies/golf.html

It’s a neat model development story.


It’s very kind of you to offer comments on my full model and generative process. I always felt that posting the full model was too tedious and would cause unnecessary trouble for anyone trying to help, so previously I only focused on my error.

My purpose is:
using long-term biomonitoring time series data “Y” to estimate the parameters “U” and “B” in a state space model,

where
“Y” is an 18-year time series of the abundance of 3 species (Sp.1, Sp.2, Sp.3) with 43 repetitions (an 18×3 matrix, ×43), and the data for these 3 species follow a multinomial distribution (i.e. Y ~ Multinomial(40, theta)).

And the state space model is:
N_t ~ MultiNormal(U + B * N_{t-1}, Sigma)
where N_t is the vector of log abundances of Sp.1 and Sp.2 (log(Y1) and log(Y2)) in each year, U is a 2×1 vector, and B is a 2×2 matrix. That is, the abundances of Sp.1 and Sp.2 are determined by their values in the previous year.

I think the most troublesome problem is that the 3 species follow a multinomial distribution, but only two of them are used in the state space model. It does not look like a complicated model, but I really do not know what causes the log(0).

My current Stan model is as follows:

data {
  int I;                   //3
  int T;                  //18
  int J;                  //43
  int<lower=0> y[T, J, I];
  real log_max;            
}

parameters {
  vector[I - 1] log_A[T, J];
  vector<lower = 0>[I - 1] sigma;      
  vector[I - 1] U[J];                   
  vector[I - 1] inter[J];              
  vector[I - 1] intra[J];
  real mu_U;
  real mu_inter;                   
  real mu_intra;
  real<lower = 0> sigma_U;
  real<lower = 0> sigma_inter;        
  real<lower = 0> sigma_intra;       
}

transformed parameters {
  real<upper=log_max> summ[T, J];
  vector<upper=log_max>[I] log_A_3[T, J];
  vector[I] theta[T, J];
  matrix[2, 2] B[J];        
  for (j in 1:J) {
    B[j, 1, 1] = intra[j, 1];
    B[j, 2, 2] = intra[j, 2];
    B[j, 1, 2] = inter[j, 1];
    B[j, 2, 1] = inter[j, 2];
  }
  
  for (t in 1:T) {
    for (j in 1:J) {
      summ[t, j] = log_sum_exp(log_A[t, j]);
    }
  }
  
  for (t in 1:T) {
    for (j in 1:J) {
      log_A_3[t, j, 1] = log_A[t, j, 1];
      log_A_3[t, j, 2] = log_A[t, j, 2];
      log_A_3[t, j, 3] = log_diff_exp(log_max, summ[t, j]);
    }
  }
  
  for (t in 1:T) {
    for (j in 1:J) {
      theta[t, j] = log_softmax(log_A_3[t, j]);
    }
  }
}

model {
  for (t in 1:T) {
    for (j in 1:J) {
      y[t, j] ~ multinomial(exp(theta[t, j]));
    }
  }
  
  for (t in 2:T) {  // For t > 1
    for (j in 1:J) {
      target += multi_normal_lpdf(log_A[t, j] | U[j] + B[j] * log_A[t - 1, j], diag_matrix(sigma));
    }
  }
  
  // For t = 1
  for (j in 1:J) {
    target += uniform_lpdf(log_A[1, j] | -1E6, log_max - log(2));
  }
  
  //prior
  for (j in 1:J) {
    for (i in 1:(I - 1)) {
      inter[j, i] ~ normal(mu_inter, sigma_inter);
      intra[j, i] ~ normal(mu_intra, sigma_intra);
      U[j, i]     ~ normal(mu_U, sigma_U);
    }
  }
  
  mu_inter ~ normal(0, 10);
  mu_intra ~ normal(0, 10);
  sigma_inter ~ cauchy(0, 5);
  sigma_intra ~ cauchy(0, 5);
  sigma[1] ~ cauchy(0, 5);
  sigma[2] ~ cauchy(0, 5);
}

In my mind, if I have a long enough time series and the correct model, parameterization should not be difficult. But it does not look that simple.
Looking forward to your advice.
Thank you in advance.


So you’ve got a 3-dimensional multinomial output, and this is driven by a 2-dimensional underlying process.

How is it that the underlying process affects the distribution of the 3-dimensional multinomial?

Would you think that both components of the underlying process affect all three of the multinomial outputs? Or some other arrangement?


Actually, Y is a time series of the relative coverage of a sessile assemblage; the third dimension in Y is bare area, so it is not an organism. Think of it as a 40 cm² quadrat in which the interacting species sp.1 (Y1) and sp.2 (Y2) occupy different areas, and the bare area (Y3) is the total area minus the occupied area (i.e. 40 - Y1 - Y2 = Y3).
The process model is a dynamics model of pairwise interacting species: the abundance of each species at time t is determined by the abundances at time t - 1, with U and B as the parameters to be estimated.

Yes, I think they affect all three of the multinomial outputs through 40 - Y1 - Y2.

Oh so I see where that constraint in the original question comes from now.

Since this is a three component thing, if you just had one time point to constrain I’d say do:

parameters {
  real<lower = 0.0, upper = 40.0> a;
  real<lower = 0.0, upper = 40.0 - a> b;
}

transformed parameters {
  real c = 40 - a - b;
}

But this isn’t a constraint on log_A (with log_A[t] == N_t), is it? If this is a constraint on Y and not on log_A, then we’ll handle it with the multinomial output.

I don’t think there’s any single obvious way to link your time series model to your output. This is similar to how there isn’t a single obvious way to choose the link function for a logistic regression.

I think the standard link function for these things is softmax. That takes an unconstrained length-N vector and normalizes it to a simplex.

Because the simplex has N - 1 degrees of freedom, to make sure things stay identifiable you don’t want to pass something with N degrees of freedom to the softmax, or you’ll end up with something non-identifiable.

So I think it’s fine you have something with 2 degrees of freedom and want to make a simplex with 3 elements.

As an example, you might do something like:

y[t] ~ multinomial(softmax([log_A[1, t], log_A[2, t], 0.0]'));

Or if the connection isn’t so clear then maybe:

y[t] ~ multinomial(softmax([a1 + b1 * log_A[1, t],
                            a2 + b2 * log_A[2, t],
                            0.0]'));

I guess the details of which is better depend on your application. There’s probably also a world where you do:

y[t] ~ multinomial(softmax([a1 + b1 * log_A[1, t] + c1 * log_A[2, t],
                            a2 + b2 * log_A[1, t] + c2 * log_A[2, t],
                            0.0]'));

Leaving the last element of the simplex input at zero is just going to affect the interpretation of your coefficients. The question here is how your underlying time series affects the probability distribution of your multinomial, which causes a little awkwardness because your time series is unconstrained and your multinomial probabilities are a simplex.

I have tried this formulation before.
It could run, but the result showed 1000 divergent transitions after warmup (in the case of 2000 iterations). So I tried to model it in a different way.

And today I found the problem area: the prior for t = 1,
target += uniform_lpdf(log_A[1, j] | -1E6, log_max - log(2));
If I remove the log(2), the script runs. And actually, the log(2) has no exact meaning.

However, the result still showed 1000 divergent transitions after warmup (in the case of 2000 iterations and 1 chain), as well as 1 chain where the estimated Bayesian fraction of missing information was low; the largest R-hat is 2.11, and the ESS is too low.

Since both approaches showed 1000 divergent transitions, I wonder, if both models are right, whether the problem lies elsewhere, such as in my data (a lot of 0 values) or in the prior distributions of the other parameters (mu, sigma, or inter/intra).

I am looking into the cause of and solution to these problems.
The above is more of a progress report (sorry if it feels like a sudden change of subject), but if possible I hope to hear your advice.

Sincerely~

Which did you try before? The constraints on log_A or the different link function?

I have tried the constraints on Y and on log_A, respectively. Both methods caused divergent transitions after warmup. And I have been using the softmax function.