How to properly use Simplex with Dirichlet likelihood?

Jacob_Moore · October 29, 2020, 5:33pm

I have some data, which looks like:

curr_A, curr_B, curr_C, curr_D, curr_E, days, end_1, end_2, end_3
1,      0,      0,      0,      0,      1,     0.3,  0.7,   0.0
1,      0,      0,      0,      0,      2,     0.2,  0.6,   0.2

My input variables are curr_A...curr_E and days. While my output variables are curr_1...curr_3 (note that they sum to 1, so you can think of them as a given row’s affiliation w/ each of the three outputs.) The goal is to use a Dirichlet a distribution to model my outputs.

Currently, an error is being thrown regarding pred~dirichlet(Y[n]);: Real return type required for probability function. Could anyone explain where I’m going wrong?

Additionally, is a simplex required for what I’m trying to do?

Edit: Stan also takes issue with softmax usage. No idea why.

data {
  int K; //outcome classes, 3
  int N; //num rows
  int D; //input dimensions, 5
  row_vector[K] Y[N]; //array of row vectors, each with k elements
  matrix[N,D] X;
  int days[N]; 
}
parameters {
  matrix[D, K] C; //[5,3] 
  matrix[D, K] B; //[5,3]
  
}
model {
  for (n in 1:N){
    row_vector[K] pred; // [1,3]
    row_vector[D] ipt; // [1,5]
    ipt = X[n]; 
    
    //   [1,5]*[5,3] + [1,5]*[5,3] => [1,3]
    pred = ipt * C + (ipt * days[n]) * B;
    pred = softmax(pred);
    pred~dirichlet(Y[n]); //    
  }
}

caesoma · November 2, 2020, 4:45am

The cause of the specific error seems to be that the dirichlet_lpdf has type real, so I’m guessing if you initialize real pred[K] this error would go away.

A simplex type variable should not be needed for a Dirichlet prior, but maybe it’s recommended as it would enforce the correct constraint on the parameter. However, there may be some bugs with the type.

stemangiola · November 2, 2020, 6:32am

Hello Jacob.

Premise: I did not study your model well, I am just referring to the dirichlet part.

Few points,

you cannot transform any parameter in a non-linear way (e.g. softmax, log etc…) without Jacobian correction
When you tranform any variable with N degrees of freedom with softmax, you get undeterminability as softmax(1,1,1) == softmax(2,2,2). So you have to start with a variable with N-1 degrees of freedom

But independently from all this I don’t think you need a Dirichlet distribution for this.

andrjohns · November 2, 2020, 7:06am

There are three possible issues here. First, the softmax function is only defined for vectors, not row vectors. So if you want to end up with a row vector, then you need to transpose the inside of the function call and then transpose the result:

pred[n] = softmax((X[n] * C + (X[n] * days[n]) * B)')';

The next is that you’re providing priors for a local, temporary, variable. Try moving the construction of pred to the transformed parameters block:

transformed parameters {
  row_vector[K] pred[N];

  for (n in 1:N){
    pred[n] = softmax((X[n] * C + (X[n] * days[n]) * B)')';
  }
}

Additionally, because you’re putting a prior on a transformation (i.e., softmax(pred) ~ dirichlet(Y)), you need to add a correction factor, which is the log absolute determinant of the jacobian. More details on the jacobian is here: https://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/

A rough implementation of this is:

  for(n in 1:N) {
    matrix[K,K] jacobian;
    for(i in 1:K) {
      for(j in 1:K) {
        jacobian[i,j] = pred[n][i] * (ident[i,j] - pred[n][j]);
      }
    }
    target += log_determinant(jacobian);
  }

Putting this all together (into untested code, so no promises), you end up with:

data {
  int K; //outcome classes, 3
  int N; //num rows
  int D; //input dimensions, 5
  row_vector[K] Y[N]; //array of row vectors, each with k elements
  matrix[N,D] X;
  int days[N]; 
}
transformed data {
  matrix[K,K] ident = diag_matrix(rep_vector(1,K));
}
parameters {
  matrix[D, K] C; //[5,3] 
  matrix[D, K] B; //[5,3]
  
}
transformed parameters {
  row_vector[K] pred[N];

  for (n in 1:N){
    pred[n] = softmax((X[n] * C + (X[n] * days[n]) * B)')';
  }
}
model {
  for(n in 1:N) {
    pred[n] ~ dirichlet(Y[n]);

    matrix[K,K] jacobian;
    for(i in 1:K) {
      for(j in 1:K) {
        jacobian[i,j] = pred[n][i] * (ident[i,j] - pred[n][j]);
      }
    }
    target += log_determinant(jacobian);
  }
}

stemangiola · November 2, 2020, 7:16am

andrjohns:

A rough implementation of this is:

  for(n in 1:N) {
    matrix[K,K] jacobian;
    for(i in 1:K) {
      for(j in 1:K) {
        jacobian[i,j] = pred[n][i] * (ident[i,j] - pred[n][j]);
      }
    }
    target += log_determinant(jacobian);
  }

Nice! what do you mean by rough implementation?

andrjohns · November 2, 2020, 7:19am

Oh just that it might not be the most efficient implementation with the nested loops

stemangiola · November 2, 2020, 7:21am

I see, rough was not about correctness (this would make the aswer to one post of mine Jacobian of softmax tansformation of a (n-1 degrees of freedom) unbounded parameter).

stemangiola · November 2, 2020, 7:28am

I woudl like to point out that the issue with indeterminability remains. Unless I am missing something.

Jacob_Moore · November 2, 2020, 2:50pm

Hi @stemangiola, what would you recommend as an alternative to Dirichlet distribution? I considered it, because unlike categorical, or multinomial, it should handle proportions better. If this premise is faulty, please advise.

My data is built in terms of proportions, so in your example of softmax(1,1,1)==softmax(2,2,2) , I don’t see this as a bad thing. Should I?

stemangiola · November 2, 2020, 10:36pm

is X a matrix of simplexes? Which variable includes end_1, end_2, end_3?

So, just at looking at the formula I see

softmax(real_unconstrained * real_unconstrained + (real_unconstrained * int) * real_unconstrained )

For sure C and B are uncontrained and might suffer from indeterminability.

Could you make couple of concrete examples with (possible) numbers of ipt * C + (ipt * days[n]) * B leading to a simplex?

Topic		Replies	Views
Prior for Simplex, more informative than Dirichlet Modeling	9	530	February 26, 2024
Dirichlet_log: probabilities is not a valid simplex Modeling	26	3623	September 27, 2017
Initializing Simplex / Dirichlet Modeling fitting-issues , performance	3	2616	October 31, 2018
Simplex Regression Modeling	11	2619	December 14, 2017
Parameterising a latent simplex Modeling	11	228	April 13, 2025

How to properly use Simplex with Dirichlet likelihood?

Related topics