How to properly use Simplex with Dirichlet likelihood?

I have some data, which looks like:

curr_A, curr_B, curr_C, curr_D, curr_E, days, end_1, end_2, end_3
1,      0,      0,      0,      0,      1,     0.3,  0.7,   0.0
1,      0,      0,      0,      0,      2,     0.2,  0.6,   0.2      

My input variables are curr_A...curr_E and days, while my output variables are end_1...end_3 (note that they sum to 1, so you can think of them as a given row’s affiliation with each of the three outputs). The goal is to use a Dirichlet distribution to model my outputs.

Currently, pred ~ dirichlet(Y[n]); throws the error “Real return type required for probability function.” Could anyone explain where I’m going wrong?

Additionally, is a simplex required for what I’m trying to do?

Edit: Stan also takes issue with softmax usage. No idea why.

data {
  int K; //outcome classes, 3
  int N; //num rows
  int D; //input dimensions, 5
  row_vector[K] Y[N]; //array of row vectors, each with k elements
  matrix[N,D] X;
  int days[N]; 
}
parameters {
  matrix[D, K] C; //[5,3] 
  matrix[D, K] B; //[5,3]
  
}
model {
  for (n in 1:N){
    row_vector[K] pred; // [1,3]
    row_vector[D] ipt; // [1,5]
    ipt = X[n]; 
    
    //   [1,5]*[5,3] + [1,5]*[5,3] => [1,3]
    pred = ipt * C + (ipt * days[n]) * B;
    pred = softmax(pred);
    pred~dirichlet(Y[n]); //    
  }
}

The cause of the specific error seems to be that dirichlet_lpdf returns a real, so I’m guessing that if you declare pred as real pred[K] the error would go away.

A simplex type variable should not be needed for a Dirichlet prior, but it may be recommended, since it would enforce the correct sum-to-one constraint on the variable. However, there may be some bugs with that type.
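
As a minimal sketch (not from the original post, and assuming the observed rows really do sum to 1), enforcing that constraint on the outcome data would look something like:

data {
  int K;            // outcome classes, 3
  int N;            // num rows
  simplex[K] Y[N];  // each row constrained to be non-negative and sum to 1
}

Stan then validates the simplex constraint when the data are read in, so any row that does not sum to 1 is rejected up front.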

Hello Jacob.

Caveat: I have not studied your model in detail; I am just commenting on the Dirichlet part.

A few points:

  • You cannot transform a parameter in a non-linear way (e.g. softmax, log, etc.) without a Jacobian correction
  • When you transform a variable with N degrees of freedom with softmax, you get non-identifiability, since softmax(1,1,1) == softmax(2,2,2). So you have to start from a variable with N-1 degrees of freedom (see the sketch below)
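
To make the second point concrete, here is a minimal sketch (my own names, not from the model above) of the usual N-1 degrees of freedom trick: fix one element of the unconstrained vector to 0 before applying softmax, so adding a constant to every element is no longer possible.

parameters {
  vector[K - 1] eta_free;  // K-1 unconstrained parameters
}
transformed parameters {
  // pad with a leading 0 so the softmax input has only K-1 free elements
  vector[K] theta = softmax(append_row(0, eta_free));
}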

But independently of all this, I don’t think you need a Dirichlet distribution for this.

There are three possible issues here. First, the softmax function is only defined for vectors, not row vectors. So if you want to end up with a row vector, then you need to transpose the inside of the function call and then transpose the result:

pred[n] = softmax((X[n] * C + (X[n] * days[n]) * B)')';

The next is that you’re providing priors for a local, temporary variable. Try moving the construction of pred to the transformed parameters block:

transformed parameters {
  row_vector[K] pred[N];

  for (n in 1:N){
    pred[n] = softmax((X[n] * C + (X[n] * days[n]) * B)')';
  }
}

Additionally, because you’re putting a prior on a transformation (i.e., softmax(pred) ~ dirichlet(Y)), you need to add a correction factor: the log absolute determinant of the Jacobian. More details on the Jacobian are here: https://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/

A rough implementation of this is:

  for(n in 1:N) {
    matrix[K,K] jacobian;
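    // Jacobian of the softmax map: J[i,j] = pred[n][i] * (delta_ij - pred[n][j])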
    for(i in 1:K) {
      for(j in 1:K) {
        jacobian[i,j] = pred[n][i] * (ident[i,j] - pred[n][j]);
      }
    }
    target += log_determinant(jacobian);
  }

Putting this all together (into untested code, so no promises), you end up with:

data {
  int K; //outcome classes, 3
  int N; //num rows
  int D; //input dimensions, 5
  row_vector[K] Y[N]; //array of row vectors, each with k elements
  matrix[N,D] X;
  int days[N]; 
}
transformed data {
  matrix[K,K] ident = diag_matrix(rep_vector(1,K));
}
parameters {
  matrix[D, K] C; //[5,3] 
  matrix[D, K] B; //[5,3]
  
}
transformed parameters {
  row_vector[K] pred[N];

  for (n in 1:N){
    pred[n] = softmax((X[n] * C + (X[n] * days[n]) * B)')';
  }
}
model {
  for(n in 1:N) {
    matrix[K,K] jacobian;  // declare locals before any statements

    // dirichlet_lpdf expects (column) vectors, so transpose the row vectors
    pred[n]' ~ dirichlet(Y[n]');

    for(i in 1:K) {
      for(j in 1:K) {
        jacobian[i,j] = pred[n][i] * (ident[i,j] - pred[n][j]);
      }
    }
    target += log_determinant(jacobian);
  }
}

Nice! What do you mean by a rough implementation?

Oh, just that it might not be the most efficient implementation, with the nested loops.
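
For example (a sketch only, untested), that K-by-K matrix is just diag_matrix(pred[n]') minus the outer product of pred[n] with itself, so the two inner loops could be collapsed to:

matrix[K,K] jacobian = diag_matrix(pred[n]') - pred[n]' * pred[n];
target += log_determinant(jacobian);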

I see, so “rough” was not about correctness (this would then also answer a post of mine, Jacobian of softmax transformation of a (n-1 degrees of freedom) unbounded parameter).

I would like to point out that the non-identifiability issue remains, unless I am missing something.

Hi @stemangiola, what would you recommend as an alternative to the Dirichlet distribution? I considered it because, unlike the categorical or multinomial, it should handle proportions better. If this premise is faulty, please advise.

My data is built in terms of proportions, so in your example of softmax(1,1,1) == softmax(2,2,2), I don’t see this as a bad thing. Should I?

Is X a matrix of simplexes? Which variable includes end_1, end_2, end_3?

So, just looking at the formula, I see

softmax(real_unconstrained * real_unconstrained + (real_unconstrained * int) * real_unconstrained )

For sure C and B are unconstrained and might suffer from non-identifiability.

Could you give a couple of concrete examples, with (possible) numbers, of ipt * C + (ipt * days[n]) * B leading to a simplex?