Hi,
I have a model that I’ve tried to re-parameterize to the log scale, but based on the results I am clearly doing something wrong. I have no idea what the problem is and would appreciate any help.
First, here are the log-scale Dirichlet, the inverse-ILR log-simplex constraint transform, and the log-scale multinomial distribution. There’s also a helper function that computes a matrix-vector product on the log scale.
functions block
functions {
  // Maps an unconstrained vector y (length N - 1) to a log-simplex
  // (length N) via the inverse ILR transform, incrementing target
  // with the Jacobian adjustment (0.5 * log(N) and log_x[N]).
  vector inv_ilr_log_simplex_constrain_lp(vector y) {
    int N = rows(y) + 1;
    vector[N - 1] ns = linspaced_vector(N - 1, 1, N - 1);
    vector[N - 1] w = y ./ sqrt(ns .* (ns + 1));
    vector[N] z = append_row(reverse(cumulative_sum(reverse(w))), 0)
                  - append_row(0, ns .* w);
    real r = log_sum_exp(z);
    vector[N] log_x = z - r;
    target += 0.5 * log(N);
    target += log_x[N]; // This is why we subtract log_theta[N] in log_dirichlet_lpdf
    return log_x;
  }

  // Dirichlet log-density over log_theta, where exp(log_theta) is a simplex;
  // the -log_theta[N] term cancels the target increment in the constrain
  // function above.
  real log_dirichlet_lpdf(vector log_theta, vector alpha) {
    int N = rows(log_theta);
    if (N != rows(alpha)) {
      reject("Input must contain the same number of elements as alpha");
    }
    return dot_product(alpha, log_theta) - log_theta[N]
           + lgamma(sum(alpha)) - sum(lgamma(alpha));
  }

  // Multinomial log-pmf parameterized by log probabilities
  // (assumes log_theta is a normalized log-simplex).
  real multinomial_log_lpmf(array[] int y, vector log_theta) {
    int N = sum(y);
    int K = num_elements(log_theta);
    real lp = lgamma(N + 1);
    for (k in 1:K) {
      lp += log_theta[k] * y[k] - lgamma(y[k] + 1);
    }
    return lp;
  }

  // Computes log(A * x) from log_A and log_x via row-wise log-sum-exp:
  // result[i] = log_sum_exp(log_A[i] + log_x).
  vector log_matrix_vector_product(matrix log_A, vector log_x) {
    int N = rows(log_A);
    vector[N] result;
    for (i in 1:N) {
      vector[cols(log_A)] sum_vector = to_vector(log_A[i]) + log_x;
      result[i] = log_sum_exp(sum_vector);
    }
    return result;
  }
}
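To make these easier to check outside of Stan, here is a small R port of the helpers above (function names and test inputs are mine, not from the linked references). It verifies that the transform returns a valid log-simplex, that the log-scale matrix-vector product matches log(A %*% x), and that the multinomial log-pmf matches dmultinom().
R sanity checks
# R port of the Stan helpers above; names and test inputs are made up.

# Inverse-ILR log-simplex transform (without the Jacobian target
# increments, which only matter inside Stan).
inv_ilr_log_simplex <- function(y) {
  N <- length(y) + 1
  ns <- seq_len(N - 1)
  w <- y / sqrt(ns * (ns + 1))
  z <- c(rev(cumsum(rev(w))), 0) - c(0, ns * w)
  z - (max(z) + log(sum(exp(z - max(z)))))  # z - log_sum_exp(z)
}
set.seed(1)
log_x <- inv_ilr_log_simplex(rnorm(3))
sum(exp(log_x))  # should be 1 up to floating-point error

# log(A %*% x) from log_A and log_x via row-wise log-sum-exp.
log_mat_vec <- function(log_A, log_x) {
  apply(log_A, 1, function(row) {
    s <- row + log_x
    max(s) + log(sum(exp(s - max(s))))
  })
}
A <- matrix(runif(6), 2, 3)
x <- runif(3)
max(abs(log_mat_vec(log(A), log(x)) - log(A %*% x)))  # ~0

# Log-scale multinomial pmf against R's dmultinom().
y <- c(3, 5, 2)
th <- c(0.2, 0.5, 0.3)
lp <- lgamma(sum(y) + 1) + sum(log(th) * y - lgamma(y + 1))
lp - dmultinom(y, prob = th, log = TRUE)  # ~0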
The details for all of these functions can be found here, with some additional information here. The inverse ILR algorithm is from here.
The inverse ILR is there because the log-Dirichlet requires log_beta_mat to be constrained such that sum(exp(log_beta_mat[i, r])) = 1, which can’t be achieved by simply declaring it as an array of simplexes (see the model below).
And here is the rest of the model.
model
data {
  int<lower=1> n; // number of election units i
  int<lower=2> R; // number of candidates in election 1
  int<lower=2> C; // number of candidates in election 2
  array[n, C] int<lower=0> N_c; // marginal count for column c in unit i
  array[n] vector[R] log_p_r; // marginal log proportion for row r in unit i
}
parameters {
  matrix<lower=0>[R, C] alpha_r; // Dirichlet concentrations for row r
  array[n, R] vector<lower=0>[C - 1] gamma_r; // ILR coordinates for beta row r in unit i
}
transformed parameters {
  array[n] vector[C] log_theta_c; // estimated marginal log proportion for column c in unit i
  array[n] matrix[R, C] log_beta_mat; // each row is a log-simplex
  for (i in 1:n) {
    for (r in 1:R) {
      log_beta_mat[i, r] = inv_ilr_log_simplex_constrain_lp(gamma_r[i, r])';
    }
    log_theta_c[i] = log_matrix_vector_product(log_beta_mat[i]', log_p_r[i]);
  }
}
model {
  to_vector(alpha_r) ~ exponential(0.5);
  for (i in 1:n) {
    for (r in 1:R) {
      to_vector(log_beta_mat[i, r]) ~ log_dirichlet(to_vector(alpha_r[r,]));
    }
    target += multinomial_log_lpmf(N_c[i] | log_theta_c[i]);
  }
}
If you want to run the model, here is some toy data.
data
data <- list(
  n = 10,
  R = 2,
  C = 2,
  N_c = matrix(
    c(392, 37,
      13, 5,
      196, 30,
      1057, 1150,
      3334, 3966,
      989, 1045,
      1852, 1706,
      2379, 2180,
      306, 246,
      1687, 1634),
    10, 2,
    byrow = TRUE
  ),
  log_p_r = matrix(
    c(-0.2150181, -1.6426163,
      -0.4054651, -1.0986123,
      -0.2110488, -1.6593349,
      -0.8067478, -0.5911453,
      -0.8468413, -0.5599583,
      -0.7890475, -0.6056443,
      -0.7135908, -0.6731131,
      -0.7115207, -0.6751052,
      -0.6645738, -0.7225611,
      -0.7422084, -0.6463808),
    10, 2,
    byrow = TRUE
  )
)
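And a minimal fitting sketch, assuming cmdstanr and that the model above is saved as ei_log.stan (the file name is just a placeholder):
fit
library(cmdstanr)

mod <- cmdstan_model("ei_log.stan")  # placeholder file name
fit <- mod$sample(data = data, seed = 123, chains = 4, parallel_chains = 4)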
As a sanity check, the estimand is exp(log_beta_mat), and this model is guaranteed to be pretty biased, but exp(log_beta_mat) should at least approach the ground truth to within roughly +/- 0.2. You can compare to these values as the ground truth just to see that the current model doesn’t come close at all. It appears that the support of the posterior is in fact truncated to p = (0, 0.5) and 1 - p instead of (0, 1), and the mode is almost always near 0.5. I would not expect this behavior if the problem weren’t specific to the log-scale parameterization: the version of the model that isn’t log-scaled, included below for comparison, produces lots of divergences but no similar weird behavior in the posterior.
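To do that comparison yourself, continuing the cmdstanr sketch above, the posterior means of the estimand can be pulled out like this:
estimand check
# Posterior means of the estimand exp(log_beta_mat).
draws <- fit$draws("log_beta_mat", format = "draws_matrix")
colMeans(exp(draws))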
different parameterization for comparison
data {
  int<lower=1> n; // number of election units i
  int<lower=2> R; // number of candidates in election 1
  int<lower=2> C; // number of candidates in election 2
  array[n] vector[R] p_r; // marginal proportion for row r in unit i (natural scale)
  array[n, C] int<lower=0> N_c; // marginal count for column c in unit i
}
parameters {
  matrix<lower=0>[R, C] alpha_r;
  array[n, R] simplex[C] beta_r; // proportion of row r in column c for unit i
}
transformed parameters {
  array[n] simplex[C] theta_c; // estimated marginal proportion for column c in unit i
  array[n] matrix[R, C] beta_mat; // matrix of vectors beta_r to be used for transformation
  for (i in 1:n) {
    for (r in 1:R) {
      beta_mat[i, r] = to_row_vector(beta_r[i, r]);
    }
    theta_c[i] = beta_mat[i]' * p_r[i];
  }
}
model {
  to_vector(alpha_r) ~ exponential(0.5);
  for (i in 1:n) {
    for (r in 1:R) {
      beta_r[i, r] ~ dirichlet(to_vector(alpha_r[r,]));
    }
    N_c[i] ~ multinomial(theta_c[i]);
  }
}
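If you want to run the comparison model on the same toy data, note that it takes p_r on the natural scale:
comparison data
# The comparison model takes p_r on the natural scale,
# so exponentiate the toy data's log proportions.
data2 <- data
data2$p_r <- exp(data$log_p_r)
data2$log_p_r <- NULL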
Again, I have no idea what’s wrong with this code, but if you can spot it I would be grateful for the help!