Hi,
I have a model that I’ve tried to re-parameterize to the log scale, but based on the results I am clearly doing something wrong. I have no idea what the problem is and would appreciate any help.
First, here are the log-scale Dirichlet, the inverse-ILR log-simplex constraint transform, and the log-scale multinomial distribution. There’s also a helper function that computes a matrix-vector product on the log scale.
functions block
functions {
  // Maps an unconstrained vector y (length N - 1) to a log-simplex
  // (length N) via the inverse ILR transform, incrementing target
  // with the Jacobian adjustment (0.5 * log(N) and log_x[N]).
  vector inv_ilr_log_simplex_constrain_lp(vector y) {
    int N = rows(y) + 1;
    vector[N - 1] ns = linspaced_vector(N - 1, 1, N - 1);
    vector[N - 1] w = y ./ sqrt(ns .* (ns + 1));
    vector[N] z = append_row(reverse(cumulative_sum(reverse(w))), 0)
                  - append_row(0, ns .* w);
    real r = log_sum_exp(z);
    vector[N] log_x = z - r;
    target += 0.5 * log(N);
    target += log_x[N]; // This is why we subtract log_theta[N] in log_dirichlet_lpdf
    return log_x;
  }

  // Dirichlet log-density over log_theta, where exp(log_theta) is a simplex;
  // the -log_theta[N] term cancels the target increment in the constrain
  // function above.
  real log_dirichlet_lpdf(vector log_theta, vector alpha) {
    int N = rows(log_theta);
    if (N != rows(alpha)) {
      reject("Input must contain the same number of elements as alpha");
    }
    return dot_product(alpha, log_theta) - log_theta[N]
           + lgamma(sum(alpha)) - sum(lgamma(alpha));
  }

  // Multinomial log-pmf parameterized by log probabilities
  // (assumes log_theta is a normalized log-simplex).
  real multinomial_log_lpmf(array[] int y, vector log_theta) {
    int N = sum(y);
    int K = num_elements(log_theta);
    real lp = lgamma(N + 1);
    for (k in 1:K) {
      lp += log_theta[k] * y[k] - lgamma(y[k] + 1);
    }
    return lp;
  }

  // Computes log(A * x) from log_A and log_x via row-wise log-sum-exp:
  // result[i] = log_sum_exp(log_A[i] + log_x).
  vector log_matrix_vector_product(matrix log_A, vector log_x) {
    int N = rows(log_A);
    vector[N] result;
    for (i in 1:N) {
      vector[cols(log_A)] sum_vector = to_vector(log_A[i]) + log_x;
      result[i] = log_sum_exp(sum_vector);
    }
    return result;
  }
}
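To make these easier to check outside of Stan, here is a small R port of the helpers above (function names and test inputs are mine, not from the linked references). It verifies that the transform returns a valid log-simplex, that the log-scale matrix-vector product matches log(A %*% x), and that the multinomial log-pmf matches dmultinom().
R sanity checks
# R port of the Stan helpers above; names and test inputs are made up.

# Inverse-ILR log-simplex transform (without the Jacobian target
# increments, which only matter inside Stan).
inv_ilr_log_simplex <- function(y) {
  N <- length(y) + 1
  ns <- seq_len(N - 1)
  w <- y / sqrt(ns * (ns + 1))
  z <- c(rev(cumsum(rev(w))), 0) - c(0, ns * w)
  z - (max(z) + log(sum(exp(z - max(z)))))  # z - log_sum_exp(z)
}
set.seed(1)
log_x <- inv_ilr_log_simplex(rnorm(3))
sum(exp(log_x))  # should be 1 up to floating-point error

# log(A %*% x) from log_A and log_x via row-wise log-sum-exp.
log_mat_vec <- function(log_A, log_x) {
  apply(log_A, 1, function(row) {
    s <- row + log_x
    max(s) + log(sum(exp(s - max(s))))
  })
}
A <- matrix(runif(6), 2, 3)
x <- runif(3)
max(abs(log_mat_vec(log(A), log(x)) - log(A %*% x)))  # ~0

# Log-scale multinomial pmf against R's dmultinom().
y <- c(3, 5, 2)
th <- c(0.2, 0.5, 0.3)
lp <- lgamma(sum(y) + 1) + sum(log(th) * y - lgamma(y + 1))
lp - dmultinom(y, prob = th, log = TRUE)  # ~0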
The details for all of these functions can be found here, with some additional information here. The inverse ILR algorithm is from here.
The inverse ILR is there because the log-Dirichlet requires log_beta_mat to be constrained such that sum(exp(log_beta_mat[i, r])) = 1, which can’t be achieved by simply declaring it as an array of simplexes (see the model below).
And here is the rest of the model.
model
data {
  int<lower=1> n; // number of election units i
  int<lower=2> R; // number of candidates in election 1
  int<lower=2> C; // number of candidates in election 2
  array[n, C] int<lower=0> N_c; // marginal count for column c in unit i
  array[n] vector[R] log_p_r; // marginal log proportion for row r in unit i
}
parameters {
  matrix<lower=0>[R, C] alpha_r; // Dirichlet concentrations for row r
  array[n, R] vector<lower=0>[C - 1] gamma_r; // ILR coordinates for beta row r in unit i
}
transformed parameters {
  array[n] vector[C] log_theta_c; // estimated marginal log proportion for column c in unit i
  array[n] matrix[R, C] log_beta_mat; // each row is a log-simplex
  for (i in 1:n) {
    for (r in 1:R) {
      log_beta_mat[i, r] = inv_ilr_log_simplex_constrain_lp(gamma_r[i, r])';
    }
    log_theta_c[i] = log_matrix_vector_product(log_beta_mat[i]', log_p_r[i]);
  }
}
model {
  to_vector(alpha_r) ~ exponential(0.5);
  for (i in 1:n) {
    for (r in 1:R) {
      to_vector(log_beta_mat[i, r]) ~ log_dirichlet(to_vector(alpha_r[r,]));
    }
    target += multinomial_log_lpmf(N_c[i] | log_theta_c[i]);
  }
}
If you want to run the model, here is some toy data.
data
data <- list(
  n = 10,
  R = 2,
  C = 2,
  N_c = matrix(
    c(392, 37,
      13, 5,
      196, 30,
      1057, 1150,
      3334, 3966,
      989, 1045,
      1852, 1706,
      2379, 2180,
      306, 246,
      1687, 1634),
    10, 2,
    byrow = TRUE
  ),
  log_p_r = matrix(
    c(-0.2150181, -1.6426163,
      -0.4054651, -1.0986123,
      -0.2110488, -1.6593349,
      -0.8067478, -0.5911453,
      -0.8468413, -0.5599583,
      -0.7890475, -0.6056443,
      -0.7135908, -0.6731131,
      -0.7115207, -0.6751052,
      -0.6645738, -0.7225611,
      -0.7422084, -0.6463808),
    10, 2,
    byrow = TRUE
  )
)
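And a minimal fitting sketch, assuming cmdstanr and that the model above is saved as ei_log.stan (the file name is just a placeholder):
fit
library(cmdstanr)

mod <- cmdstan_model("ei_log.stan")  # placeholder file name
fit <- mod$sample(data = data, seed = 123, chains = 4, parallel_chains = 4)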
As a sanity check, the estimand is exp(log_beta_mat), and this model is guaranteed to be pretty biased, but exp(log_beta_mat) should at least approach the ground truth to within roughly +/- 0.2. You can compare to these values as the ground truth just to see that the current model doesn’t come close at all. It appears that the support of the posterior is in fact truncated to p = (0, 0.5) and 1 - p instead of (0, 1), and the mode is almost always near 0.5. I would not expect this behavior if the problem weren’t specific to the log-scale parameterization: the version of the model that isn’t log-scaled, included below for comparison, produces lots of divergences but no similar weird behavior in the posterior.
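To do that comparison yourself, continuing the cmdstanr sketch above, the posterior means of the estimand can be pulled out like this:
estimand check
# Posterior means of the estimand exp(log_beta_mat).
draws <- fit$draws("log_beta_mat", format = "draws_matrix")
colMeans(exp(draws))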
different parameterization for comparison
data {
  int<lower=1> n; // number of election units i
  int<lower=2> R; // number of candidates in election 1
  int<lower=2> C; // number of candidates in election 2
  array[n] vector[R] p_r; // marginal proportion for row r in unit i (natural scale)
  array[n, C] int<lower=0> N_c; // marginal count for column c in unit i
}
parameters {
  matrix<lower=0>[R, C] alpha_r;
  array[n, R] simplex[C] beta_r; // proportion of row r in column c for unit i
}
transformed parameters {
  array[n] simplex[C] theta_c; // estimated marginal proportion for column c in unit i
  array[n] matrix[R, C] beta_mat; // matrix of vectors beta_r to be used for transformation
  for (i in 1:n) {
    for (r in 1:R) {
      beta_mat[i, r] = to_row_vector(beta_r[i, r]);
    }
    theta_c[i] = beta_mat[i]' * p_r[i];
  }
}
model {
  to_vector(alpha_r) ~ exponential(0.5);
  for (i in 1:n) {
    for (r in 1:R) {
      beta_r[i, r] ~ dirichlet(to_vector(alpha_r[r,]));
    }
    N_c[i] ~ multinomial(theta_c[i]);
  }
}
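If you want to run the comparison model on the same toy data, note that it takes p_r on the natural scale:
comparison data
# The comparison model takes p_r on the natural scale,
# so exponentiate the toy data's log proportions.
data2 <- data
data2$p_r <- exp(data$log_p_r)
data2$log_p_r <- NULL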
Again, I have no idea what’s wrong with this code, but if you can spot it I would be grateful for the help!