I was trying to write this as a multivariate normal specification to see how we can derive the standard errors using matrix algebra.
We want a matrix M that, when multiplied by a vector, returns the vector with the negative sum of the preceding elements appended.
The vector \alpha needs to be length K, so I pad it with an extra 0. An M that satisfies this has ones on the diagonal and negative ones in the last column (the (K, K) value can be anything, since \alpha has a 0 in its Kth element).
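For concreteness, with K = 4 this construction gives

     [,1] [,2] [,3] [,4]
[1,]    1    0    0   -1
[2,]    0    1    0   -1
[3,]    0    0    1   -1
[4,]    0    0    0    1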
Let’s say that we want to put a standard normal prior on \alpha. The transform on this standard normal is

\alpha = M z, where z \sim normal(0, I_K).
Using the fact that the variance of a constant matrix A times a random vector X is Var(AX) = A Var(X) A^T, we get

Var(\alpha) = M I M^T = M M^T.
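Spelling that product out: for i < K, row i of M is e_i - e_K, and row K is e_K, so

(M M^T)_{ii} = 2 for i < K, (M M^T)_{ij} = 1 for i != j < K, (M M^T)_{iK} = -1, and (M M^T)_{KK} = 1.

cov2cor then rescales the off-diagonals to 1/2 and -1/sqrt(2) \approx -0.7071, which is exactly the pattern in the 4 x 4 matrix printed at the end of this post.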
Great! Now we can write some code. To set this up using @Bob_Carpenter’s data we just need the covariance matrix. Of course, we’ll use the Cholesky factor of it too.
# Build M: ones on the diagonal, negative ones in the last column
id_mat <- diag(K - 1)
id_mat <- rbind(id_mat, rep(0, K - 1))  # add a Kth row of zeros
id_mat <- cbind(id_mat, -1)             # add a last column of -1s
id_mat[K, K] <- 1                       # the (K, K) value is arbitrary

# Covariance M M^T, rescaled to a correlation matrix
Sigma <- cov2cor(id_mat %*% diag(K) %*% t(id_mat))
mod_mvn <- cmdstan_model('mvn_ctr.stan')

data <- list(K = K,
             N = N,
             y = y,
             L = t(chol(Sigma)),  # chol() is upper triangular; Stan wants the lower factor
             s = rep(2, K))
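The summaries below come from fitting this model. A minimal sketch of the sampling call, assuming cmdstanr's default sampler settings (fit_mvn is just a placeholder name):

fit_mvn <- mod_mvn$sample(data = data)  # placeholder name; default chains/iterations
fit_mvn$summary()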
Stan program
data {
  int<lower=0> K;                       // number of groups
  int<lower=0> N;                       // observations per group
  array[K, N] int<lower=0, upper=1> y;  // binary outcomes
  cholesky_factor_corr[K] L;            // Cholesky factor of Sigma from R
  vector[K] s;                          // scale adjustment (explained below)
}
parameters {
  real mu;
  real<lower=0> sigma;
  vector[K - 1] alpha_pre;              // K - 1 free elements
}
transformed parameters {
  // append the negative sum so that alpha - mu sums to zero
  vector[K] alpha = mu + append_row(alpha_pre, -sum(alpha_pre));
}
model {
  // multivariate normal prior on the constrained alpha, with
  // scales s .* sigma and correlation Cholesky factor L
  alpha ~ multi_normal_cholesky(rep_vector(mu, K),
                                diag_pre_multiply(s .* rep_vector(sigma, K), L));
  mu ~ normal(0, 1);
  sigma ~ lognormal(0, 1);
  for (k in 1:K)
    y[k] ~ bernoulli_logit(alpha[k]);
}
generated quantities {
  real mean_alpha = mean(alpha);
  real<lower=0> sd_alpha = sd(alpha);
}
The s vector is there because the covariance matrix M M^T has a diagonal of 2 (except for the last element, which is 1). When making it a correlation matrix, cov2cor divides by the square root of the diagonal, so I add the scale back in as the square of that square root (which is 2).
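As a quick check of that diagonal claim, reusing id_mat from the code above:

diag(id_mat %*% diag(K) %*% t(id_mat))  # diagonal of M M^T before cov2cor

which returns 2 for the first K - 1 elements and 1 for the last.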
Fitting the model gives
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
lp__ -1952.19 -1951.91 3.29 3.21 -1957.81 -1947.40 1.00 876 1318
mu 0.63 0.63 0.06 0.06 0.53 0.74 1.00 1607 1609
sigma 2.13 2.08 0.41 0.38 1.56 2.88 1.00 2624 1560
alpha_pre[1] -2.25 -2.25 0.19 0.20 -2.54 -1.95 1.00 2871 1658
alpha_pre[2] 1.20 1.19 0.20 0.21 0.87 1.54 1.00 3618 1509
alpha_pre[3] 3.06 3.03 0.43 0.42 2.42 3.83 1.00 2778 1513
alpha_pre[4] -4.75 -4.70 0.53 0.49 -5.73 -3.94 1.00 2698 1318
alpha_pre[5] 1.12 1.12 0.21 0.21 0.78 1.48 1.00 3641 1586
alpha_pre[6] 2.15 2.14 0.29 0.29 1.70 2.66 1.00 3783 1527
alpha_pre[7] -0.67 -0.67 0.14 0.14 -0.90 -0.43 1.00 2674 1601
vs Bob’s
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
lp__ -1942.57 -1942.25 3.31 3.32 -1948.41 -1937.71 1.00 955 1529
mu 0.69 0.68 0.08 0.07 0.57 0.82 1.00 785 743
sigma 2.08 2.03 0.35 0.34 1.58 2.72 1.00 2049 1760
alpha_pre[1] -2.31 -2.30 0.19 0.19 -2.64 -1.99 1.00 1784 1323
alpha_pre[2] 1.13 1.12 0.21 0.20 0.77 1.48 1.00 2022 1255
alpha_pre[3] 2.92 2.89 0.42 0.41 2.29 3.67 1.00 2024 1533
alpha_pre[4] -4.70 -4.65 0.52 0.50 -5.62 -3.92 1.00 1801 1250
alpha_pre[5] 1.05 1.04 0.21 0.20 0.72 1.40 1.00 1991 1607
alpha_pre[6] 2.05 2.05 0.28 0.27 1.62 2.53 1.00 2119 1412
alpha_pre[7] -0.73 -0.72 0.16 0.15 -0.99 -0.47 1.00 1748 1612
Now, what’s really interesting is that you can clearly see from the correlation matrix derived from M why mu will have much tighter estimates. I’ll show the 4 x 4 case, but the last row and the overall pattern stay the same as K gets bigger.
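This is the same construction as above with K = 4 (K4 and M4 are throwaway names for the illustration):

K4 <- 4
M4 <- diag(K4 - 1)
M4 <- rbind(M4, rep(0, K4 - 1))
M4 <- cbind(M4, -1)
M4[K4, K4] <- 1
cov2cor(M4 %*% t(M4))

which prints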
[,1] [,2] [,3] [,4]
[1,] 1.0000000 0.5000000 0.5000000 -0.7071068
[2,] 0.5000000 1.0000000 0.5000000 -0.7071068
[3,] 0.5000000 0.5000000 1.0000000 -0.7071068
[4,] -0.7071068 -0.7071068 -0.7071068 1.0000000
The non-centered version, by contrast, assumes an identity correlation matrix.
What I don’t know is how this can be put into the univariate normal specification.