New Stan data type: zero_sum_vector

I was trying to cast this as a multivariate normal specification to see how we can derive the standard errors using matrix algebra.

We want a matrix M which, when a vector is multiplied by it, returns that vector with the negative sum of the preceding elements appended as the last entry.

\alpha M = \bigg[\alpha_1 \; \alpha_2 \; \cdots \; \alpha_{K-1} \; -\sum\limits_{i=1}^{K-1} \alpha_i \bigg]

The vector \alpha needs to be length K, so I pad it with an extra 0. An M that satisfies this has ones on the diagonal and negative ones in the last column (the (K, K) entry can be anything, since \alpha has a 0 in its Kth position).

M = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 & -1 \\ 0 & 1 & 0 & \cdots & 0 & -1 \\ 0 & 0 & 1 & \cdots & 0 & -1 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 1 \end{bmatrix}
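As a quick numeric check in R, here is the K = 4 case with arbitrary example values for alpha:

M <- rbind(c(1, 0, 0, -1),
           c(0, 1, 0, -1),
           c(0, 0, 1, -1),
           c(0, 0, 0,  1))
alpha <- c(1.5, -0.5, 2, 0)   # padded with a final 0
alpha %*% M                   # 1.5 -0.5 2.0 -3.0: last entry is the negative sum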

Let’s say we want to put a standard normal prior on \alpha. Applying the transform, the implied prior is

\alpha M \sim \text{MVN}(0, M I M^T).

Using the fact that, for a constant matrix A and a random vector X, the variance is

\begin{align} \mathbb{V}[AX] &= \mathbb{E}[(A(X-\mu))(A(X-\mu))^T] \\ &= \mathbb{E}[A(X-\mu)(X-\mu)^T A^T] \\ &= A\,\mathbb{E}[(X-\mu)(X-\mu)^T]A^T \\ &= A\,\mathbb{V}[X]A^T. \end{align}
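A quick Monte Carlo sanity check of this identity (K, the seed, and the number of draws are arbitrary choices for illustration):

set.seed(123)
K <- 4
n_draws <- 1e5
M <- cbind(rbind(diag(K - 1), 0), c(rep(-1, K - 1), 1))

X <- matrix(rnorm(K * n_draws), nrow = K)  # columns are iid N(0, I_K) draws
round(cov(t(M %*% X)) - M %*% t(M), 2)     # ~ 0: empirical covariance matches M I M^T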

Great! Now we can write some code. To set this up using @Bob_Carpenter’s data we just need the covariance matrix. Of course, we’ll use the Cholesky factor of it too.

# Build M: ones on the diagonal, -1s down the last column,
# and a 1 in the (K, K) position
id_mat <- diag(K - 1)
id_mat <- rbind(id_mat, rep(0, K - 1))
id_mat <- cbind(id_mat, -1)
id_mat[K, K] <- 1

# Correlation matrix of M I M^T (diag(K) is the K x K identity)
Sigma <- cov2cor(id_mat %*% diag(K) %*% t(id_mat))
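Continuing from the block above, a couple of quick sanity checks on Sigma and its Cholesky factor:

L <- t(chol(Sigma))              # chol() returns the upper factor; Stan wants lower
max(abs(L %*% t(L) - Sigma))     # ~ 0: the factor reproduces Sigma
diag(Sigma)                      # all 1s, as a correlation matrix requires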

mod_mvn <- cmdstan_model('mvn_ctr.stan')
data = list(K = K, 
            N = N, 
            y = y,
            L = t(chol(Sigma)),
            s = rep(2, K)
)
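The sampling call isn’t shown above; with cmdstanr it’s something like this (seed and chain settings here are arbitrary):

fit <- mod_mvn$sample(data = data, seed = 123,
                      chains = 4, parallel_chains = 4)
fit$summary(variables = c("lp__", "mu", "sigma", "alpha_pre"))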

Stan program

data {
  int<lower=0> K;
  int<lower=0> N;
  array[K, N] int<lower=0, upper=1> y;
  cholesky_factor_corr[K] L;  // Cholesky factor of the correlation matrix Sigma
  vector[K] s;                // scale removed by cov2cor, added back here
}
parameters {
  real mu;
  real<lower=0> sigma;
  vector[K - 1] alpha_pre;
}
transformed parameters {
  // append the negative sum so that alpha - mu sums to zero
  vector[K] alpha = mu + append_row(alpha_pre, -sum(alpha_pre));
}
model {
  alpha ~ multi_normal_cholesky(rep_vector(mu, K),
                                diag_pre_multiply(s .* rep_vector(sigma, K), L));
  mu ~ normal(0, 1);
  sigma ~ lognormal(0, 1);
  for (k in 1:K)
    y[k] ~ bernoulli_logit(alpha[k]);
}
generated quantities {
  real mean_alpha = mean(alpha);
  real<lower=0> sd_alpha = sd(alpha);
}

The s vector is there because M M^T has a diagonal of 2 (except for the last entry). When making this a correlation matrix, cov2cor divides each row and column by the square root of its diagonal entry. I’m adding that scale back in via s as the square of sqrt(2), which is 2.
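To see the scale that cov2cor strips out, continuing with the id_mat built earlier:

V <- id_mat %*% t(id_mat)   # M I M^T before rescaling
diag(V)                     # 2 2 ... 2 1: variance 2 everywhere except the last entry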

Which gives

     variable     mean   median   sd  mad       q5      q95 rhat ess_bulk ess_tail
 lp__         -1952.19 -1951.91 3.29 3.21 -1957.81 -1947.40 1.00      876     1318
 mu               0.63     0.63 0.06 0.06     0.53     0.74 1.00     1607     1609
 sigma            2.13     2.08 0.41 0.38     1.56     2.88 1.00     2624     1560
 alpha_pre[1]    -2.25    -2.25 0.19 0.20    -2.54    -1.95 1.00     2871     1658
 alpha_pre[2]     1.20     1.19 0.20 0.21     0.87     1.54 1.00     3618     1509
 alpha_pre[3]     3.06     3.03 0.43 0.42     2.42     3.83 1.00     2778     1513
 alpha_pre[4]    -4.75    -4.70 0.53 0.49    -5.73    -3.94 1.00     2698     1318
 alpha_pre[5]     1.12     1.12 0.21 0.21     0.78     1.48 1.00     3641     1586
 alpha_pre[6]     2.15     2.14 0.29 0.29     1.70     2.66 1.00     3783     1527
 alpha_pre[7]    -0.67    -0.67 0.14 0.14    -0.90    -0.43 1.00     2674     1601

vs Bob’s

     variable     mean   median   sd  mad       q5      q95 rhat ess_bulk ess_tail
 lp__         -1942.57 -1942.25 3.31 3.32 -1948.41 -1937.71 1.00      955     1529
 mu               0.69     0.68 0.08 0.07     0.57     0.82 1.00      785      743
 sigma            2.08     2.03 0.35 0.34     1.58     2.72 1.00     2049     1760
 alpha_pre[1]    -2.31    -2.30 0.19 0.19    -2.64    -1.99 1.00     1784     1323
 alpha_pre[2]     1.13     1.12 0.21 0.20     0.77     1.48 1.00     2022     1255
 alpha_pre[3]     2.92     2.89 0.42 0.41     2.29     3.67 1.00     2024     1533
 alpha_pre[4]    -4.70    -4.65 0.52 0.50    -5.62    -3.92 1.00     1801     1250
 alpha_pre[5]     1.05     1.04 0.21 0.20     0.72     1.40 1.00     1991     1607
 alpha_pre[6]     2.05     2.05 0.28 0.27     1.62     2.53 1.00     2119     1412
 alpha_pre[7]    -0.73    -0.72 0.16 0.15    -0.99    -0.47 1.00     1748     1612

Now, what’s really interesting is that you can clearly see why mu gets much tighter estimates by looking at the correlation matrix implied by M. I’ll show the 4 x 4 case, but as K grows the matrix has the same last row and column and the same structure everywhere else:

        [,1]       [,2]       [,3]       [,4]
[1,]  1.0000000  0.5000000  0.5000000 -0.7071068
[2,]  0.5000000  1.0000000  0.5000000 -0.7071068
[3,]  0.5000000  0.5000000  1.0000000 -0.7071068
[4,] -0.7071068 -0.7071068 -0.7071068  1.0000000
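For reference, this printout comes from rerunning the same construction at K = 4:

K <- 4
M <- cbind(rbind(diag(K - 1), 0), c(rep(-1, K - 1), 1))
cov2cor(M %*% t(M))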

Whereas the non-centered version assumes an identity correlation matrix.

What I don’t know is how this can be put into the univariate normal specification.
