# Modeling Multivariate Binomial with Latent Variables

Hi,

Suppose Y_{ij} \sim Binomial(n, p_{ij}) where logit(p) = \theta \sim MVN(\mu, \Sigma). I am trying to estimate the values of \theta and \Sigma.
My estimates for \Sigma do not match the values I used to generate the data. Below is my data generating code

eta_cov <- diag(1, 5)
eta_cov[1, 4] <- eta_cov[4, 1] <- .3
eta_cov[2, 3] <- eta_cov[3, 2] <- .6
eta_cov[4, 3] <- eta_cov[3, 4] <- -.7

eta <- mvtnorm::rmvnorm(n = 300, mean = rep(0, 5), sigma = eta_cov)

Y <- matrix(0, nrow = 300, ncol = 5)

for(j in 1:5){
Y[, j] <- rbinom(n = 300, size = 20, prob = plogis(eta[, j]))
}


and my Stan code

data {
int<lower = 1> N; //number of rows
int<lower = 1> J; //number of columns
int<lower = 1> n_j;  //binomial n

array[N, J] int<lower = 0, upper = n_j> x;
}

transformed data{
vector[J] eta_sigma = rep_vector(1, J);
}

parameters {
matrix[N, J] Z_eta;
cholesky_factor_corr[J] L_Omega;
}

transformed parameters{
matrix[N, J] eta = Z_eta*diag_pre_multiply(eta_sigma, L_Omega)';
}

model {
to_vector(Z_eta) ~ std_normal();
L_Omega ~ lkj_corr_cholesky(1);

to_array_1d(x) ~ binomial_logit(n_j, to_vector(eta));
}

generated quantities{
corr_matrix[J] Omega = multiply_lower_tri_self_transpose(L_Omega);
}


Any help in resolving this issue would be greatly appreciated.

Thanks

1 Like

The problem likely is that arrays and matrices are stored in different internal order (column major vs. row major), so after doing to_array_1d and to_vector the corresponding indices are misaligned. When I update the model with an explicit loop to make sure the indices align, I get reasonable results.

model {
to_vector(Z_eta) ~ std_normal();
L_Omega ~ lkj_corr_cholesky(1);

for(j in 1:J) {
x[,j] ~ binomial_logit(n_j, eta[,j]);
}
}


Note that in this case the loop is not adding a lot of overhead (it does few iterations over large chunks), so I wouldn’t try too hard to eliminate it.

Hope that helps and best of luck with your model!

3 Likes

Thanks very much, this works.