Thanks for the response @bgoodri
As well as the set of predicted binary observations, from these I am then constructing 2x2 tables for the response patterns. My attempt at doing so is given below. It runs but some of the predicted counts (e[s,1],…, e[s,4]) are further off from the observed counts than they should be, so i think there is something wrong with how i’m obtaining the posterior predictions
My likelihood is a mixture and I am using bivariate normals written as two conditionally independent normal_lpdf’s rather than multi_normal_cholesky_lpdf (commented out) as it seems to be substantially faster (both seem to give the same results):
for (s in 1:NS) { // study index =s
real lp_bern1 = bernoulli_lpmf(0 | p[s]);
real lp_bern2 = bernoulli_lpmf(1 | p[s]);
for (n in 1:ns[s]) { // individual index =n
real lp1 = lp_bern1 //non-diseased
+ normal_lpdf(z[s,n,1] | a1_m[1] , 1)
+ normal_lpdf(z[s,n,2] | a2_m[1] + rho0[s]*(z[s,n,1] - a1_m[1]) , 1);
// + multi_normal_cholesky_lpdf(z[s,n,1:2] | nd_m[1:2] , L_Omega0[s,1:2]);
real lp2 = lp_bern2 //diseased
+ normal_lpdf(z[s,n,1] | a1_m[2] , 1)
+ normal_lpdf(z[s,n,2] | a2_m[2] + rho1[s]*(z[s,n,1] - a1_m[2]) , 1);
// + multi_normal_cholesky_lpdf(z[s,n,1:2] | d_m[1:2] , L_Omega1[s,1:2]);
target += log_sum_exp(lp1, lp2);
}
}
In the generated quantities block I have:
// generate model-predicted z's
// ( note: d_m[1] = a1_m[2], d_m[2] = a2_m[2], nd_m[1] = a1_m[1], nd_m[2] = a2_m[1])
for (n in 1:ns[s]) {
z_d[s,n,1:2] = multi_normal_rng(d_m[1:2], sigma_d[s,1:2, 1:2]);
z_nd[s,n,1:2] = multi_normal_rng(nd_m[1:2], sigma_nd[s, 1:2, 1:2]);
z_[s,n,1:2] = bernoulli_rng(p[s])*z_d[s,n,1:2] + z_nd[s,n,1:2]*bernoulli_rng(1-p[s]);
}
// posterior predictions
for (n in 1:ns[s]) {
for (i in 1:2) {
if (z_[s,n,i] > 0) {
y_hat[n,i,s] = 1; }
else {
y_hat[n,i,s] = 0; }
}
// 2x2 table for each individual
if (y_hat[n,1,s] == 1 && y_hat[n,2,s] == 1) {
ei[s,n,1] = 1; ei[s,n,2] = 0; ei[s,n,3] = 0; ei[s,n,4] = 0; }
if (y_hat[n,1,s] == 1 && y_hat[n,2,s] == 0) {
ei[s,n,1] = 0; ei[s,n,2] = 1; ei[s,n,3] = 0; ei[s,n,4] = 0; }
if (y_hat[n,1,s] == 0 && y_hat[n,2,s] == 1) {
ei[s,n,1] = 0; ei[s,n,2] = 0; ei[s,n,3] = 1; ei[s,n,4] = 0; }
if (y_hat[n,1,s] == 0 && y_hat[n,2,s] == 0) {
ei[s,n,1] = 0; ei[s,n,2] = 0; ei[s,n,3] = 0; ei[s,n,4] = 1; }
}
// construct model-predicted 2x2 tables for each study s
e[s,1] = sum(ei[s, 1:ns[s], 1]);
e[s,2] = sum(ei[s, 1:ns[s], 2]);
e[s,3] = sum(ei[s, 1:ns[s], 3]);
e[s,4] = sum(ei[s, 1:ns[s], 4]);
edit:
I am trying to work out how to reconstruct the 2x2 tables (for each study) to explore the model fit. Each cell represents the possible classification from two diagnostic tests (positive on test 1 and 2, negative on 1 and positive on 2, etc). The reason why it is important to reconstruct the 2x2 tables is to check the fit. One of these cells (“positive on test 1 and test 2”) is also needed to calculate the expected correlations between test 1 and test 2 results for each study, which require this joint probability to calculate. For each study s, these are given by:
\frac{Pr(Y_{1,s} = 1, Y_{2,s} = 1) - Pr(Y_{1,s} = 1)*Pr(Y_{2,s} = 1)}{\sqrt{Pr(Y_{1,s} = 1)*Pr(Y_{2,s} = 1)*(1-Pr(Y_{1,s} = 1))*(1-Pr(Y_{2,s} = 1))}}
Where Y_{i,s} represents the result of test i from study s
I was thinking, another way to reconstruct these 2x2 tables, as opposed to what I have tried to do here, is:
Pr(Y_{1,s} = 1, Y_{2,s} = 1) = p_{s} * (Se_{1}*Se_{2} + cov_{s,d=1}) + (1-p_{s}) * ((1-Sp_{1})*(1-Sp_{2}) + cov_{s,d=0})
Pr(Y_{1,s} = 1, Y_{2,s} = 1) = p_{s} * (Se_{1}*(1-Se_{2}) - cov_{s,d=1}) + (1-p_{s}) * ((1-Sp_{1})*(Sp_{2}) - cov_{s,d=0})
Pr(Y_{1,s} = 1, Y_{2,s} = 1) = p_{s} * ((1-Se_{1})*Se_{2} - cov_{s,d=1}) + (1-p_{s}) * (Sp_{1}*(1-Sp_{2}) - cov_{s,d=0})
Pr(Y_{1,s} = 1, Y_{2,s} = 1) = p_{s} * ((1-Se_{1})*(1-Se_{2}) + cov_{s,d=1}) + (1-p_{s}) * ((1-Sp_{1})*(1-Sp_{2}) - cov_{s,d=0})
Where p_{s} is the disease prevalence in study s, cov_{s,d} is the covariance term for disease group d in study s, Se_{i} and Sp_{i} are the sensitivity and specificity for test i.
Since
cov_{s,d=1} = \rho_{s,d=1}* \sqrt{Se_{1}*Se_{2} *(1-Se_{1})*(1-Se_{2})} , and
cov_{s,d=0} = \rho_{s,d=0}* \sqrt{Sp_{1}*Sp_{2} *(1-Sp_{1})*(1-Sp_{2})} ,
I could work out the 2x2 tables this way instead. The problem is, \rho_{s,d} are not the same as the correlations between the latent continuous variables from the multivariate probit model, but rather the correlations between the binary test result data. So, i’d work that out first in generated quantities (I don’t think Stan has a built in function to work out the correlation between two vectors?). I will try that out and update if it works.
@martinmodrak I originally had this edit in my reply to you on a different thread but moved it here where it is more relevant