I am trying to fit an ROC model in STAN that I had hard time to do in JAGS. Suppose n_0 healthy scores y_{0i} \sim F_0 and n_1 disease scores y_{1i} \sim F_1 where F_j are CDFs. An ROC model based on the idea of placement value (PV) is to define the placement value z_i = 1-F_0 (y_{1i}) and model it as z^*=h(z_i) = a + \epsilon_i where \epsilon_i follows a distribution with CDF G and h is a proper link function (e.g., an inverse CDF function). Since ROC is simply the CDF of z_i, it can be derived as ROC(t) = G(-a + h(t)).
Since z_i is unobserved and depends on both diseased scores y_{1i} and unknown parameters associated with health distribution F_0, this PV-based ROC model is not easy to fit in JAGS because z_i can’t be on left hand side twice. While STAN has no problem in this regard, it encounters divergence issue even in the simple binormal ROC case (y_{0i} \sim N(\mu_0,\sigma_0^2) and y_{1i} \sim N(\mu_1,\sigma_1^2)), as illustrated below.
First, the desired STAN code for the PV-based binormal model. Note in binormal model z_i^* has a simple form: z_i^*=\Phi^{-1}(z_i)=\Phi^{-1}(1-\Phi((y_{1i}-\mu_0)/\sigma_0)))=-(y_{1i}-\mu_0)/\sigma_0.
mycode1 = "
data {
int<lower=0> n0;
int<lower=0> n1;
real y0[n0];
real y1[n1];
}
parameters {
real mu0;
real<lower=0> sig0;
real alpha;
real<lower=0> beta;
}
transformed parameters {
real auc;
auc = Phi(alpha/sqrt(1+beta*beta));
}
model {
real zstar[n1];
target += normal_lpdf(mu0 | 0,10);
target += gamma_lpdf(sig0 |.01,.01);
target += normal_lpdf(alpha | 0,10);
target += gamma_lpdf(beta |.01,.01);
target += normal_lpdf(y0 | mu0,sig0);
for (j in 1:n1) {
zstar[j] = -(y1[j]-mu0)/sig0;
}
target += normal_lpdf(zstar | -alpha,beta);
}
"
I generated a dataset in R and ran mymodel1 against it:
n1=50;n0=50;mu0=0.5;sig0=.5;mu1=1.5;sig1=1
a=(mu1-mu0)/sig1;b=sig0/sig1
y0=rnorm(n0,mu0,sig0)
y1=rnorm(n1,mu1,sig1)
auc=pnorm(a/sqrt(1+b*b)) # True AUC=.8144
mydata=list(n0=n0,n1=n1,y0=y0,y1=y1)
fit1 <- stan(model_code = mycode1, data = mydata, iter = 1000, chains = 4, control = list(adapt_delta = 0.8, max_treedepth = 10))
These are the error messages and estimates:
1: There were 40 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
2: Examine the pairs() plot to diagnose sampling problems
print(fit1,pars=c("auc"))
Inference for Stan model: 5d94ea74015401e0c0cb98f838a8f712.
4 chains, each with iter=1000; warmup=500; thin=1;
post-warmup draws per chain=500, total post-warmup draws=2000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
auc 0.56 0.01 0.08 0.42 0.51 0.55 0.61 0.73 164 1.01
Samples were drawn using NUTS(diag_e) at Tue Oct 29 15:42:58 2019.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
I changed adapt_delta to 0.99 and max_treedepth to 20 and results were similar. Clearly the estimated AUC (0.56) is far away from the truth (0.81).
I then ran a second code as follows:
mycode2 = "
data {
int<lower=0> n0;
int<lower=0> n1;
real y0[n0];
real y1[n1];
}
parameters {
real mu0;
real<lower=0> sig0;
real alpha;
real<lower=0> beta;
}
transformed parameters {
real auc;
auc = Phi(alpha/sqrt(1+beta*beta));
}
model {
real zstar[n1];
target += normal_lpdf(mu0 | 0,10);
target += gamma_lpdf(sig0 |.01,.01);
target += normal_lpdf(alpha | 0,10);
target += gamma_lpdf(beta |.01,.01);
target += normal_lpdf(y0 | mu0,sig0);
for (j in 1:n1) {
zstar[j] = -(y1[j]-mu0);
}
target += normal_lpdf(zstar | -alpha*sig0,beta*sig0);
}
"
Note that the model in mycode2 is equivalent to that in mycode1; I simply multiplied both sides of the normal model in the last line by \sigma_0. STAN had no problem with this new code and produced reasonable AUC estimate (0.84)
fit2 <- stan(model_code = mycode2, data = mydata, iter = 1000, chains = 4, control = list(adapt_delta = 0.8, max_treedepth = 10))
print(fit2,pars=c("auc"))
Inference for Stan model: 6807fa650cead5a63be26c60f1353b76.
4 chains, each with iter=1000; warmup=500; thin=1;
post-warmup draws per chain=500, total post-warmup draws=2000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
auc 0.84 0 0.04 0.75 0.81 0.84 0.87 0.9 1340 1
Samples were drawn using NUTS(diag_e) at Tue Oct 29 15:42:21 2019.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
Although mycode2 worked, it is not helpful to me, as in almost all non-binormal ROC models I can’t simplify z^* and don’t have the convenience to move some parameters around. In those general cases, the for-loop in mycode1 will be something like
for (j in 1:n1) {
zstar[j] = h(1-F_0(y1[j];\theta);
}
where F_0 can be the CDF of a non-central \chi^2 with parameters \theta and h the inverse of another CDF. As such it is important that I can run mycode1 in STAN and get similar results as with mycode2. The question is how to improve the convergence of mycode2. Any suggestions will be highly appreciated.
In case you are wondering why not follow the conventional approach to model y_{0i} and y_{1i}: the PV-based approach allows one to assess covariate effect on ROC directly, something the conventional approach can’t do.