# ROC modeling using placement values

I am trying to fit an ROC model in STAN that I had hard time to do in JAGS. Suppose n_0 healthy scores y_{0i} \sim F_0 and n_1 disease scores y_{1i} \sim F_1 where F_j are CDFs. An ROC model based on the idea of placement value (PV) is to define the placement value z_i = 1-F_0 (y_{1i}) and model it as z^*=h(z_i) = a + \epsilon_i where \epsilon_i follows a distribution with CDF G and h is a proper link function (e.g., an inverse CDF function). Since ROC is simply the CDF of z_i, it can be derived as ROC(t) = G(-a + h(t)).

Since z_i is unobserved and depends on both diseased scores y_{1i} and unknown parameters associated with health distribution F_0, this PV-based ROC model is not easy to fit in JAGS because z_i can’t be on left hand side twice. While STAN has no problem in this regard, it encounters divergence issue even in the simple binormal ROC case (y_{0i} \sim N(\mu_0,\sigma_0^2) and y_{1i} \sim N(\mu_1,\sigma_1^2)), as illustrated below.

First, the desired STAN code for the PV-based binormal model. Note in binormal model z_i^* has a simple form: z_i^*=\Phi^{-1}(z_i)=\Phi^{-1}(1-\Phi((y_{1i}-\mu_0)/\sigma_0)))=-(y_{1i}-\mu_0)/\sigma_0.

mycode1 = "
data {
int<lower=0> n0;
int<lower=0> n1;
real y0[n0];
real y1[n1];
}

parameters {
real mu0;
real<lower=0> sig0;
real alpha;
real<lower=0> beta;
}

transformed parameters {
real auc;
auc = Phi(alpha/sqrt(1+beta*beta));
}

model {
real zstar[n1];
target += normal_lpdf(mu0 | 0,10);
target += gamma_lpdf(sig0 |.01,.01);
target += normal_lpdf(alpha | 0,10);
target += gamma_lpdf(beta |.01,.01);
target += normal_lpdf(y0 | mu0,sig0);
for (j in 1:n1) {
zstar[j] = -(y1[j]-mu0)/sig0;
}
target += normal_lpdf(zstar | -alpha,beta);
}
"


I generated a dataset in R and ran mymodel1 against it:

n1=50;n0=50;mu0=0.5;sig0=.5;mu1=1.5;sig1=1
a=(mu1-mu0)/sig1;b=sig0/sig1
y0=rnorm(n0,mu0,sig0)
y1=rnorm(n1,mu1,sig1)
auc=pnorm(a/sqrt(1+b*b)) # True AUC=.8144
mydata=list(n0=n0,n1=n1,y0=y0,y1=y1)

fit1 <- stan(model_code = mycode1, data = mydata, iter = 1000, chains = 4, control = list(adapt_delta = 0.8, max_treedepth = 10))


These are the error messages and estimates:

1: There were 40 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
2: Examine the pairs() plot to diagnose sampling problems

print(fit1,pars=c("auc"))

Inference for Stan model: 5d94ea74015401e0c0cb98f838a8f712.
4 chains, each with iter=1000; warmup=500; thin=1;
post-warmup draws per chain=500, total post-warmup draws=2000.

mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
auc 0.56    0.01 0.08 0.42 0.51 0.55 0.61  0.73   164 1.01

Samples were drawn using NUTS(diag_e) at Tue Oct 29 15:42:58 2019.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).


I changed adapt_delta to 0.99 and max_treedepth to 20 and results were similar. Clearly the estimated AUC (0.56) is far away from the truth (0.81).

I then ran a second code as follows:

mycode2 = "
data {
int<lower=0> n0;
int<lower=0> n1;
real y0[n0];
real y1[n1];
}

parameters {
real mu0;
real<lower=0> sig0;
real alpha;
real<lower=0> beta;
}

transformed parameters {
real auc;
auc = Phi(alpha/sqrt(1+beta*beta));
}

model {
real zstar[n1];
target += normal_lpdf(mu0 | 0,10);
target += gamma_lpdf(sig0 |.01,.01);
target += normal_lpdf(alpha | 0,10);
target += gamma_lpdf(beta |.01,.01);
target += normal_lpdf(y0 | mu0,sig0);
for (j in 1:n1) {
zstar[j] = -(y1[j]-mu0);
}
target += normal_lpdf(zstar | -alpha*sig0,beta*sig0);
}
"


Note that the model in mycode2 is equivalent to that in mycode1; I simply multiplied both sides of the normal model in the last line by \sigma_0. STAN had no problem with this new code and produced reasonable AUC estimate (0.84)

fit2 <- stan(model_code = mycode2, data = mydata, iter = 1000, chains = 4, control = list(adapt_delta = 0.8, max_treedepth = 10))

print(fit2,pars=c("auc"))

4 chains, each with iter=1000; warmup=500; thin=1;
post-warmup draws per chain=500, total post-warmup draws=2000.

mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
auc 0.84       0 0.04 0.75 0.81 0.84 0.87   0.9  1340    1

Samples were drawn using NUTS(diag_e) at Tue Oct 29 15:42:21 2019.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).


Although mycode2 worked, it is not helpful to me, as in almost all non-binormal ROC models I can’t simplify z^* and don’t have the convenience to move some parameters around. In those general cases, the for-loop in mycode1 will be something like

  for (j in 1:n1) {
zstar[j] = h(1-F_0(y1[j];\theta);
}


where F_0 can be the CDF of a non-central \chi^2 with parameters \theta and h the inverse of another CDF. As such it is important that I can run mycode1 in STAN and get similar results as with mycode2. The question is how to improve the convergence of mycode2. Any suggestions will be highly appreciated.

In case you are wondering why not follow the conventional approach to model y_{0i} and y_{1i}: the PV-based approach allows one to assess covariate effect on ROC directly, something the conventional approach can’t do.

Sorry, can’t answer now, but maybe @syclik is not busy and can answer?

@ZChen, welcome to the forums!

I’m a bit confused. I tried following the math, connecting it to the Stan program, then looking at the way you’ve generated data. Mind helping me understand?

• In how you’re generating data, you’ve included mu1 and sig1, but these aren’t parameters in your model. Is that intentional?

• It looks like the auc can be moved to generated quantities. Is there a reason that’s not done here?

2 Likes

The result of the following code has more ill condition, the problem is not only the divergent transition. The R hat criteria dose not hold.

Generally speaking, the ROC model has two kinds of parameters, one is for the signal distribution and the other is the noise distribution. But one of them is redundant and we can omit by normalizing.
Note that, in this situation, the AUC is a function of these parameters.

I am not sure your model definition, but I assume the above.
Then I can point out the following issues about your Stan file.

1. issue I
alpha and beta should be generated parameters.
So alpha and beta are not model parameter, but functions of model parameter.
Please use transformed parameters block, and give the definition of them using sig1 and mu1. sig0 and mu0. I am not sure but something like the following.

\alpha = (\mu_1-\mu_0)/\sigma_o
\beta = \sigma_1 / \sigma_0

1. issue II
Further more, in the Stan file, sig1 and mu0 is not appeard.
I think in the stan file, they are normalized so that sig1 = 1 and mu1 = 0 which are not your specified numbers.
If the model use sig1 and mu1 without normalizations, then these fixed parameter should be included in the data block or parameter block of the stan file. If eliminates these parameter from the theory, then the data should be generated according to this normalization, which should have compatibility with your current list of truth data.
I do not recommend the model which use both signal and noise parameters, since such redundant parameters will cause some non-convergent issues.

So, the description of model is definitely not correct.

  set.seed(123456)

n1=50;n0=50;mu0=0.5;sig0=.5;mu1=1.5;sig1=1
a=(mu1-mu0)/sig1;b=sig0/sig1
y0=rnorm(n0,mu0,sig0)
y1=rnorm(n1,mu1,sig1)
auc=pnorm(a/sqrt(1+b*b)) # True AUC=.8144
mydata=list(n0=n0,n1=n1,y0=y0,y1=y1)

mycode1 = "
data {
int<lower=0> n0;
int<lower=0> n1;
real y0[n0];
real y1[n1];
}

parameters {
real mu0;
real<lower=0> sig0;
real alpha;
real<lower=0> beta;
}

transformed parameters {
real auc;
auc = Phi(alpha/sqrt(1+beta*beta));
}

model {
real zstar[n1];
target += normal_lpdf(mu0 | 0,10);
target += gamma_lpdf(sig0 |.01,.01);
target += normal_lpdf(alpha | 0,10);
target += gamma_lpdf(beta |.01,.01);
target += normal_lpdf(y0 | mu0,sig0);
for (j in 1:n1) {
zstar[j] = -(y1[j]-mu0)/sig0;
}
target += normal_lpdf(zstar | -alpha,beta);
}
"

fit1 <- stan(model_code = mycode1,
data = mydata,
iter =  3000,
warmup =  222,
chains = 1, control = list(adapt_delta = 0.8, max_treedepth = 10),
seed = 123456789)

traceplot(fit1)