# I want to hear any opinion about my new model from the signal detection theory

I want to remove the following issues from the following model ( I give a minimal code in which the following issue will appear).

• the divergent transitions ,

• get convergence chains for any seed. ( namely in some seed, the non-convergent chains appear or in very long chains, the sub chain is not converged in the sense of the R hat criterion)

I welcome any questions. Because I made a very brief explanation without any important theory, I think the following explanation is difficult to read. Also my English is poor. I am the author of the package BayesianFROC in which the following model is implemented.

I made a model to compare imaging modaity (MRI, CT, PET,etc) in Radiological context.
For example, this model tells us that the radiologist can find more many lesions in images taken by MRI than CT.

Trial to get data consisting of TPs and FPs.

• Suppose that there are N_I radiographs (images), in each of which there are several shadows caused by lesions or noises. We assume that there are N_L lesions, one image may contain no image and one image may contain one more lesions. Each images are taken by MRI, CT, or PET, etc. Now assume that there are M modalities.

• Suppose that there are R readers. The readers try to find lesions in each image.
For each image, if a reader thinks there are lesions, then the reader localizes his suspicious locations with his confidence levels. For example, if reader localizes 3 positions in a single image, then reader also assigns to each suspicious location a confidence level which is an integer such as 5= “definitely present”, 4=“probably present”, 3=…, 1=“questionable”.

• After reader’s lesion finding task, some data analyst counts the number of true positive (hits) and the number of false positives (false alarms) .

So, we will get the following dataset

Data

Two readers and two modalities and three kind of confidence levels.

Confidence Level Modality ID Reader ID Number of Hits Number of False alarms
3 = definitely present 1 1 H_{3,1,1} F_{3,1,1}
2 = equivocal 1 1 H_{2,1,1} F_{2,1,1}
1 = questionable 1 1 H_{1,1,1} F_{1,1,1}
3 = definitely present 1 2 H_{3,1,2} F_{3,1,2}
2 = equivocal 1 2 H_{2,1,2} F_{2,1,2}
1 = questionable 1 2 H_{1,1,2} F_{1,1,2}
3 = definitely present 2 1 H_{3,2,1} F_{3,2,1}
2 = equivocal 2 1 H_{2,2,1} F_{2,2,1}
1 = questionable 2 1 H_{1,2,1} F_{1,2,1}
3 = definitely present 2 2 H_{3,2,2} F_{3,2,2}
2 = equivocal 2 2 H_{2,2,2} F_{2,2,2}
1 = questionable 2 2 H_{1,2,2} F_{1,2,2}

To give the probability law of the two kinds of random variables, I made a model as follows.

Model

H_{c,m,r} \sim \text{Binomial}(p_{c,m,r}(\theta),N_L(c,\mathcal{H}_{c,m,r} )),\\ F_{c,m,r} \sim \text{Poisson}(q_c(\theta)N_X).\\ p_{c,m,r}(\theta) := \int_{\theta_c}^{\theta_{c+1}}\text{Gaussian}_{}(x|\mu_{m,r},\sigma_{m,r})dx\\ =\Phi (\frac{\theta_{c +1}-\mu_{m,r}}{\sigma_{m,r}})-\Phi (\frac{\theta_{c}-\mu_{m,r}}{\sigma_{m,r}}),\\ q_c(\theta) := \int_{\theta_c}^{\theta_{c+1}} \frac{d \log \Phi(z)}{dz}dz\\ = \log \frac{ \Phi(\theta_{c+1}) }{ \Phi(\theta_{c}) }\\ q_{C} = - \log \Phi ( \theta_{C} ), \\ p_C = 1-\Phi (\frac{\theta_{C}-\mu_{m,r}}{\sigma_{m,r}}),\\ A_{m,r} := \Phi (\frac{\mu_{m,r}/\sigma_{m,r}}{\sqrt{(1/\sigma_{m,r})^2+1}}), \\ A_{m,r} \sim \text{Normal} (A_{m},\sigma_{r}^2), \\

Where the model parameter is \theta = (\theta_c,\mu_{m,r},\sigma_{m,r},A_m) and 1\leq c \leq C, 1\leq m \leq M, 1\leq r \leq R, and \Phi denote the cumulative distribution functions of the canonical Gaussian. Note that \theta_{C+1} = \infty and \mathcal{H}_{c,m,r} := \{ H_{c+1,m,r},H_{c+2,m,r},\cdots,H_{C,m,r}\} and

N_L(c,\mathcal{H}_{c,m,r} ) := N_L- \sum_{c'=c+1}^C H_{c',m,r} ,c=1,\cdots, C-1,\\ N_L(C,\mathcal{H}_{C,m,r} ) := N_L.\\

Set d\theta_c = \theta_{c+1}-\theta_{c}.

We use the following priors:

d\theta_c, \sigma_{m,r} \sim \text{Uniform}(0,111),\\ \theta_{1},\mu_{m,r} \sim \text{Uniform}( -111,111),\\ A_{m} \sim \text{Uniform}(0,1).\\

Implementation

# Make a dataset

# modality ID
m <-c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3
,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5
,5,5,5,5,5,5,5,5,5,5,5,5)

q <-c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,1,1,1,1
,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,1,1,1,1,1,2,2,2
,2,2,3,3,3,3,3,4,4,4,4,4)

# confidence level

c<-c(5,4,3,2,1,5,4,3,2,1,5,4,3,2,1,5,4,3,2,1,5,4,3,2,1,5,4,3,2,1,5,4,3,2,1,5,4,3,2,1,5,4,3,2
,1,5,4,3,2,1,5,4,3,2,1,5,4,3,2,1,5,4,3,2,1,5,4,3,2,1,5,4,3,2,1,5,4,3,2,1,5,4,3,2,1,5,4,3
,2,1,5,4,3,2,1,5,4,3,2,1)

#  FP  ( false alarm)

f<-c(
0,4,20,29,21,0,0,6,15,22,1,15,18,31,19,1,2,4,16,17,1,1,21,24,23,1,1,5,30
,40,2,19,31,56,42,2,0,2,30,32,1,7,13,28,19,0,1,7,7,31,7,15,28,41,9,0,2,5
,24,31,1,4,18,21,23,1,1,0,11,35,6,14,37,36,18,0,2,4,18,25,0,2,19,23,18,0,2
,6,10,30,2,25,40,29,24,1,1,4,24,32
)

#  TP (hit)
h<-c(
50,30,11,5,1,15,29,29,1,0,39,31,8,10,3,10,8,25,45,14,52,25,13,4,1,27,28,29,1
,0,53,29,13,2,4,9,16,22,43,14,43,29,11,6,0,18,29,21,0,0,43,29,6,7,1,10,14,19
,32,23,61,19,12,9,3,16,29,34,1,0,52,29,10,4,3,10,16,23,43,15,35,29,18,9,0,17,27
,24,0,0,34,33,7,13,2,12,16,21,35,15
)

C<-5  # the number of confidence levels
M<-5 # the number of modalities
Q<-4 # the number of readers
NI<-199 # the number of images
NL<-142 # the number of lesions

# the length of the dataset
N <-C*M*Q

# make an array format hits data
ff <- numeric(N) #Initialization of Cumulative False alarm
harray<-array(0,dim=c(C,M,Q));

for(md in 1:M) {
for(cd in 1:C) {
for(qd in 1 : Q){
for(n  in 1:cd){
ff[cd+(md-1)*C*Q+(qd-1)*C]<-ff[cd+(md-1)*C*Q+(qd-1)*C]+f[n+(md-1)*C*Q+(qd-1)*C]
}
harray[cd,md,qd] <- h[cd+(md-1)*C*Q+(qd-1)*C]
}}}

# make a data to be passed to sampling()

data <- list(N=N,Q=Q, M=M,m=m  ,C=C  , NL=NL,NI=NI
,c=c,q=q,
h=h, f=f,
ff=ff,
harray=harray
)

# Make a Stan model

Stan.model <- rstan::stan_model( model_code="

data{
int <lower=0>N;
int <lower=0>M;
int <lower=0>C;
int <lower=0>Q;
int <lower=0>h[N];
int <lower=0>f[N];
int <lower=0>q[N];
int <lower=0>c[N];
int <lower=0>m[N];
int <lower=0>NL;
int <lower=0>NI;

int <lower=0>ff[N];
int <lower=0>harray[C,M,Q];

}
transformed data {
int <lower=0> NX;
NX = NI;
}

parameters{
real    w;
real <lower =0  >  dz[C-1];
real               mu[M,Q];
real <lower=0>      v[M,Q];
real <lower=0>      hyper_v[Q];
real <lower=0,upper=1>A[M];

}

transformed parameters {
real <lower =0>       dl[C];
real <lower=0,upper=1> ppp[C,M,Q];
real <lower =0>      l[C];
real    z[C];
real                      aa[M,Q];
real <lower =0>           bb[M,Q];
real <lower=0,upper=1>    AA[M,Q];
real deno[C-1,M,Q];
real hit_rate[C,M,Q];

z[1]=w;

for(md in 1 : M) {
for(qd in 1 : Q) {
aa[md,qd]=mu[md,qd]/v[md,qd];
bb[md,qd]=1/v[md,qd];

for(cd in 1 : C-1) z[cd+1] = z[cd] + dz[cd];
ppp[C,md,qd] = 1- Phi((z[C] -mu[md,qd])/v[md,qd]);

for(cd in 1 : C-1) ppp[cd,md,qd] = Phi((z[cd+1] -mu[md,qd])/v[md,qd])  - Phi((z[cd ] -mu[md,qd])/v[md,qd]);

for(cd in 1 : C) l[cd] = (-1)*log(Phi(z[cd]));
dl[C] = fabs(l[C]-0);
for(cd in 1:C-1) dl[cd]= fabs(l[cd]-l[cd+1]);

}
}

for(md in 1 : M) {
for(qd in 1 : Q) {
AA[md,qd]=Phi(  (mu[md,qd]/v[md,qd])/sqrt((1/v[md,qd])^2+1)  );//Measures of modality performance
}}

for(md in 1 : M) {
for(qd in 1 : Q) {
deno[C-1,md,qd]=1-ppp[C,md,qd];
for(cd in 3:C){  deno[c[cd],md,qd]=deno[c[cd-1],md,qd]-ppp[c[cd-1],md,qd];  }
}}

for(md in 1 : M) {
for(qd in 1 : Q) {
for(cd in 1:C-1){
hit_rate[cd,md,qd]=ppp[cd,md,qd]/deno[cd,md,qd];
}
hit_rate[C,md,qd]=ppp[C,md,qd];

}}

}

model{
int s=0;

for(qd in 1 : Q) {
for(md in 1 : M) {
target += normal_lpdf( AA[md,qd]|A[md],hyper_v[qd]);
}  }
for(n in 1:N) {
target +=   poisson_lpmf(ff[n]|l[c[n]]*NX);
}

for(qd in 1 : Q) {
for(md in 1 : M) {
s=0;
for(cd in 1 : C){
target += binomial_lpmf(harray[cd,md,qd]  |  NL-s, hit_rate[c[cd],md,qd]  );
s = s + harray[cd,md,qd]; }
}}

w ~  uniform(-3,3);
for(cd in 1:C-1) dz[cd] ~  uniform(0.001,7);
for(md in 1 : M) { for(qd in 1 : Q) {
mu[md,qd] ~ uniform(-11,11);
v[md,qd] ~ uniform(0.01,11);

}}

}

")

#  Fit a model to data

fit  <-  rstan::sampling(
object= Stan.model, data=data,  verbose=F,
seed=1234567, chains=1, warmup=111, iter=1111
, control = list(adapt_delta = 0.9999999,
max_treedepth = 15)
# ,init = initial
)

rstan::traceplot(fit,pars=c("w"))
rstan::check_hmc_diagnostics(fit)

# MCMC fails if the seed is changed.

fit  <-  rstan::sampling(
object= Stan.model, data=data,  verbose=F,
seed=1, chains=1, warmup=111, iter=122
, control = list(adapt_delta = 0.9999999,
max_treedepth = 15)
# ,init = initial
)

rstan::traceplot(fit,pars=c("w"))
rstan::check_hmc_diagnostics(fit)
1. What happens if you make the priors normal(0, 1) instead of those wide uniforms?

2. If you use a tiny init (init = 0.2 by default), does the model diverge less?

3. When chains don’t converge, is there any signature in this in terms of parameters? Do some parameters just go to zero? Or does something get really large? Etc.

4. Do the divergences happen in any particular situation?

2 Likes

What happens if you make the priors normal(0, 1) instead of those wide uniforms?

I had tried such prior with a small varinace but it worsens model fitting.

(I mistyped in my previous post, the R code (in the previous post) was the model with a prior of small variance as follows

d\theta_c, \sigma_{m,r} \sim \text{Uniform}(0.001,7),\\ \theta_{1} \sim \text{Uniform}( -3,3),\\ \mu_{m,r} \sim \text{Uniform}( -11,11),\\ A_{m} \sim \text{Uniform}(0,1).\\

)

If you use a tiny init (init = 0.2 by default), does the model diverge less?

I am not familiar with the init.
Does init specify the slot @inits in the stanfit object by hand?
If parameter of model is a vector (\theta_1,...,\theta_n), then … what does the init = 0.2 mean?

I have not tried yet.
I am not sure the suitable init.

I guess stan automatically select the init from priors specified by Stan file, is it wrong?

When chains don’t converge, is there any signature in this in terms of parameters? Do some parameters just go to zero? Or does something get really large? Etc.

• Constant chains appear for all parameters, (the reproducible example is given in the previous post),

• In other example, something like the following (Code is give in the following with my previous posted model).

# with my previous model.

fit  <-  rstan::sampling(
object= Stan.model, data=data,  verbose=F,
seed=12345678, chains=1, warmup=111, iter=1222
, control = list(adapt_delta = 0.9999999,
max_treedepth = 15)
# ,init = initial
)

stan_trace(fit,pars = "ppp[2,3,4]")
rstan::check_hmc_diagnostics(fit)


Do the divergences happen in any particular situation?

In another model, the divergent transition occurs depending on data.
However, in this model, the divergent transitions occur for any dataset with sufficiently large MCMC samples.
If number of readers and number of images is large such as R=33,M=2 then the sampling also fails caused initialization.

There is another issue.
When I run rstan::sampling() with the above model,
sometimes the following error occurs:

Just to be clear, does it make the predictions of the model worse? Or does it make the sampling worse?

Parameters in Stan are initialized on the unconstrained space to numbers uniformly randomly chosen in [-2.0, 2.0].

If you set init = 0.2, that range changes to [-0.2, 0.2]. It can make the initialization less extremely bad and increase your chances of making it to the typical set.

Hmm, I see. Well that means things are totally breaking down, but it’s not clear where.

Do you have runs for which you get convergence? Can you compare the differences in parameters between these two models and get anywhere?

Otherwise my suggestion is try to simplify things by:

1. Using fake data
2. Setting some parameters to the correct values (you know the correct values cause you generated the data)
3. Trying to figure out a minimum set of parameters that gives problems (and working on the problem from there)

Also you might replace Phi with Phi_approx. The second is an approximation and I think the numerics are more stable. But that’s just guessing.

1 Like

Just to be clear, does it make the predictions of the model worse? Or does it make the sampling worse?

The predictions.

If you set init = 0.2,

I tried  init = 0.2 as follows. However, it’s not working.

    fit  <-  rstan::sampling(
object= Stan.model, data=data,  verbose=F,
seed=1, chains=1, warmup=111, iter=122
, control = list(adapt_delta = 0.9999999,
max_treedepth = 15)
,init = 0.2 # <-------------- -------here I set init = 0.2
)


Do you have runs for which you get convergence?

Yes, I do. In the first post, I showed the convergence example ( I reprint in the following)

# A convergence seed
fit  <-  rstan::sampling(
object= Stan.model, data=data,  verbose=F,
seed=1234567, chains=1, warmup=111, iter=1111
, control = list(adapt_delta = 0.9999999,
max_treedepth = 15)
# ,init = initial
)

rstan::traceplot(fit,pars=c("w"))
rstan::check_hmc_diagnostics(fit)


I understand your procedure to find the parameters that causes the problem.