Prevalence estimation model in presence of misclassification

fitting-issues
#1

This is probably something really simple but I cannot figure out the solution.
I’m getting the classic error: Initialization between (-2, 2) failed after 100 attempts.
Consider the following model:

modelEx = '
data {
int<lower=1> N;  // sample size
vector<lower=0,upper=1>[N] phat;  //apparent prevalence   
}

parameters {
real<lower=0,upper=1> se; // sensitivity 
real<lower=0,upper=1> sp; // specificity
}

transformed parameters{
vector<lower=0,upper=1>[N] p;
p = (phat - 1 + sp) / (se + sp - 1);
}

model {
// priors
se ~ beta(9,1); 
sp ~ beta(9,1);
}
'
# toy example
set.seed(20190123)
n = 30
p <- runif(n)
se <- 0.8
sp <- 0.9
phat <- p * se + (1 - p) * (1 - sp)

dat <- list(phat = phat, N = n)

inits <- function(){list(se = 0.70, sp = 0.80)}

output <- stan(model_code = modelEx,
                   init=inits,
                   pars = c('se','sp', 'p'), 
                   data = dat,
                   iter = 3000, 
                   warmup=500,
                   thin=1,
                   chains = 1,
                   verbose = T) 

I think the problem arises for small values of phat e.g. 0.01 when sp < 1 - phat, which yields a negative value for p (that is a probability).
Any idea how to solve this problem?

0 Likes

#2

Hi,
it seems that your model does not match your simulation. If I start with
phat = p * se + (1 - p) * (1 - sp)

I get that

p = (phat - (1-p) * (1-sp)) / se

which is IMHO not the same as what is in your model (you have p = (phat - 1 + sp) / (se + sp - 1))

0 Likes

#3

Hi Martin, thanks for your comment. In this Eq we have p in both sides. I think the actual Eq is well defined in the original post?!

0 Likes

#4

Oh, sorry, I was too quick to judge :-) Your derivation of p in the model is correct.

With that out of the way, the biggest problem is that you don’t connect your data in any way with the parameters, i.e. there is no likelihood. Note that p is not present in any sampling statement so has no effect on the sampler. It might be useful if you could elaborate more on where do your actual data come from and what are you trying to achieve.

A further problem is that you allow parameter values that do not correspond to valid configurations of your model (resulting in p outside of [0,1]). You correctly note that sp > 1 - phat is necessary to get p > 0. From the constraint p < 1 you get another bound: se > phat.
What you could do is to force those constraints on se and sp like this:

real<lower=max(phat),upper=1> se; // sensitivity 
real<lower=1 - max(phat),upper=1> sp; // specificity

This would however still not involve p in sampling.

0 Likes

#5

Hi Martin, Apologize for my delayed response.
Thank you so much for your suggestions. True about the missing likelihood.

Restricting sp does the trick. I used: real<lower=1-min(phat),upper=1> sp; // specificity
Brilliant!!! Thanks

0 Likes