Prevalence estimation model in presence of misclassification

Hemingway · January 23, 2019, 5:23am

This is probably something really simple but I cannot figure out the solution.
I’m getting the classic error: Initialization between (-2, 2) failed after 100 attempts.
Consider the following model:

modelEx = '
data {
int<lower=1> N;  // sample size
vector<lower=0,upper=1>[N] phat;  //apparent prevalence   
}

parameters {
real<lower=0,upper=1> se; // sensitivity 
real<lower=0,upper=1> sp; // specificity
}

transformed parameters{
vector<lower=0,upper=1>[N] p;
p = (phat - 1 + sp) / (se + sp - 1);
}

model {
// priors
se ~ beta(9,1); 
sp ~ beta(9,1);
}
'

# toy example
set.seed(20190123)
n = 30
p <- runif(n)
se <- 0.8
sp <- 0.9
phat <- p * se + (1 - p) * (1 - sp)

dat <- list(phat = phat, N = n)

inits <- function(){list(se = 0.70, sp = 0.80)}

output <- stan(model_code = modelEx,
                   init=inits,
                   pars = c('se','sp', 'p'), 
                   data = dat,
                   iter = 3000, 
                   warmup=500,
                   thin=1,
                   chains = 1,
                   verbose = T)

I think the problem arises for small values of phat e.g. 0.01 when sp < 1 - phat, which yields a negative value for p (that is a probability).
Any idea how to solve this problem?

martinmodrak · February 4, 2019, 5:41pm

Hi,
it seems that your model does not match your simulation. If I start with
phat = p * se + (1 - p) * (1 - sp)

I get that

p = (phat - (1-p) * (1-sp)) / se

which is IMHO not the same as what is in your model (you have p = (phat - 1 + sp) / (se + sp - 1))

Hemingway · February 5, 2019, 1:22am

Hi Martin, thanks for your comment. In this Eq we have p in both sides. I think the actual Eq is well defined in the original post?!

martinmodrak · February 5, 2019, 7:56am

Oh, sorry, I was too quick to judge :-) Your derivation of p in the model is correct.

With that out of the way, the biggest problem is that you don’t connect your data in any way with the parameters, i.e. there is no likelihood. Note that p is not present in any sampling statement so has no effect on the sampler. It might be useful if you could elaborate more on where do your actual data come from and what are you trying to achieve.

A further problem is that you allow parameter values that do not correspond to valid configurations of your model (resulting in p outside of [0,1]). You correctly note that sp > 1 - phat is necessary to get p > 0. From the constraint p < 1 you get another bound: se > phat.
What you could do is to force those constraints on se and sp like this:

real<lower=max(phat),upper=1> se; // sensitivity 
real<lower=1 - max(phat),upper=1> sp; // specificity

This would however still not involve p in sampling.

Hemingway · March 22, 2019, 4:51am

Hi Martin, Apologize for my delayed response.
Thank you so much for your suggestions. True about the missing likelihood.

Restricting sp does the trick. I used: real<lower=1-min(phat),upper=1> sp; // specificity
Brilliant!!! Thanks

Topic		Replies	Views
Initialization between (-2, 2) failed after 100 attempts Modeling rstan , fitting-issues	3	946	July 6, 2023
Initialization between (-2, 2) failed after 100 attempts in Bass model fitting Modeling	3	692	March 14, 2020
Initialization between (-2, 2) failed after 100 attempts in Negative Binomial Distribution Modeling bioinformatics	6	1631	March 15, 2020
Initialization between (-2, 2) failed after 100 attempts Modeling specification	4	406	May 22, 2023
Initialization between (-2, 2) failed after 100 attempts Modeling fitting-issues	3	606	January 30, 2021

Prevalence estimation model in presence of misclassification

Related topics