Gibbs Sampling in Stan

To implement an MCMC scheme you need access to the previous state. The Stan language does not provide this (because that doesn’t make sense if your goal is to implement a log-density).

Sorry to uncheck @anon75146577’s answer, but there’s an example of how to code exactly this model in the latent discrete parameters chapter of the user’s guide. You can find this example and others in my latest paper, all coded in Stan.

Gibbs is actually a very bad way to fit these models—it’s super slow to converge. These models used to take 24 hours to fit in WinBUGS with very poor mixing and they now fit in like 30 minutes in Stan. Just be careful to use reasonable inits because there’s a non-identifiability. Duco Veen’s visiting us at Columbia from Utrecht and working on a case study that should be out soon.

P.S. Coincidentally, it was exactly this form of model that led me back from industry to academia. Andrew Gelman and Jennifer Hill helped me formulate the model (turns out Dawid and Skene beat me to the punch by a few decades), but I didn’t understand enough about Bayesian modeling and MCMC inference to diagnose what was going wrong with fitting it. I could fit it with EM no problem (and have code on the web in both R and Python for doing that—it’s how Dawid and Skene originally fit these models).


Hi Bob,

just a quick follow-up. Am I correct that the way to fit the Joseph (1995) model in Stan is to take the likelihood function from the paper, evaluate it for all potential values of Y1 and Y2 (the two discrete states in the model), and sum it up?

Best, Felix

Hi @Bob_Carpenter,

could you briefly comment on this? That would be great! I have trouble seeing the similarities between the Joseph and the Dawid/Skene models :/

Best, Felix

I agree with Bob. This is a perfect model for Stan. Sure, you can’t do Gibbs sampling in Stan, but the goal here is not to do Gibbs sampling, the goal is to fit the model. You can fit the model in Stan so easily, as Bob explains.

@Bob_Carpenter Thanks for commenting and reviving this problem. I’ve taken a look at the DS.stan file in your linked paper. I’m not sure how to map my problem onto your code. Could you please explain how I could code up my model in Stan?

This is my attempt, which unfortunately doesn’t give reasonable results.

data{
  int cases;
  int n;
}

parameters{
  real<lower=0.5, upper=1> sens;
  real<lower=0.5, upper=1> spec;
  real<lower=0, upper=1> prev;
}

transformed parameters{
  real loglik;
  loglik = 0;
  
  for (i in 0:cases){
    for (j in 0:(n-cases)){
      loglik += 
        (i+j)*log(prev) + (n-i-j)*log(1-prev) + i*log(sens) + j*log(1-sens) + (n-cases-j)*log(spec) + (cases-i)*log(1-spec) + // log likelihood from Joseph et al. 1995
        binomial_lpmf(i|cases, prev*sens/(prev*sens+(1-prev)*(1-spec))) +  // probability of i true positives out of observed cases
        binomial_lpmf(j|(n-cases), (prev*(1-sens))/(prev*(1-sens)+(1-prev)*(spec))); // probability of j false negatives out of observed non-cases
    }
  }
}

model{
 sens ~ beta(90, 10);
 spec ~ beta(90, 10);
 prev ~ beta(10, 90);
 
 target  += loglik;
}

Take a look at this paper: http://www.stat.columbia.edu/~gelman/research/unpublished/specificity.pdf


Meaning that Gibbs could handle sampling for it? It helps to separate the model you want to fit from the algorithm you use to fit it.

Yes. That’s because if you have p(\theta, u, v) for discrete variables u and v, then

p(\theta) = \sum_{m} \sum_n p(\theta, u = m, v = n).

The Dawid-Skene model is for diagnostic tests with an unknown gold standard. It’s not designed to estimate prevalence, but it has a prevalence parameter. It assumes the tests are categorical.

Ah, that’s because on the log scale you need log-sum-exp to do the marginalizing sum, so with loglik collecting one term per (i, j) combination it should be

target += log_sum_exp(loglik);

I’d also suggest rewriting the likelihood from the paper using regular density functions.
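For concreteness, here’s a rough sketch of how that marginalization could look, keeping the names and priors from the attempt above (untested; the lchoose terms are the binomial coefficients from the Joseph et al. complete-data likelihood, and the extra binomial_lpmf terms from the original attempt are dropped):

transformed parameters {
  // one log-likelihood term per latent configuration:
  //   i = true positives among the observed cases
  //   j = false negatives among the observed non-cases
  vector[(cases + 1) * (n - cases + 1)] loglik;
  for (i in 0:cases) {
    for (j in 0:(n - cases)) {
      loglik[i * (n - cases + 1) + j + 1]
        = lchoose(cases, i) + lchoose(n - cases, j)
          + (i + j) * log(prev) + (n - i - j) * log(1 - prev)
          + i * log(sens) + j * log(1 - sens)
          + (n - cases - j) * log(spec) + (cases - i) * log(1 - spec);
    }
  }
}

model {
  sens ~ beta(90, 10);
  spec ~ beta(90, 10);
  prev ~ beta(10, 90);
  target += log_sum_exp(loglik);  // marginalizes over the latent counts
}

If I’ve done the algebra right, the whole double sum collapses (up to a constant) to target += binomial_lpmf(cases | n, prev * sens + (1 - prev) * (1 - spec)), which is what a version written with regular density functions would use.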

You might want to look at how Andrew and I coded this problem. We did it on the log odds scale, but it could just as easily have been done with beta-binomial.


Thanks @Bob_Carpenter and @andrewgelman for looking into this! I read your sens/spec paper, and it is intuitive that modeling the frequencies is easier and more straightforward than this approach. Also, the summation seems to become infeasible already at relatively small n (in the thousands).

I am looking into it because I’d like to run a regression on the latent states. Could this be done differently, e.g. by modeling the state probabilities?

Do you mean something like this?

If so, I can share that presentation and also some text/advice, and possibly Stan and INLA code.


It would be great if you could share the presentation! My email is felix.fischer@charite.de

Thanks! Felix

I took a look at the paper @andrewgelman linked. It seems like the model presented in A.1 of that paper is well suited for the problem under the One Diagnostic Test header in my linked paper. Andrew and Bob’s paper continues to expand that model to cases where information on specificity and sensitivity is given by previous trials and by previous studies (models A.2 and A.3, respectively).

If I’m not mistaken, those models are not quite appropriate for my problem. In my problem, I have used two tests on a single sample of people. I can thus partition my sample by the possible combinations of the two test results. Though I have information on the sens/spec of the form amenable to model A.1, I think the likelihood would be different. Am I correct in believing so? Am I to believe that @Bob_Carpenter’s model (the Dawid and Skene model) uses the appropriate likelihood? If I’m wrong, can I amend any of the A.1 through A.3 models to fit my problem?

Yes, if you have 2 yes-no tests then you can work out the 4 probabilities and use a multinomial likelihood. Or if you have 3 yes-no tests you can do a multinomial with 8 categories, etc.

In particular, see Section 3.2 of this paper.
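A rough sketch of that two-test, one-population setup in Stan, assuming the test results are conditionally independent given true disease status and using made-up names (y holds the four cross-classified cell counts in the order ++, +-, -+, --; the priors are just the illustrative ones from earlier in the thread):

data {
  array[4] int<lower=0> y;  // cross-classified counts: ++, +-, -+, --
}
parameters {
  real<lower=0, upper=1> prev;
  real<lower=0.5, upper=1> sens1;
  real<lower=0.5, upper=1> spec1;
  real<lower=0.5, upper=1> sens2;
  real<lower=0.5, upper=1> spec2;
}
model {
  vector[4] p;
  // cell probabilities, marginalizing over true disease status;
  // they sum to 1 by construction
  p[1] = prev * sens1 * sens2 + (1 - prev) * (1 - spec1) * (1 - spec2);
  p[2] = prev * sens1 * (1 - sens2) + (1 - prev) * (1 - spec1) * spec2;
  p[3] = prev * (1 - sens1) * sens2 + (1 - prev) * spec1 * (1 - spec2);
  p[4] = prev * (1 - sens1) * (1 - sens2) + (1 - prev) * spec1 * spec2;

  prev ~ beta(10, 90);
  sens1 ~ beta(90, 10);
  spec1 ~ beta(90, 10);
  sens2 ~ beta(90, 10);
  spec2 ~ beta(90, 10);

  y ~ multinomial(p);
}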


I think that is two tests, two populations. I have two tests, one population.

But now that I’m reading it, maybe I can make it work.

Using the notation from Bob and Andrew’s paper, let \gamma_i and \delta_i be the specificity and sensitivity of test i, respectively. Then

p_i = (1-\gamma_i)(1-\pi) + \delta_i\pi,

which is straightforward to implement in Stan. This assumes the results of the tests are conditionally independent given the parameters.
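Concretely, something like this is what I have in mind (untested, and the names are placeholders; pos[i] is the number of positives on test i out of the N people):

data {
  int<lower=0> N;                // everyone gets both tests
  array[2] int<lower=0> pos;     // number of positives on test 1 and test 2
}
parameters {
  real<lower=0, upper=1> prev;            // pi in the notation above
  vector<lower=0.5, upper=1>[2] gamma;    // specificities
  vector<lower=0.5, upper=1>[2] delta;    // sensitivities
}
model {
  prev ~ beta(10, 90);
  gamma ~ beta(90, 10);
  delta ~ beta(90, 10);
  for (i in 1:2) {
    // marginal probability of a positive result on test i
    real p_i = (1 - gamma[i]) * (1 - prev) + delta[i] * prev;
    pos[i] ~ binomial(N, p_i);
  }
}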


That’s not my model! And trying to fit it in BUGS/JAGS is why I gave up on discrete sampling and embraced marginalization. What used to take 24 hours in BUGS takes about 10 minutes in Stan (and that was years ago, and Stan’s improved a lot since then).

Right. That won’t work. You need something with parameters for each of the tests and associated likelihoods for those observations (plus priors).

You can, but that’s inefficient compared to dealing with them independently, because the multinomial requires you to compute all those probabilities.

It’s conditionally independent given the true disease status, which is either “missing data” or a “parameter” depending on your philosophical bent.

But it doesn’t have to be and often independence is a bad modeling assumption. Instead, similar tests will produce similar results. For instance, two PCR tests might have correlated results in the same way that an X-ray and MRI produce correlated results.

Assuming independence is also wrong in that some cases are more difficult. If someone is heavily infected, tests are all more likely to return positives. This goes beyond 0/1 state because the 0/1 state is based on some arbitrary threshold. You see this in cancer diagnosis—the bigger the tumor, the easier it is to get a positive result on a test (image or biopsy or physical exam).

I have a partly finished case study where I add a difficulty parameter to the Dawid and Skene model. But this is well known in the epidemiology literature if not in the ML literature. For instance, check out the classic Albert and Dodd paper, which has a nice survey of some of the approaches:


You can, but that’s inefficient compared to dealing with them independently, because the multinomial requires you to compute all those probabilities.

Generally, yes. With two tests, I think I can get away with it.

But it doesn’t have to be and often independence is a bad modeling assumption. Instead, similar tests will produce similar results. For instance, two PCR tests might have correlated results in the same way that an X-ray and MRI produce correlated results.

Excellent point. This will likely depend on the nature of the test. I’ll have to think more about it.