Gibbs Sampling in Stan



I’m currently working on a project which will utilize methods found in the appendix of this paper. The paper outlines how Gibbs sampling can be used to estimate the prevalence (\pi) of a disease, the sensitivity (S) , and the specificity (C) from the results of a diagnostic test for which there is no gold standard to compare against. Details can be found in the “One Diagnostic Test” section of the paper.

The method proceeds as follows:

  1. Choose priors of the prevalence, sensitivity, and specificity. This paper uses beta priors for all three. Sample an initial point from each prior.

  2. Using the samples from step 1, draw “pseudo-data” from the conditional distribution.

Let a and b be the observed number of positive and negative test results. Let Y1 be the number of true positives and Y2 be the number of false negatives. If we knew the true prevalence, sensitivity, and specificity, then (from the appendix)…

Y1 \vert a , \pi, S, C \sim \operatorname{Binom} \left( a, \dfrac{\pi S}{\pi S + (1-\pi)(1-C)} \right)

Y2 \vert a , \pi, S, C \sim \operatorname{Binom} \left( b, \dfrac{\pi (1-S)}{\pi (1-S) + (1-\pi)C} \right)

We don’t have the true prevalence, sensitivity, and specificicty, so we use the samples drawn from the prior in step 1 to obtain inital estimates of Y1 and Y2.

If we knew the true Y1 and Y2 then, then the conditional distribution of the prevalence, sensitivity, and specificity would be

\pi \vert a,b,Y1,Y2, \alpha_{\pi} \beta_{\pi} \sim \operatorname{Beta}(Y1+Y2 + \alpha_{\pi}, a+b-Y1-Y2+\beta_{\pi})

S \vert Y1,Y2, \alpha_{S} \beta_{S} \sim \operatorname{Beta}(Y1+S, Y2+\beta_S)

C \vert a,b,Y1,Y2, \alpha_{C} \beta_{C} \sim \operatorname{Beta}(b-Y2+ \alpha_C, a - Y1 + \beta_C)

Here, \alpha_x, \beta_x are the parameters for the prior for metric x (i.e. prevalence, sensitivity, specificity). We don’t know the true Y1 and Y2, so we use the samples drawn from above.

  1. Repeat step 2, drawing from each conditional distribution and then using those draws to generate further daws.

It is quite easy to code this up in python or R, but I would like to use Stan so that I can utilize some of the diagnostics and parallelization in the rstan package.


This problem doesn’t require the use of HMC. Can I still use Stan for this problem? My inital thoughts are “No”, due to this thread, but I thought I would double check. I know I can use the algorithm = 'fixed param' argument in rstan::stan to draw from a generated quantities block. However, I’m not sure how to make stan perform the sort of sampling I’ve described.

Your time and insight is appreciated.



I don’t think Stan is the right tool for this. Perhaps something like Nimble or JAGS would be more appropriate.

You can still use a lot of the rstan diagnostics (like rstan::monitor), you just need to wrap the output from the other sampler into the correct format (a three dimensonal array specified in the help ?rstan::monitor)

1 Like


OK, thanks. Any information as to why Stan is the wrong tool for this?



You already have the analytic form of the posterior, and you can implement this directly in, for example, base R.

You just need to iteratively sample each conditional posterior until convergence.



To implement an MCMC scheme you need access to the previous state. The Stan language does not provide this (because that doesn’t make sense if your goal is to implement a log-density).



Sorry to uncheck @Daniel_Simpson’s answer, but there’s an example of how to code exactly this model in the latent discrete parameters chapter of the users guide. You can find this example and others in my latest paper, all coded in Stan.

Gibbs is actually a very bad way to fit these models—it’s super slow to converge. These models used to take 24 hours to fit in WinBUGS with very poor mixing and they now fit in like 30 minutes in Stan. Just be careful to use reasonable inits because there’s a non-identifiability. Duco Veen’s visiting us at Columbia from Utrecht and working on a case study that should be out soon.

P.S. Coincidentally, it was exactly this form of model that led me back from industry to academia. Andrew Gelman and Jennifer Hill helped me formulate the model (turns out Dawid and Skene beat me to the punch by a few decades), but I didn’t understand enough about Bayesian modeling and MCMC inference to diagnose what was going wrong with fitting it. I could fit in EM no problem (and have code on the web in both R and Python for doing that—it’s how Dawid and Skene originally fit these models).