I see - there are two things:
- You’ve put a lower bound of 0.5 on your values, so your prior simulation is not really drawing from the prior you use in the model. Using rejection sampling to actually sample from a beta distribution with that lower bound:
# keep only the draws above the lower bound (rejection sampling)
x <- rbeta(1e5, 100, 10)
x <- x[x > 0.5]
quantile(x, probs = c(0.025, 0.25, 0.50, 0.75, 0.975))
# 2.5% 25% 50% 75% 97.5%
# 0.8492021 0.8920933 0.9113891 0.9284150 0.9549340
This gets me very close to what I get when running the model with your data:
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
sensspec[1] 0.91 0.00 0.03 0.84 0.89 0.91 0.93 0.96 1705 1.00
sensspec[2] 0.94 0.00 0.01 0.91 0.93 0.94 0.95 0.96 924 1.01
pie 0.03 0.00 0.02 0.01 0.02 0.03 0.04 0.06 912 1.01
- You have a data-prior conflict. Let’s simulate the proportion of positives you would expect given your priors:
N <- 1e5
# columns of sensspec are draws of sensitivity and specificity
sensspec <- matrix(rbeta(2 * N, 100, 10), nrow = N, ncol = 2)
pie <- rbeta(N, 2, 8)
# probability of a positive test: true positives + false positives
theta <- pie * sensspec[, 1] + (1 - pie) * (1 - sensspec[, 2])
quantile(theta, probs = c(0.025, 0.25, 0.50, 0.75, 0.975))
# 2.5% 25% 50% 75% 97.5%
# 0.1070964 0.1823829 0.2434531 0.3207248 0.4977559
So the observed positive frequency of roughly 0.09 is a priori quite unlikely. Something has to give, so the model does the “rational” thing: it modifies its beliefs about all the parameters. I.e., if you believe a disease is reasonably common (around 1/5 of the population), you test with sensitivity and specificity around 0.9 (which should give you a ton of false positives), and you get unexpectedly few positives, then you need to start doubting both that the disease is common and that your test is working as expected (or, more generally, that your model is correct).
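To put a number on “quite unlikely”: reusing the theta draws from the simulation above, you can check how much prior mass falls below the observed frequency:

# fraction of prior draws below the observed positive frequency of 0.09;
# since the 2.5% prior quantile above is already ~0.107, this is well under 0.025
mean(theta < 0.09)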
In that sense, the model behaves as expected. The correct reaction is not necessarily to update your beliefs about sensitivity and specificity, but to ask whether those beliefs, your prior on prevalence, or other aspects of data collection are correctly expressed in the model.
Does that make sense?
Finally, the model as you wrote it is not very idiomatic Stan and it took me a while to understand it. For inspiration, here’s how I would write the same model, taking advantage of Stan’s built-in functions and a slightly simpler form of input data (but there are many ways to write this; I don’t claim this is the best or definitive version):
data {
  int<lower=0> N_total;
  int<lower=0, upper=N_total> N_positive;
}
parameters {
  vector<lower=0.5, upper=1>[2] sensspec;
  real<lower=0, upper=1> pie;
}
transformed parameters {
  real theta = pie * sensspec[1] + (1 - pie) * (1 - sensspec[2]);
}
model {
  pie ~ beta(2, 8);
  sensspec ~ beta(100, 10); // gives all elements of sensspec the same prior, no need for a loop
  N_positive ~ binomial(N_total, theta);
}
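And in case it helps, here is a minimal sketch of fitting the model from R with rstan; the file name and the data values (90 positives out of 1000, matching the ~0.09 positive frequency) are just placeholders for your actual setup:

library(rstan)

# placeholder data: 90 positives out of 1000 tests (~0.09 positive frequency)
fit <- stan("test_model.stan",
            data = list(N_total = 1000, N_positive = 90))
print(fit, pars = c("sensspec", "pie", "theta"))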