Posterior predictions do not make sense when outcome of a logistic regression is number of trials of another logistic regression

In my data, each observation has three count variables, with the second being a subtotal of the first, and the third one being a sub-subtotal. One thing I’d like to estimate is (total - subtotal) / (total - subsubtotal), but currently posterior_predict() is giving me seemingly absurd values:


N <- 100

set.seed(1)
total <- rpois(N, 10)
subtotal <- rbinom(N, total, 0.5)
subsubtotal <- rbinom(N, subtotal, 0.5)
d <- data.frame(total, subtotal, subsubtotal)

bf1 <- brmsformula(subtotal | trials(total) ~ 1)
bf2 <- brmsformula(subsubtotal | trials(subtotal) ~ 1)

m <- brm(bf1 + bf2, d, binomial)
pp <- posterior_predict(m)
stopifnot(all(pp[, , "subtotal"] >= pp[, , "subsubtotal"]))  # error

I hoped the posterior predictions would take in consideration that subtotal is both the outcome in the first formula and the number of trials in the second formula. However, sometimes the predicted subsubtotal is larger than the predicted subtotal. Did I miss something?

I guess I could work around this issue by expanding the data into total rows per old observation with an ordinal outcome (levels: “not within subtotal”, “within subtotal but not subsubtotal” and “within subtotal”). Is there a better way?

Of course I could create a model with (total - subtotal) / (total - subsubtotal) as the outcome, but I’d like to stick to something like the first model, because when I add predictor variables it provides me odds ratios I’m interested in.

Thank you!

  • Operating System: Windows
  • brms Version: 2.10.0
1 Like

I don’t think that brms is doing that here. bf1 is indeed predicting subtotal but there is nothing in bf2 that tells Stan to take that prediction into account. It might be possible to do what you want to do with a measurement model or a latent subtotal parameter but it’s going to be messy because latent_subtotal would be a discrete parameter. I am reasonable sure that it’s possible to work out the math on how subtotal and subsubtotal should be related to each other if they both follow a binomial distribution. I have no time to work on that now but if you did not get any further response feel free to ping me again and I’ll have a second look.

2 Likes

EDIT: This will not work. I was too quick. This only works in expectation. The expected value of the predicted subtotal will be larger than the expected value of subsubtotal but not each sample. I’ll leave it up because it might help you but the problem is more complicated than I presented below.

So a quick follow up. I think I have the basic math down. I am not sure how to implement this in brms but I am sure you can do it directly in Stan. Let me lay out the math and give some hints on the implementation details. Let’s say that p_1 the binomial probability is for the subtotal and p_2 for the subsubtotal with total the number of trials. Because subtotal is larger or equal to the subsubtotal, than p_1 \geq p_2. This condition is the key dependency that the original brms code did not have.

So the basic model is

\text{subtotal} \sim Binom(p_1, \text{total}) \\ \text{subsubtotal} \sim Binom(p_2, \text{total}) \\ p_1 \geq p_2

The inequality means that p_2 should be defined as 0 \leq p_2 \leq p_1. Now, I assume that the above is a toy example and you probably will want to model the probabilities in some way. Let’s say you assume that you can model the logit transform of the probabilities.

logit(p_1) = \alpha_1 \\ logit(p_2) = \alpha_2 \\ \alpha_1 \geq \alpha_2

Now if you have a hiearchical model with varying probabilities per group it’s going to be complicated to implement this even in stan. I would suggest that you think of the problem as

logit(p_{1i}) = \alpha_{1i} \\ logit(p_{2i}) = \alpha_{1i} - \alpha_{2i} \\ 0 \leq \alpha_{2i}

If you have covariates to model the probabilities, it’s going to get even hairier but it can be done. It’s just a bit more dependent on the variables and the effects you expect them to have which makes it harder to give general advice.

I hope that helps and that I did not make too many typos.

1 Like

Thank you. But how, then, would I get the odds of subsubtotal within subtotal?

First of all. See my edit, this is not going to work if you want to enforce that predicted subtotal \geq predicted subsubtotal. I spoke too soon.

Second, in expectation, the odds would be \frac{p_2 / p_1}{(p_1 - p_2)/p_1} = \frac{p_2}{(p_1 - p_2)}, I believe.

1 Like