Selecting priors for a binary categorical predictor in test of proportions in brms

Hello!
I am planning to compare two proportions (Test of proportions) using brms but I am having trouble selecting priors for my regression parameters. The categorical predictor is “type”, where it refers to two types of boxes: Protein and Mixed (Based on the type of food it has). Under Protein, successful events= 41 out of 88 trials, and under Mixed, successful events= 41 out of 90 trials.

I had carried out a frequentist “prop.test” earlier and found the estimate values under prop1 to be 0.46 and prop2 to be 0.45. Other values are: chi-squared = 7.83e-31, df = 1, p-value = 1 (I see a separation problem here.). My bayesian model looks like this:

indieat9 <- data.frame(
                                 type = c("protein" , "mixed"),
                     yes = c(41, 41), total_trials= c(88, 90)
)


prior9 <- set_prior("normal(0,1)", class = "b")

pvsm9 <- brm(yes | trials(total_trials) ~ 0+type, indieat9, family = binomial(link="logit"),
           save_all_pars = "TRUE", warmup = 10, iter = 5000, chains = 5, prior = prior9,
           sample_prior = TRUE, inits = "random", cores = 4,
           seed=123)

Apart from the normal(0,1) prior used above, I also used cauchy(0,2.5) as suggested by Gelman for logistic regression parameters and beta (1,1). I am trying to use weakly informative priors for the categorical predictor, “type”. But whenever I run “pp_check (pvsm9, nsamples = 100)” for posterior predictive check, the graph I get is bad (Shown in the attached picture). It all haywire and all over the map, so to speak.

My limited know-how tells me, that’s probably because of the priors (I could be wrong). So my questions are:

  1. Am I correct in diagnosing my problem as my fault in selecting priors?
  2. If yes, what sort of priors can be suggested? (I am pretty sure, my estimates for the parameters won’t exceed 3, which is why I had thought of normal (0,1) earlier).

Context for my experiment: I am looking if free-ranging dogs in a group eat more from the protein box vs from the mixed box. This is the first time such an experiment is being carried out in these dogs (Whose behaviour has been shown to be different from pet dogs). I had done an experiment with the same set up with these dogs but individually. So the difference between the earlier experiment and the current one is individual dogs vs dogs in groups and I want to see if dogs being in groups affect their eating choices in any matter. So I could use data from there to form my priors.

Context about me: I am fairly new to Bayesian statistics (I have been studying it for only a month and a half now), so my knowledge is pretty limited. I am a biology person with minimal math background but I am open to learning but math heavy jargon do tend to go over my head. This is my first question here, so apologies if I posted something wrong or incomplete here. Do let me know, in that case.

Thanks.

1 Like

Hi,

I don’t think you need brms for answering this question. This seems to me to be a simple proportion estimation exercise. This is the first example we teach students in a Bayes class, so there is quite a bit of material lying around. See, for instance, this short post.
The gist is that if you’re interested in a proportion \theta and have prior distribution \pi(\theta) which is a Beta distribution with parameters \alpha and \beta, after observing y successes in n trials your posterior distribution for \theta, p(\theta \mid y, n) is also a Beta distribution, with parameters \alpha^\prime = \alpha + y and \beta^\prime = n-y + \beta.

As the code below shows, the credibility intervals for the probability of success in each group (Protein or Mixed) overlap quite a lot.

For simplicity, I’ve taken uniform priors for the proportions in both groups, but if you have domain expertise to construct an informative prior, please do that!

## Comparing proportions using the Beta-binomial model

# Hyperparameters
alpha_1 <- 1
beta_1 <- 1

alpha_2 <- 1
beta_2 <- 1

# data
y1 <- 41
n1 <- 88

y2 <- 41
n2 <- 90

posterior_1 <- function(x) dbeta(x, shape1 = y1 + alpha_1, shape2 =  n1-y1 + beta_1)
posterior_2 <- function(x) dbeta(x, shape1 = y2 + alpha_2, shape2 =  n2-y2 + beta_2)

# Credibility intervals

level <- 0.95

qbeta(p = c((1 - level)/2, (1 + level)/2),  shape1 = y1 + alpha_1, shape2 =  n1-y1 + beta_1)
qbeta(p = c((1 - level)/2, (1 + level)/2),  shape1 = y2 + alpha_2, shape2 =  n2-y2 + beta_2)

# Plotting 

curve(posterior_1, lwd = 3, xlab = expression(theta))
curve(posterior_2, lwd = 3, col = 2, xlab = expression(theta), add = TRUE)
legend(x = "topright", legend = c("Protein", "Mixed"), lwd = 2, col = 1:2, bty = 'n')

1 Like

Thank you @maxbiostat. I have been searching for a simple Bayesian test of proportions for sometime now but I guess I wasn’t using the right tools or keywords to look for it. Just so that i understand, I am iterating what you have done: You have assigned values for priors and data. Then you assigned a blank function(?) to a variable which will generate beta density values using the priors. Then you used the quantile function for finding out credible intervals. Then you used curve() to plot the density values, whose default limit is0,1 using the function variable generated earlier. Am I correct in understanding the steps?
I am still not sure what went wrong in my brms model to give me such a bad posterior predictive graph.
I do have data from a previous experiment which I plan to use to construct my priors.

To compare the proportions, I was thinking of using the Bayes factor. I found this method to calculate it.

Would you agree with the answer provided or is there a more efficient way to do it?

Extra queries: I have been looking for Bayesian versions of Mann-Whitney, Kruskal Wallis and other non parametric statistics or whatever the Bayesian alternative is. Could you point me to some resources to learn from as a starting point?

Once again, thanks for your help and the code. Much appreciated and apologies if I am asking things I am not supposed to in this forum.

1 Like

I think computing Bayes factors is fine in this case but I do think it requires a bit of technical overhead. Usually, a simple inspection of the credibility intervals is sufficient to inform the analyst about whether the estimated proportions are substantially different. In your case, with the data you presented, I think there’s very little evidence to support the belief that the proportions are different.

So, I used informative priors as suggested (alpha1=20,beta1=26; alpha2=9,beta2=34) and got this as a result:

Since the mean of Proportion(protein) seems greater than Proportion(mixed), I wanted to check if this was actually a true difference or chance. So I did this:

#Probability of Proportion(protein) > Proportion(mixed)

#simulation method

protsim1 <- rbeta(2e6, 61, 73)
mixsim1 <- rbeta(2e6, 50, 83)
greatsim1<-mean(protsim1>mixsim1)
greatsim1  #Result: 0.906542

#Credible interval of the difference in probability of eating proportions between protein and mixed

#Priors
alpha1<-20
beta1<-26

alpha2<-9
beta2<-34

#Data
x1<-41
n1<-88

x2<-41
n2<-90

#simulation
sim <- rbeta(10000, x1+alpha1, n1-x1+beta1) - rbeta(10000, x2+alpha2, n2-x2+beta2)
quantile(sim, c(0.025, 0.975))      #95% credible interval (-0.04024994  0.19562881)

#plotting
plot(density(sim), bty="l", main="Posterior distribution of difference\nbetween the proportions")
abline(v=quantile(sim, c(0.025, 0.975)), lty=2)

So my interpretation would be that there is 90% probability that Proportion(protein) > Proportion(mixed), yet the 95% credible interval for the true difference between these groups spans -0.04 to 0.1956, indicating we cannot have high certainty over a positive difference. Am I correct in my interpretation?

I have a couple of doubts:

  1. Ideally, I would like to compute the Bayes factor, H(0) being Proportion(protein)= Proportion(mixed) and its alternate H(1) being not equal. But I could not come up or find any material/code to do it. Will you be able to help me with this (pointing to the right resource/code; apologies, if this is too much of an ask)? The method I earlier thought would work is giving me a BF >1000 towards H(0). Clearly, something is wrong.

  2. As the overlap is so little, is it right on my part to say that there exists no true difference and thus whatever difference we see is because of chance or is there a different interpretation to it?

Sorry for all the badgering. I am getting overwhelmed by how much I don’t know and unable to do on my own. Once again, thank you @maxbiostat for all your help

See if this helps.

I would just go by the last density plot you show, which has the difference clearly passing through and encompassing zero.

As a final note, you better be able to justify the construction of those informative priors, as they clearly have influence over the final inferences.

1 Like

Thanks for the suggestion, @maxbiostat. I did in fact stumble onto this in the weekend. For the one sided test, I am getting a BF of 9.67 whereas for the two sided, I am getting a BF of 10.37. This isn’t correct, right? I mean, if Proportion(Protein)= Proportion(mixed) has a strong evidence then Proportion(Protein)> Proportion(mixed) should be weak evidence or evidence towards the alternate hypothesis. Am I wrong in understanding this?

I constructed it from my previous experiment, which had the same setup. I took a uniform prior for that (1,1) and my no. of successes (protein), x1 = 19, no. of failures, y1 = 25. For mixed, x1 = 8, y1 = 33. So my posterior shape parameters became, Protein(19+1,25+1); Mixed(8+1,33+1). I have used that posterior as my current prior. I hope I am correct in doing that.