Question on betabinomial family

Please share your Stan program and accompanying data if possible.


When including Stan code in your post it really helps if you make it as readable as possible by using Stan code chunks (```stan) with clear spacing and indentation. For example, use

  incidence | vint(size) ~ period + (1|herd), data = cbpp,
  family = beta_binomial2, stanvars = stanvars
)

if we just include intercept, shouldn’t the predicted values for each herd be between 0 and 1? what is the intercept here as the values are above 1. I used the example of the custom family vignette in brms Define Custom Response Distributions with brms

fit2 <- brm(
  incidence | vint(size) ~ 1+ (1|herd), data = cbpp,
  family = beta_binomial2, stanvars = stanvars
)

I appreciate any hint how I can get the values of between 0 and 1 of this outcome from posterior_predict function. Thanks

Howdy! The response variable incidence is the number of disease cases out of the herd size, so the predictions are for incidence, which is not restricted to between 0 and 1. Predictions are made on the scale of the response variable, which is an integer here.

2 Likes

@jd_c thanks! So I is ther any way to get the predicted value of the probability (proportion) from the model?

If you want the proportion of incidence/size then I think you would have to make predictions using the summary=FALSE argument to return the samples, divide them out by the corresponding size, and then get whatever summary statistics that you wanted from the samples for each prediction.

1 Like

Use brms::posterior_linpred.

1 Like

This will return samples on the scale of the linear predictor, which would still need to be transformed. Also, this would not be the same as samples from predict() because only the uncertainty in the linear predictors would be included for posterior_linpred. It depends on what is wanted. Using the model in the brms vignette linked in the OP, the following have similar means but very different quantiles.

new_dat <- cbind.data.frame(size=10, period=4, herd=999)
new_dat$period <- factor(new_dat$period)
summary(predict(fit2, newdata=new_dat, allow_new_levels=T, summary=F)/10)
summary(plogis(posterior_linpred(fit2, newdata=new_dat, allow_new_levels=T)))
summary(fitted(fit2, newdata=new_dat, allow_new_levels=T, summary=F)/10)
1 Like

This would be posterior_linpred(transform = TRUE)

2 Likes

In the OP they specifically wanted to use posterior_predict, so was trying to address that. But good to know about the transform argument in posterior_linpred as I hadn’t noticed that before!

3 Likes

@jd_c @jsocolar
Thank you both for your help. I want individualized prediction prediction and your hints helped a lot. I appreciate it.

2 Likes

@jd_c @jsocolar
I am interested in the intercept only. I ran the model with intercept and with posterior_linpred as below but for the repeated measures for the same herd we get the same value. I repeated this with my data and faced the same issue! I appreciate any thought/clue on this.


 
 fit2 <- brm(
   incidence | trials(size) ~ 1+ (1|herd), data = cbpp,
   family = beta_binomial()
 )
 
 
 
 pred_fit2<-posterior_linpred( fit2, newdata=cbpp, allow_new_levels=T,transform=TRUE)
 
 pred_fit2_table<-as.data.frame(posterior_summary(pred_fit2))

 pred_fit2_table

 Estimate  Est.Error       Q2.5     Q97.5
1  0.1446167 0.05069499 0.07941006 0.2781415
2  0.1446167 0.05069499 0.07941006 0.2781415
3  0.1446167 0.05069499 0.07941006 0.2781415
4  0.1446167 0.05069499 0.07941006 0.2781415
5  0.1185899 0.03494250 0.05772497 0.1980215
6  0.1185899 0.03494250 0.05772497 0.1980215
7  0.1185899 0.03494250 0.05772497 0.1980215
8  0.1266686 0.03645319 0.06857501 0.2141961
9  0.1266686 0.03645319 0.06857501 0.2141961
10 0.1266686 0.03645319 0.06857501 0.2141961
11 0.1266686 0.03645319 0.06857501 0.2141961
12 0.1205430 0.03574436 0.06007615 0.2030567
13 0.1205430 0.03574436 0.06007615 0.2030567
14 0.1205430 0.03574436 0.06007615 0.2030567
15 0.1205430 0.03574436 0.06007615 0.2030567
16 0.1102823 0.03268623 0.04623552 0.1787462
17 0.1102823 0.03268623 0.04623552 0.1787462
18 0.1102823 0.03268623 0.04623552 0.1787462
19 0.1102823 0.03268623 0.04623552 0.1787462

@jd_c @jsocolar
tried another data and the posterior_linpred results for different individuals is as below, I am truly concerned what we obtain on predicted value of beta_binomial. I appreciate any hint.

	Estimate	Est.Error	Q2.5	Q97.5
1	0.160733384	0.003976093	0.153509283	0.169099702
2	0.160733384	0.003976093	0.153509283	0.169099702
3	0.160733384	0.003976093	0.153509283	0.169099702
4	0.160733384	0.003976093	0.153509283	0.169099702
5	0.160733384	0.003976093	0.153509283	0.169099702
6	0.160733384	0.003976093	0.153509283	0.169099702
7	0.160733384	0.003976093	0.153509283	0.169099702
8	0.160733384	0.003976093	0.153509283	0.169099702
9	0.160733384	0.003976093	0.153509283	0.169099702
10	0.160733384	0.003976093	0.153509283	0.169099702

In fit2, wouldn’t we expect repeated measures from the same herd to have the same expectation and the same proportion?

posterior_predict gives predicted outcomes from the beta binomial conditional on the linear predictor. posterior_linpred gives the linear predictor, and thus posterior_linpred(transform = TRUE) gives the expectation of the outcome. There is one more quantity that you might be asking for, which is the binomial proportions after accounting for the beta part of the variation. Is that what you are after here? I don’t think brms has a built-in for recovering those, but they can be computed by numerically integrating the likelihood over that intermediate probability (intermediate in that it sits “between” the beta and the binomial) and normalizing.

2 Likes

Hi @ssalimi Were you thinking that because within the same herd, the different observations have different numbers of size, you were expecting the predicted proportions to vary within each herd for each observation?

1 Like

@jd_c yes, exactly. Different observation have different size and incident and proportions should be different. But, posterior linpred or dividing on soze doesn’t provide the proportion! I appreciate any hints to get real proportion. Thanks

@jsocolar repeated measures have different size and incident so the proportions shouldn’t be the same.

Would you kindly give an example of this calculation? Thanks

We can distinguish three proportions:

  1. The expectation of the fitted beta-binomial distribution divided by its number of trials. This is the model’s prediction of the proportion, and is what is being output by posterior_linpred(transform = TRUE). This is expected to be constant across observations with identical covariates and random effect group membership.
  2. If we break the beta-binomial apart into a beta part and a binomial part, then we can talk about the proportion that sits “between” the beta part and the binomial part–the quantity that is a parameter of the binomial and whose prior is beta.
  3. The observed proportion.

This calculation refers to the proportion #2 mentioned above. We can generate a posterior distribution for this proportion p by sampling from the following probability density function iteration-wise over a and b which are fitted parameters in the model:

\frac{\beta(p | a, b)\times B(y | p, t)}{\int_0^1\beta(p | a, b)\times B(y | p, t)dp}, where a and b are the fitted parameters of the beta distribution, \beta is the beta PDF, y is the observed count, t is the number of trials in the binomial distribution, and B is the binomial PMF.

3 Likes

@jsocolar thanks for your time and explanation. I am still confused why posterior_linpred for intercept with various incident and trials in different time-point provides a fixed number. In a cross–sectional data , it is only one number for every individual. I changed the model to

fit2<-brm(incidence | trials(size) ~  1+(period|herd),
            data = cbpp, family = binomial())

And I could get various intercept for each individual and different values from the

posterior_linpred(transform = TRUE)  

I assume to get the values for each person we need to model variable intercept as a level. I hope this is a right approach!

This model treats the expected proportion of successes as a function of covariates. When the covariates do not change, this expected proportion does not change either. When you include an effect of period (i.e. a new covariate that varies across the repeated time points that otherwise share all their covariates), then the expected proportion will change across periods.

1 Like

@jsocolar I modeled beta_binomia on a cross sectional data and used individual id number as model level and could get the random intercept value with posterior_linpred (transformed=TRUE). With other models I always got intercept values using posterior_predict. I think I needed to model the random intercept. I do not know what transform=TRUE does, though. Thanks for all hints.