Post. Predictive Interval for a single observation with only group membership as a predictor

Suppose you are trying to get posterior predictive intervals for a linear model with a single 2-level factor predictor (essentially a t-test scenario). The posterior predictive intervals for each subject will be pretty much the same since there’s no information in the model besides group membership. That being said, is there a simple way to just get posterior predictive intervals for each factor level?

Or am I mistaken to think this is reasonable? I’m under the impression the posterior distribution P(theta|data) strictly concerns with the parameter value itself and does not capture the expected value for a particular datum given a parameter estimate, ie P(datum_i | theta).

Essentially, the inference I desire here is to obtain the expected value and 95% prediction interval for a single observation from a group. Ie, if I were working with just frequentist intervals I think the correct formula would be mu +/- 1.96SD (as opposed to mu +/- 1.96se for the confidence interval).

I’m aware I can just average the intervals manually for this end, just wondering if I’m overlooking something obvious in the bayesplot or brms packages since I can’t seem to find a way to do this automatically.

1 Like

Yes, you can do posterior_predict(fit, data = original_data[1, ]) or something to get the posterior prediction for a single actual or hypothetical observation.

Hmm, when I try this (data = original_data[rows_of_interest, ]), the resulting yrep still includes the full sample.

It includes one prediction for each row of interest and posterior draw.

Sorry for the late reply, thanks for the help. The suggested solution does work for me, although the posterior predictive inferences can bounce around quite a bit.

Essentially what I was trying to do is get the predictive interval for a single factor level itself, ie,

outcome ~ intercept + gender

where the intercept might represent male and the gender coefficient represents the females differences from males. Since there are no other predictors, I expected a single uniquely defined predictive interval to capture the wide range of expected outcome values. But on second thought, this variability in posterior predictive interval might be more illustrative to the point I’m trying to make.

Yeah, just generate the predictive interval for a man and then another one for a woman.

Ok, I am probably misunderstanding something here then – sorry, I am quite inexperienced with everything being discussed in these forums.

So I can index the final yrep and this way get to specific parts of the posterior I am interested in, e.g. all members of category 1 and all members of category 2 separately –
but whether I add the comment “data = data[rows_of_interest, ]” to a call of posterior_predict does not influence the dimensions of the yrep matrix that is returned for any rows_of_interest. This however is how I understood your initial post, i.e. that it would only return a matrix of dimension nSamples x n_rows_of_interest.

My bad. The argument to specify is called newdata rather than data, so it would be posterior_predict(fit, newdata = original_data[rows_of_interest, ]).

1 Like

Thanks for that – this really helps and was what I was looking for :)
But probably more my bad, would have been quite obvious from the documentation, where it is the first argument described. Sorry!