Posterior predictive one dataset per sample or block of samples?

lwiklendt · December 14, 2018, 1:16am

Let’s say we have a given dataset with 20 observations and we’ve fit a mixed-effects model, producing 2000 posterior samples. We want to generate 100 datasets of the same size as the given dataset for comparison.

Do we need to:

1. take 100 of the 2000 samples (discarding the remaining 1900) and generate a single 20-obs dataset per sample, drawing multiple random-effect levels from that single posterior sample, or
2. can we draw a single observation per sample for each of the 2000 samples, drawing only a single level per sample, and assemble a 20-obs dataset from each block of 20 separate samples, resulting in 100 datasets?
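To make the two options concrete, here is a minimal NumPy sketch. It assumes a hypothetical varying-intercept model in which each of the 20 observations comes from its own new level (y_k ~ Normal(alpha_k, sigma), alpha_k ~ Normal(mu, tau)). The arrays `mu`, `tau` and `sigma` are random stand-ins for the 2000 posterior draws, not output from a real fit, and `option_1` / `option_2` are made-up names for illustration.

```python
import numpy as np

N_DRAWS, N_OBS, N_REP = 2000, 20, 100

rng = np.random.default_rng(2018)

# Hypothetical stand-ins for 2000 posterior draws of the hyperparameters of
#   y_k ~ Normal(alpha_k, sigma),  alpha_k ~ Normal(mu, tau).
# In practice these arrays would be extracted from the fitted model.
mu = rng.normal(0.0, 0.5, N_DRAWS)
tau = np.abs(rng.normal(1.0, 0.2, N_DRAWS))
sigma = np.abs(rng.normal(0.5, 0.1, N_DRAWS))


def option_1(rng):
    """One posterior draw per dataset: 100 draws kept, 1900 discarded."""
    idx = rng.choice(N_DRAWS, size=N_REP, replace=False)
    # All 20 new levels in a row share the same (mu, tau) draw...
    alpha = rng.normal(mu[idx, None], tau[idx, None], size=(N_REP, N_OBS))
    # ...and all 20 observations in a row share the same sigma draw.
    return rng.normal(alpha, sigma[idx, None])


def option_2(rng):
    """One observation per posterior draw, grouped into blocks of 20 draws."""
    alpha = rng.normal(mu, tau)      # one new level per draw, shape (2000,)
    y = rng.normal(alpha, sigma)     # one observation per draw, shape (2000,)
    return y.reshape(N_REP, N_OBS)   # row r uses draws 20*r .. 20*r + 19


y_rep_1 = option_1(rng)
y_rep_2 = option_2(rng)
print(y_rep_1.shape, y_rep_2.shape)  # (100, 20) (100, 20)
```

Both functions return a (100, 20) array; the only difference is whether the 20 observations in a row share one posterior draw or each use a different one.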

ScottAlder · December 14, 2018, 2:00am

If you have access, you might want to take a look at chapter 12 of @richard_mcelreath's book, Statistical Rethinking. The last section of the chapter covers posterior predictions for multilevel (mixed-effects) models. If you don’t have access, you can also check out this lecture on the author’s YouTube channel, which covers chapter 12. It looks like the posterior predictions part starts around the 50-minute mark.

Edit: it looks like the book sample includes chapter 12.

lwiklendt · December 14, 2018, 5:07am

I think my question was too ambiguous. I want to reduce the given dataset to a single number (a summary statistic) by applying some function to the responses, such as the mean, max, min, or median. I also want to apply the same function to the responses of each of the 100 datasets drawn from the posterior. I want to do this by sampling new random-effects levels (or, in McElreath’s words, “clusters”). Then I want to compare the summary statistic obtained from the given dataset with the summary statistics obtained from the 100 datasets drawn from the posterior.
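The comparison described here is a posterior predictive check with a test statistic. A minimal sketch, assuming the observed responses are a length-20 vector and the replicated datasets are stacked one per row in a (100, 20) array. `ppc_stat` is a hypothetical helper, and the fraction it returns is the usual posterior predictive tail probability (the share of replicated statistics at least as large as the observed one).

```python
import numpy as np


def ppc_stat(y_obs, y_rep, stat=np.mean):
    """Compare a summary statistic of the observed data with its replicates.

    y_obs : shape (n_obs,)          observed responses
    y_rep : shape (n_rep, n_obs)    one replicated dataset per row
    stat  : reduction applied along the observation axis (mean, max, ...)
    """
    t_obs = stat(y_obs)
    t_rep = stat(y_rep, axis=1)           # one statistic per replicated dataset
    tail_prob = np.mean(t_rep >= t_obs)   # posterior predictive tail probability
    return t_obs, t_rep, tail_prob


# Placeholder arrays only; substitute the real data and replicated datasets.
rng = np.random.default_rng(0)
y_obs = rng.normal(size=20)
y_rep = rng.normal(size=(100, 20))

stats = {"mean": np.mean, "median": np.median, "min": np.min, "max": np.max}
for name, f in stats.items():
    t_obs, t_rep, p = ppc_stat(y_obs, y_rep, stat=f)
    print(f"{name:>6}: observed = {t_obs:.3f}, P(T_rep >= T_obs) = {p:.2f}")
```

Tail probabilities close to 0 or 1 indicate that the observed statistic is unusual relative to the replicated datasets.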

Neither McElreath’s lecture nor Chapter 12 of his book addresses the question I have. I’m not asking about the difference between sampling the average cluster and sampling a new cluster (i.e. not just the difference between sections 12.4.1 and 12.4.2). My question relates to sampling a new cluster, and the difference between two ways of doing that. That is, can a dataset drawn using multiple posterior samples replace a dataset drawn using only a single posterior sample, if we draw 100 such datasets to compare against a given dataset via their summary statistics?
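One way to probe this empirically is to simulate the summary statistic under both schemes from the same posterior draws and compare its spread across the 100 datasets. A minimal sketch, again assuming a hypothetical varying-intercept model with random stand-in draws (not a real fit). For the mean, option 1 shares one `mu` across all 20 observations of a dataset, so posterior uncertainty in `mu` does not average out within a dataset, whereas option 2 averages it over 20 independent draws; with these stand-in draws the spread of the dataset means should therefore come out larger under option 1. How large the gap is for a real model depends on how uncertain its hyperparameters are.

```python
import numpy as np

N_DRAWS, N_OBS, N_REP = 2000, 20, 100
rng = np.random.default_rng(1)

# Hypothetical stand-in posterior draws (replace with draws from the real fit).
mu = rng.normal(0.0, 0.5, N_DRAWS)
tau = np.abs(rng.normal(1.0, 0.2, N_DRAWS))
sigma = np.abs(rng.normal(0.5, 0.1, N_DRAWS))

# Option 1: every row uses a single posterior draw for all 20 observations.
idx = rng.choice(N_DRAWS, size=N_REP, replace=False)
alpha_1 = rng.normal(mu[idx, None], tau[idx, None], size=(N_REP, N_OBS))
y_rep_1 = rng.normal(alpha_1, sigma[idx, None])

# Option 2: every observation uses its own posterior draw; rows are blocks of 20.
y_rep_2 = rng.normal(rng.normal(mu, tau), sigma).reshape(N_REP, N_OBS)

for name, f in {"mean": np.mean, "max": np.max}.items():
    t1, t2 = f(y_rep_1, axis=1), f(y_rep_2, axis=1)
    print(f"{name}: sd across datasets, option 1 = {t1.std(ddof=1):.3f}, "
          f"option 2 = {t2.std(ddof=1):.3f}")

# Rough variance of the dataset mean implied by the stand-in draws:
#   option 1: Var(mu) + E[tau^2 + sigma^2] / N_OBS
#   option 2: (Var(mu) + E[tau^2 + sigma^2]) / N_OBS
within = np.mean(tau**2 + sigma**2)
print("implied sd of mean, option 1:", np.sqrt(mu.var() + within / N_OBS))
print("implied sd of mean, option 2:", np.sqrt((mu.var() + within) / N_OBS))
```

The two formulas coincide only when `Var(mu)` is negligible, i.e. when the posterior for the population-level parameters is very concentrated.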
