Automated posterior predictive checks

martinmodrak · December 7, 2017, 9:02am

I am testing a hypothesis about the distribution of distances between transcription factor binding sites and genes in the human genome. I have a model that fits well for several binding motifs I picked at random, and now I want to test whether it fits all the motifs I am interested in (several hundred). During development, I checked the fits by hand, visually inspecting posterior predictive check (PPC) plots for the density and a few functions (median, max, min, IQR). Now I need to automate these checks.

The model for each motif has 5 parameters, while there are 5e3 to 1e5 data points per motif, so fitting a hierarchical model seems like overkill. I therefore fit each motif separately, which has a reasonable running time.

So my idea is that for each motif I do a PPC of ~50 quantiles and then look at where each actual quantile of the data falls within the distribution of the corresponding replicated (PPC) quantiles. Then I check whether there are
a) quantiles where the distribution of data values across motifs is non-uniform
b) motifs where the distribution of data values across quantiles is non-uniform
I might even calculate some p-values on that (whoa)…
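A minimal sketch of the check described above, assuming the posterior predictive replicates for one motif are already available as a NumPy array `yrep` of shape `(n_draws, n)` (for example exported from the fitted model). The names `quantile_ranks` and `uniformity_chi2` are hypothetical helpers, not part of any library. For each quantile level, the first function records the fraction of replicated quantiles that fall below the observed one; stacking these per motif gives a matrix whose columns feed check (a) and whose rows feed check (b). The uniformity measure used here is a plain Pearson chi-square over equal-width bins, which is my own assumption; note that neighbouring quantiles of the same motif are strongly correlated, so the chi-square reference distribution is only a rough guide for check (b).

```python
import numpy as np

def quantile_ranks(y, yrep, probs):
    """For one motif: where each observed quantile falls within the
    distribution of the same quantile over the replicated datasets.

    y:     observed data, shape (n,)
    yrep:  posterior predictive replicates, shape (n_draws, n)
    probs: quantile levels, shape (n_probs,)
    Returns shape (n_probs,), values in [0, 1].
    """
    q_obs = np.quantile(y, probs)              # shape (n_probs,)
    q_rep = np.quantile(yrep, probs, axis=1)   # shape (n_probs, n_draws)
    # Fraction of replicated quantiles strictly below the observed one
    return (q_rep < q_obs[:, None]).mean(axis=1)

def uniformity_chi2(u, n_bins=10):
    """Pearson chi-square statistic of the values u against Uniform(0, 1),
    using n_bins equal-width bins (approx. chi-square with n_bins - 1 df
    if the values are independent and uniform)."""
    counts, _ = np.histogram(u, bins=n_bins, range=(0.0, 1.0))
    expected = len(u) / n_bins
    return float(((counts - expected) ** 2 / expected).sum())

# Hypothetical usage: simulated stand-ins for real motifs and their fits
rng = np.random.default_rng(42)
probs = np.linspace(0.02, 0.98, 49)            # ~50 quantile levels
U = np.array([
    quantile_ranks(rng.normal(size=1000), rng.normal(size=(200, 1000)), probs)
    for _ in range(30)                          # 30 hypothetical motifs
])                                              # shape (n_motifs, n_probs)
per_quantile = [uniformity_chi2(U[:, j]) for j in range(U.shape[1])]  # check (a)
per_motif = [uniformity_chi2(U[i, :]) for i in range(U.shape[0])]     # check (b)
```

With only a few dozen values per column, fewer bins (say 5) keep the expected count per bin from getting too small.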

Does that sound reasonable, or is it a footgun? Has a similar approach been formalized somewhere? I’ve read a paper (I can’t find the reference) where they do a similar thing, but to test whether the model works when fitting simulated data (I think they transform the uniformity test into a normality test). Is there a reason this approach might not be valid for testing model fit to actual data? Thanks for any hints!
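One way the uniformity-to-normality transformation mentioned above could look, as a sketch under my own assumptions rather than the method of the unnamed paper: convert each rank (the number of replicated quantiles below the observed one, 0 to `n_draws`) to a value strictly inside (0, 1), map it through the inverse standard normal CDF, and sum the squares. If the ranks were independent and uniform, the sum would be roughly chi-square with `len(z)` degrees of freedom. `ranks_to_z` and `chi2_of_z` are hypothetical names; the continuity adjustment `(r + 0.5) / (n_draws + 1)` is one common choice, not the only one.

```python
from statistics import NormalDist
import numpy as np

def ranks_to_z(counts_below, n_draws):
    """Map rank counts (0..n_draws) to approximate standard-normal scores.

    The +0.5 / +1 adjustment keeps the argument strictly inside (0, 1),
    so inv_cdf never receives exactly 0 or 1."""
    u = (np.asarray(counts_below, dtype=float) + 0.5) / (n_draws + 1)
    inv = NormalDist().inv_cdf
    return np.array([inv(p) for p in u])

def chi2_of_z(z):
    """Sum of squared z-scores; approx. chi-square with len(z) df
    if the underlying ranks are independent and uniform."""
    return float(np.sum(np.square(z)))
```

Because the transformation only applies where the ranks can be treated as independent, it fits check (a) (one quantile level across motifs) better than check (b).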

martinmodrak · December 11, 2017, 11:27am

So, an update: it turns out this approach (at least in its basic form) is not suitable for my case. The distribution of the “quantiles of quantiles” is very frequently non-uniform, even in cases I would consider a good fit. For example, here is a density PPC:

Looks good to me! (PPCs for quantities of interest such as min, max or sd also look good.)

Now let’s look at the relation between the quantiles of the replicates and the quantiles of the real data:

So the quantiles of the actual data are very likely to lie in the middle of the replicated quantiles, and the distribution is non-uniform.
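A toy simulation can illustrate that this concentration is expected even when the model is correctly specified: the same data are used both to fit the model and to check it, so posterior predictive p-values tend to cluster around 0.5 instead of being uniform. The sketch below assumes a normal model with known sd = 1 and a flat prior on the mean (posterior `mu | y ~ Normal(mean(y), 1/n)`), simulates data from that same model, and records where the observed median falls among the replicated medians. `simulate_ppc_median_pvalues` and all sizes are illustrative choices, not anything from the original analysis.

```python
import numpy as np

def simulate_ppc_median_pvalues(n_datasets=300, n=100, n_draws=200, seed=0):
    """Posterior predictive p-values for the median under a correctly
    specified toy model: y ~ Normal(mu, 1), flat prior on mu."""
    rng = np.random.default_rng(seed)
    pvals = np.empty(n_datasets)
    for i in range(n_datasets):
        y = rng.normal(0.0, 1.0, size=n)                            # "observed" data
        mu = rng.normal(y.mean(), 1.0 / np.sqrt(n), size=n_draws)   # posterior draws
        yrep = rng.normal(mu[:, None], 1.0, size=(n_draws, n))      # one replicate per draw
        # Fraction of replicated medians below the observed median
        pvals[i] = np.mean(np.median(yrep, axis=1) < np.median(y))
    return pvals

p = simulate_ppc_median_pvalues()
# A Uniform(0, 1) variable has variance 1/12; compare with the simulated p-values
print(p.mean(), p.var(), 1 / 12)
```

In this setup the p-values centre on 0.5 with a variance noticeably smaller than 1/12, which matches the pattern in the plot above without indicating misfit.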

One might argue that the quantiles being concentrated at the center is a good thing, and in a sense better than uniform, but I have (so far) had trouble formalizing this idea. Will update here if I make further progress.
