I was able to use Stan to fit an ODE-based model of gene regulation. Now I am wondering how best to decide whether the model fits well and hence whether the proposed regulation is supported by the data. The motivating example is the contrast between the following two fits. Lines are draws from the posterior predictive distribution; big dots are actual measurements:
Visually, the first example (codY) fits the data well, while the second example (pyrB) fails to capture the low expression around the 10-minute mark. This suggests that the proposed regulation is plausible for codY but not for pyrB. Is there a way to formalize this line of reasoning? Everything I could find about model comparison seems ill-suited to my case. Am I missing something basic and well understood, or am I firmly in the territory of heuristics?
To be precise, I usually have multiple possible regulation variants for each target gene, increasing in complexity (number of parameters), and I want to be able to stop with the least complex model that “fits the data well”. When I search the literature/blogs/this forum for “model selection”, the content focuses on prediction and cross-validation, usually suggesting some kind of hypermodel representing a weighted average of all candidate models. However, I am not primarily interested in prediction. In fact, for some of the data I work with, prediction keeps improving even after the model ceases to be biologically interpretable/relevant, so optimizing for prediction seems to be the wrong direction. What I want is an interpretable number reflecting my confidence that the model is a plausible explanation of the data, so that a biologist can make further decisions based on it.
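In rough pseudo-R, the workflow I am imagining looks something like this (fit_model and gof_pvalue are hypothetical placeholders, standing in for a wrapper around the Stan fit and for whatever goodness-of-fit check I end up with):

```r
# Hypothetical stopping rule: fit candidate models from simplest to most
# complex and stop at the first one that passes a goodness-of-fit check.
select_model <- function(models, data, alpha = 0.05) {
  for (m in models) {
    # fit_model: hypothetical wrapper around the Stan fit; it may fail for
    # overly complex (non-identifiable) models, hence the try().
    fit <- try(fit_model(m, data), silent = TRUE)
    if (inherits(fit, "try-error")) next
    # gof_pvalue: hypothetical goodness-of-fit check, e.g. the PIT/KS
    # idea sketched further below.
    if (gof_pvalue(fit, data) > alpha) return(m)
  }
  NULL  # signal that none of the candidate models fit the data well
}
```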
Note that, in general, I cannot claim that my set of candidate models contains a model that is correct (not even approximately correct), which I think is an impediment to computing probabilities of the individual models within an encompassing hypermodel. I need to be able to recognize (automatically) that none of the candidate models fit the data well.
Another fun problem is that when the data can be fit by one of the simple models, the more complex models may become non-identifiable and Stan diverges on them. So in general I cannot even fit all of the models (or, I think, a combined hypermodel).
Some directions I’ve been thinking about:
- Comparing the distribution of quantiles of the measurements under the posterior predictive distribution (i.e., ecdf(samples)(measured_value), the probability integral transform) to the uniform distribution, via an NHST or KL divergence; see the sketch after this list
- Calculating some distance statistic between the measurements and the posterior samples
- Using a Kolmogorov-Smirnov test to compare the measured data against the posterior samples, either a) separately for each replicate or b) against the whole bag of posterior samples
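To make the first direction concrete, here is a minimal sketch of what I have in mind (assuming pred_samples is a draws × observations matrix of posterior predictive draws and measured is the corresponding vector of measurements; both names are just placeholders):

```r
# Sketch of the PIT idea: for each observation, compute the fraction of
# posterior predictive draws that fall below the measured value. If the
# model is well calibrated, these values should be roughly uniform on [0, 1].
pit_check <- function(pred_samples, measured) {
  pit <- vapply(seq_along(measured),
                function(i) mean(pred_samples[, i] <= measured[i]),
                numeric(1))
  # One-sample KS test against the uniform distribution; the p-value would
  # serve as the single "interpretable number" per model fit.
  ks.test(pit, "punif")
}
```

One caveat I already see: with a finite number of draws the PIT values are discrete, so ks.test will complain about ties; some jittering (or a randomized PIT) would probably be needed.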
Thanks for any ideas!