Any insight is useful. I always use shinystan
for diagnostics; it’s a great tool. Thanks also for the link to Michael Betancourt’s document.
Having read through the links that @jroon posted (thanks @jroon, they were very useful!), this seems to be more complex than I had naively thought.
To give some background: The idea is to use a hierarchical model to model dose response data. Data are based on high-content phenotypic screens and measure the number of cells that have survived following treatment with different drugs at different concentrations. This is a breadth-over-depth experiment, so we expect few dose measurements for a lot of drugs. Using partial pooling we should be able to improve the estimates of the model parameters for every drug, by shrinking them towards their overall mean which we learn from the full data.
The main issues seem to center around non-identifiability (or limited identifiability) of these kind of models, even in the presence of data across the full range of the sigmoid curve. This is obvious from the correlation between the model parameters b, c, d, and \hat{e}. Identifiability gets even worse if measurements are located towards the tail ends of the sigmoid function.
Still, I think partial pooling should help here. Even if we have limited data for some drugs, the distribution of the parameters b, c, d, and \hat{e} which we learn from all measurements should provide better estimates for those drugs as well.
Unfortunately, in its current form, the model doesn’t seem to do that: The model works fine for well-behaved & “good” data; as soon as we include data that doesn’t show a clear sigmoid trend, the sampler struggles and gives (lots of) divergent transitions.
Some of the things that were discussed in the links that @jroon posted include
- partially pooling some but not all parameters,
- using different parameterisations of the model.
On top of the issues I fixed earlier thanks to @arya and @FJCC, I’ve tried
- partially pooling d_i \sim N(\mu_d, \sigma_d) but having un-pooled c's,
- using non-centred parametrisation for all model parameters (to deal with the funnels in the hyperparameter correlation plots),
- imposing the constraint c_i < d_i on the parameters (this in combination with b > 0 is to ensure that the sigmoid response decreases with increasing concentrations).
None of these attempts have made the fit more stable. Since I’m no Stan expert, this all feels a bit like stabbing in the dark.