I understand that, because of the way Stan handles parameter constraints (a lower bound is enforced by a log transform, so the boundary is unreachable on the unconstrained scale), parameters lower-bounded at 0 can never actually take the value 0. What are the recommended strategies for dealing with parameters that conceptually should be bounded, but may in some cases have values extremely close to the boundary?
My personal context with this problem:
I’ve been using Stan to fit pathogen decay models to data of the form count (Y) vs. time (t). In a typical setting, we might want to relate an exogenous variable, such as temperature (T), to a first-order rate of pathogen decay, giving the following model over experiments j in {1, …, J} with observations n in {1, …, N_j}:
log Y_n ~ normal(log Y0_j − K_j t_n, σ_Yj)
log K_j ~ normal(log Kref + λ (T_j − Tref), σ_K)
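For concreteness, here is a minimal Stan sketch of this baseline model (the data layout and names are my own, and priors on the hyperparameters are omitted):

```stan
data {
  int<lower=1> N;                        // total observations, pooled over experiments
  int<lower=1> J;                        // number of experiments
  array[N] int<lower=1, upper=J> expt;   // experiment index for each observation
  vector[N] logY;                        // log counts
  vector[N] t;                           // observation times
  vector[J] T;                           // experiment temperatures
  real Tref;                             // reference temperature
}
parameters {
  vector[J] logY0;                       // log initial counts
  vector[J] logK;                        // log decay rates, K_j = exp(logK_j)
  real logKref;                          // log decay rate at Tref
  real lambda;                           // temperature sensitivity
  vector<lower=0>[J] sigmaY;             // per-experiment observation noise
  real<lower=0> sigmaK;                  // between-experiment scale for log K
}
model {
  logK ~ normal(logKref + lambda * (T - Tref), sigmaK);
  logY ~ normal(logY0[expt] - exp(logK[expt]) .* t, sigmaY[expt]);
}
```

Sampling log K_j directly keeps K_j strictly positive without needing a constrained parameter.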
A common complication to the above model arises in datasets where decay begins only after some lag period, during which the population is relatively stable. These periods can be handled with a slight modification of the above model, adding the parameter tD_j, the lag time between the start of experiment j and the onset of decay:
if t_n ≤ tD_j : log Y_n ~ normal(log Y0_j, σ_Yj)
if t_n > tD_j : log Y_n ~ normal(log Y0_j − K_j (t_n − tD_j), σ_Yj)
log K_j ~ normal(log Kref + λ (T_j − Tref), σ_K)
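A sketch of the corresponding lag version (again with my own names, showing only the changed pieces; the lower bound at 0 on tD_j is exactly where the boundary issue above bites, and the step function makes the likelihood only piecewise smooth in tD_j):

```stan
parameters {
  // ... as in the baseline sketch, plus:
  vector<lower=0>[J] tD;                 // lag before onset of decay
}
model {
  logK ~ normal(logKref + lambda * (T - Tref), sigmaK);
  tD ~ exponential(1);                   // placeholder prior; choose to suit the data
  for (n in 1:N) {
    int j = expt[n];
    real mu = t[n] <= tD[j]
              ? logY0[j]                              // stable phase
              : logY0[j] - exp(logK[j]) * (t[n] - tD[j]);  // decay phase
    logY[n] ~ normal(mu, sigmaY[j]);
  }
}
```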
Across experimental datasets, some experiments exhibit a discernible lag phase while others do not. Still others are noisy or have imprecise measurements of Y_n, so there is little evidence in the likelihood to determine whether a lag phase is present.
I’ve tried a few different approaches to dealing with the lag phases:
- Using a mixture distribution over both models for each experiment (essentially, model 1 is tD_j = 0 and model 2 is tD_j > 0), marginalized as in the sketch after this list
  - Works well in terms of main parameter recovery, though experiments weakly identified as one model or the other often have low effective sample sizes due to slight bimodality in their parameters
  - Relatively slow, and I suspect it contributes to difficulty sampling from more complex models where additional mixture components are present
- Using the more complex (lag) model for all experiments
  - Slightly biases decay rate estimates in cases where some experiments actually have no lag phase
  - Hits maximum treedepth, which I suspect is due to the very large steps needed for a U-turn in the unconstrained parameters when the optimal value of the constrained parameter is near the boundary tD_j = 0
- Allowing tD_j < 0
  - Conceptually problematic: there is either an unidentifiable pairing of tD_j and Y0_j (with related treedepth problems when constraining priors are added), or an essentially incorrect model if we still treat the intercept as the first observed count
  - The incorrect model, where the intercept is treated as the first observed count, is easy to sample but gives biased estimates of K_j, since the apparent decay rates flatten with negative values of tD_j (though the temperature sensitivity λ is fairly well estimated)
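For reference, the marginalized form of the mixture approach looks roughly like this (a sketch under my own data layout, with assumed per-experiment observation ranges starts[j]:ends[j] passed as data and a per-experiment mixing weight theta_j, using Stan's log_mix):

```stan
parameters {
  // ... as in the baseline sketch, plus:
  vector<lower=0>[J] tD;                   // lag in the lag component
  vector<lower=0, upper=1>[J] theta;       // P(experiment j has a lag phase)
}
model {
  logK ~ normal(logKref + lambda * (T - Tref), sigmaK);
  tD ~ exponential(1);                     // placeholder prior
  for (j in 1:J) {
    real lp_nolag = 0;
    real lp_lag = 0;
    for (n in starts[j]:ends[j]) {         // observations of experiment j
      real mu_nolag = logY0[j] - exp(logK[j]) * t[n];
      real mu_lag = t[n] <= tD[j]
                    ? logY0[j]
                    : logY0[j] - exp(logK[j]) * (t[n] - tD[j]);
      lp_nolag += normal_lpdf(logY[n] | mu_nolag, sigmaY[j]);
      lp_lag += normal_lpdf(logY[n] | mu_lag, sigmaY[j]);
    }
    target += log_mix(theta[j], lp_lag, lp_nolag);
  }
}
```

(theta_j could also be shared across experiments or given a hierarchical prior; either way the discrete model choice is marginalized out.)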
Is there any approach that would be superior to the mixture model in ease of sampling while remaining minimally biased for cases where tD_j = 0?