I understand that, due to the way Stan handles parameter constraints, parameters lower-bounded at 0 can never actually take the value 0. What are the recommended strategies for dealing with parameters that should conceptually be bounded, but may in some cases take values extremely close to the boundary?
My personal context with this problem:
I’ve been using Stan to fit pathogen decay models to data of the form count (Y) vs. time (t). In a typical setting, we might be interested in relating an exogenous variable, such as temperature (T), to a first-order pathogen decay rate, resulting in a model over all experiments j in {1, …, J}, with observations n in {1, …, N_{j}}, e.g.:
log Y_{n} ~ normal ( log Y_{0j} − K_{j} t_{n} , σ_{Yj} )
log K_{j} ~ normal ( log K_{ref} + λ ( T_{j} − T_{ref} ), σ_{K} )
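For concreteness, a Stan sketch of this base model might look like the following (variable names and the data layout — a single long vector with an experiment index `jj` — are my own choices, not fixed by the model above):

```stan
data {
  int<lower=1> N;                      // total observations across experiments
  int<lower=1> J;                      // number of experiments
  array[N] int<lower=1, upper=J> jj;   // experiment index for each observation
  vector[N] t;                         // observation times
  vector[N] log_y;                     // log counts
  vector[J] T;                         // temperature in each experiment
  real T_ref;                          // reference temperature
}
parameters {
  vector[J] log_y0;                    // log initial counts
  vector[J] log_K;                     // log decay rates
  real log_K_ref;
  real lambda;                         // temperature sensitivity
  real<lower=0> sigma_K;
  vector<lower=0>[J] sigma_y;
}
model {
  log_K ~ normal(log_K_ref + lambda * (T - T_ref), sigma_K);
  log_y ~ normal(log_y0[jj] - exp(log_K[jj]) .* t, sigma_y[jj]);
  // priors on log_y0, log_K_ref, lambda, sigma_K, sigma_y omitted for brevity
}
```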
A common complication to the above model occurs in datasets where decay begins after some lag period, during which the population is relatively stable. These periods can be controlled for using a slight modification of the above model with the added parameter t_{Dj}, the lag time between the start of experiment j and the onset of decay:
if t_{n} ≤ t_{Dj} : log Y_{n} ~ normal ( log Y_{0j} , σ_{Yj})
if t_{n} > t_{Dj} : log Y_{n} ~ normal ( log Y_{0j} − K_{j} t_{n} , σ_{Yj} )
log K_{j} ~ normal ( log K_{ref} + λ ( T_{j} − T_{ref} ), σ_{K} )
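In Stan the lag can be written as a conditional mean; a sketch of the modified likelihood, mirroring the piecewise model above (data and other parameters as in the base model):

```stan
parameters {
  // ... as in the base model, plus:
  vector<lower=0>[J] t_D;   // lag time before onset of decay
}
model {
  // ... prior on log_K as before ...
  for (n in 1:N) {
    int j = jj[n];
    real mu = t[n] <= t_D[j]
              ? log_y0[j]                          // stable phase
              : log_y0[j] - exp(log_K[j]) * t[n];  // decay phase
    log_y[n] ~ normal(mu, sigma_y[j]);
  }
}
```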
Across experimental datasets, some experiments will exhibit a discernible lag phase while others will not. Other experiments may be noisy or have imprecise measurements of Y_{n}, so that there is little evidence in the likelihood to determine whether or not a lag phase is present.
I’ve tried a few different approaches to dealing with the lag phases:

- Using a mixture distribution over both models (essentially, model 1 is t_{Dj} = 0 and model 2 is t_{Dj} > 0) for each experiment
  - Works well in terms of recovering the main parameters, though experiments that are weakly identified as one model or the other often have low effective sample sizes due to slight bimodality in their parameters
  - Relatively slow, and I suspect it contributes to the difficulty of sampling from more complex models where additional mixture components are present
- Using the more complex (lag) model for all experiments
  - Slightly biases decay rate estimates in cases where some experiments do not actually have a lag phase
  - Hits maximum treedepth, which I suspect is due to the very large steps needed for a U-turn in the unconstrained parameters when the optimal value of the constrained parameter is near the boundary t_{Dj} = 0
- Allowing t_{Dj} < 0
  - Conceptually problematic, since there is either an unidentifiable pairing of t_{Dj} and Y_{0j} (with related treedepth problems when constraining priors are added) or an essentially incorrect model if we still treat the intercept as the first observed count
  - The incorrect model, where the intercept is treated as the first observed count, is easy to sample from but gives biased estimates of K_{j}, as the apparent decay rates flatten with negative values of t_{Dj} (though the temperature sensitivity λ is fairly well estimated)
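For reference, the per-experiment mixture from the first approach can be marginalized with `log_mix`; a sketch, assuming observations are grouped by experiment via hypothetical `start`/`end` index arrays and a mixing weight `theta[j]` (my notation, not from the model above):

```stan
parameters {
  // ... as in the lag model, plus:
  vector<lower=0, upper=1>[J] theta;  // probability that experiment j has a lag phase
}
model {
  for (j in 1:J) {
    real lp_lag = 0;     // log likelihood under the lag model (t_D[j] > 0)
    real lp_nolag = 0;   // log likelihood under the no-lag model (t_D[j] = 0)
    for (n in start[j]:end[j]) {
      real mu_lag = t[n] <= t_D[j]
                    ? log_y0[j]
                    : log_y0[j] - exp(log_K[j]) * t[n];
      lp_lag += normal_lpdf(log_y[n] | mu_lag, sigma_y[j]);
      lp_nolag += normal_lpdf(log_y[n] | log_y0[j] - exp(log_K[j]) * t[n], sigma_y[j]);
    }
    target += log_mix(theta[j], lp_lag, lp_nolag);
  }
}
```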
Is there any approach that would be easier to sample from than the mixture model while remaining minimally biased in cases where t_{Dj} = 0?