I am running into strongly increased runtimes when I use a parameter in an if-condition.
Consider the following sample code:
```stan
if (cond)
  y ~ normal(x, sigma);
else
  y ~ normal(z, sigma);
```
If the condition is an ordinary comparison against a fixed value, e.g.
```stan
if (some_variable < 50)
  y ~ normal(x, sigma);
else
  y ~ normal(z, sigma);
```
the model runs reasonably fast (~43 s per chain).
However, what I need is a parameter switch with a uniform prior, such that
```stan
if (some_variable < switch)
  y ~ normal(x, sigma);
else
  y ~ normal(z, sigma);
```
With this change the runtime explodes (~1400 s per chain).
I understand that using parameters in if-conditions can lead to piecewise posteriors (see this GitHub issue). However, the model for y is piecewise in both examples, so I don't understand how the sampling algorithm could cause such a massive increase in runtime.
I suspect this implementation is not recommended, but I am struggling to find an alternative way to implement the behavior my model needs.
It's not clear from the question which of some_variable, y, x, z, and sigma are parameters and which are data. But here's an example of what can go wrong, and why we might expect the runtime to blow up when switch is a parameter.
For illustrative purposes, suppose that some_variable, y, x, z, and sigma are all data. Then the problem is that the likelihood jumps discontinuously in switch at switch = some_variable, while being flat in switch on either side of that point.
The runtime very likely blows up because adaptation forces the sampler to take tiny step sizes, which in turn require long treedepths. That is, the posterior geometry becomes very nasty: even though each gradient evaluation is almost as fast as before, adaptation's attempts to cope with the nastiness of the posterior make the number of gradient evaluations per iteration explode.
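To make the discontinuity concrete, here is the log-likelihood as a function of switch, treating y, x, z, sigma, and some_variable as data as assumed above and writing N for the normal density:

$$
\log p(y \mid \text{switch}) =
\begin{cases}
\log \mathcal{N}(y \mid x, \sigma) & \text{if } \text{switch} > \text{some\_variable}, \\
\log \mathcal{N}(y \mid z, \sigma) & \text{otherwise,}
\end{cases}
$$

Its derivative with respect to switch is zero on both sides of the boundary, and at the boundary it jumps by

$$
\log \mathcal{N}(y \mid x, \sigma) - \log \mathcal{N}(y \mid z, \sigma) = \frac{(y - z)^2 - (y - x)^2}{2\sigma^2}.
$$

Roughly speaking, HMC only follows the gradient, so nothing in it anticipates this jump. The integrator finds out only when a trajectory crosses the boundary and the energy error spikes.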
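Here is a back-of-the-envelope sketch of the cost per chain (an approximation, not a measurement). NUTS builds each trajectory by repeated doubling, so a tree of depth d uses at most 2^d − 1 leapfrog steps, each needing one gradient evaluation. Stan's default max_treedepth of 10 allows up to 1023.

$$
\text{runtime} \approx N_{\text{iter}} \times \overline{n}_{\text{leapfrog}} \times t_{\text{grad}}, \qquad n_{\text{leapfrog}} \le 2^{d} - 1 .
$$

As pure arithmetic on the reported timings: suppose N_iter and t_grad were unchanged. Then going from ~43 s to ~1400 s (a factor of about 33) would mean roughly 33 times as many leapfrog steps per iteration, on the order of five extra tree doublings (2^5 = 32). The per-iteration sampler diagnostics treedepth__, n_leapfrog__, and stepsize__ would show whether this is actually what happens.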
Sorry, I should have specified this more clearly. Your assumption was right, though: some_variable, y, x, z, and sigma are data in this example, while switch is the parameter.
Is there a more effective way to implement this that avoids the issue?
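For concreteness, here is a minimal self-contained version of the model as clarified above. It is a sketch under assumptions that are not in the original post: the data are scalars, L and U are hypothetical bounds of the uniform prior, and the name switch_point is a hypothetical stand-in for switch.

```stan
data {
  real y;
  real x;
  real z;
  real<lower=0> sigma;
  real some_variable;
  real L;           // hypothetical lower bound of the uniform prior on switch
  real<lower=L> U;  // hypothetical upper bound
}
parameters {
  // stands in for `switch`; the bounds give it an implicit uniform(L, U) prior
  real<lower=L, upper=U> switch_point;
}
model {
  // The likelihood is piecewise constant in switch_point, with a jump at some_variable
  if (some_variable < switch_point)
    y ~ normal(x, sigma);
  else
    y ~ normal(z, sigma);
}
```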
Sampling from discontinuous target distributions is hard, and the solution is usually to reparameterize the model such that the target distribution is continuous.
An important question is what else you use switch for in this model. If you don't use it for anything beyond what you've written down, then you can marginalize over whether or not switch is greater than some_variable. That is, instead of parameterizing in terms of switch, parameterize in terms of p, the probability that switch is greater than some_variable. You can estimate p from data in conjunction with some suitable prior that captures the same information about whether switch is greater than some_variable as your original prior on switch did. Then write the likelihood as the two-component mixture p · normal(y | x, sigma) + (1 − p) · normal(y | z, sigma).
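A minimal sketch of this marginalized model in Stan follows, assuming scalar data y, x, z, and sigma. The beta(a_p, b_p) prior and its hyperparameters a_p and b_p are hypothetical placeholders for "some suitable prior". If switch had a uniform(L, U) prior with L ≤ some_variable ≤ U, the prior probability that switch exceeds some_variable would be (U − some_variable)/(U − L), which is one way to choose them. log_mix(theta, lp1, lp2) is Stan's built-in for log(theta · exp(lp1) + (1 − theta) · exp(lp2)). some_variable no longer appears directly, because it only entered through the comparison that p now summarizes.

```stan
data {
  real y;
  real x;
  real z;
  real<lower=0> sigma;
  real<lower=0> a_p;  // hypothetical beta prior hyperparameters for p (must be > 0)
  real<lower=0> b_p;
}
parameters {
  // probability that switch > some_variable, i.e. that y comes from normal(x, sigma)
  real<lower=0, upper=1> p;
}
model {
  p ~ beta(a_p, b_p);
  // Marginalize the discrete branch:
  // log(p * N(y | x, sigma) + (1 - p) * N(y | z, sigma))
  target += log_mix(p, normal_lpdf(y | x, sigma), normal_lpdf(y | z, sigma));
}
generated quantities {
  // Probability (conditional on the sampled p) that the x branch generated y
  real prob_x_branch = exp(log(p) + normal_lpdf(y | x, sigma)
                           - log_mix(p, normal_lpdf(y | x, sigma),
                                     normal_lpdf(y | z, sigma)));
}
```

The log density is now smooth in p, so one would expect NUTS not to need the tiny step sizes described above. That is an expectation, not a benchmark.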
That is a very interesting alternative approach. I will try it, since switch is indeed only used in the described context and nowhere else. Thank you very much!