# Extreme increase in runtime with parameter as part of if-condition

I am currently facing the problem of strongly increased runtimes when using a parameter in an if-condition.

Consider the following sample code:

```stan
if (cond)
  y ~ normal(x, sigma);
else
  y ~ normal(z, sigma);
```

If I define the condition as an ordinary comparison against a constant, i.e.

```stan
if (some_variable < 50)
  y ~ normal(x, sigma);
else
  y ~ normal(z, sigma);
```

this runs rather fast in my model (~43 s per chain).
However, what I need is to introduce a parameter `switch`, a uniformly distributed value, such that

```stan
if (some_variable < switch)
  y ~ normal(x, sigma);
else
  y ~ normal(z, sigma);
```

This makes the runtime explode (~1400 s per chain).
I understand that using parameters in if-conditions can lead to piecewise posteriors (see this GitHub issue), but in both examples `y` would have a piecewise posterior, so I don't understand how the sampling algorithm can cause this massive increase in runtime.

I assume this way of implementing it is probably not recommended, but I am struggling to find an alternative way to implement the behavior the model needs.

It's not clear from the question which of `some_variable`, `y`, `x`, `z`, and `sigma` are parameters versus data. But here's an example of what can go wrong here, and why we might expect the runtime to blow up when `switch` is a parameter.

For illustrative purposes, suppose that all of `some_variable`, `y`, `x`, `z`, and `sigma` are data. Then the problem would be that the likelihood is abruptly discontinuous in `switch` at `switch = some_variable`.
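To make the discontinuity concrete (still under the assumption that everything except `switch` is data), the log-likelihood as a function of `switch` is piecewise constant with a jump at `some_variable`:

```
\log p(y \mid \text{switch}) =
\begin{cases}
\log \operatorname{normal}(y \mid x, \sigma) & \text{if } \text{some\_variable} < \text{switch}, \\
\log \operatorname{normal}(y \mid z, \sigma) & \text{otherwise.}
\end{cases}
```

The gradient with respect to `switch` is zero on both sides of the jump, so HMC gets no directional information about `switch` at all; the sampler can only cross the discontinuity by chance, which is exactly the kind of geometry that step-size adaptation reacts badly to.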

The runtime very likely blows up because adaptation forces the sampler to take tiny step sizes, which in turn require deep trees (many leapfrog steps per iteration). That is, the posterior geometry becomes super nasty, and even though each gradient evaluation is almost as fast as before, the number of gradient evaluations per iteration explodes as a result of adaptation's attempts to cope with the nastiness of the posterior.

Sorry, I should have specified it more clearly. You were right in your assumption, though: `some_variable`, `y`, `x`, `z`, and `sigma` are data in this example, while `switch` is the parameter.

Is there a more effective way to bypass that issue than my implementation?

Sampling from discontinuous target distributions is hard, and the solution is usually to reparameterize the model such that the target distribution is continuous.

An important question is what else you use `switch` for in this model. If you don't use it for anything beyond what you've written down, then you can marginalize over whether or not `switch` is greater than `some_variable`. That is, instead of parameterizing in terms of `switch`, parameterize in terms of `p`, the probability that `switch` is greater than `some_variable`. You can estimate `p` from data in conjunction with some suitable prior that captures the same information about whether `switch` is greater than `some_variable` as did your original prior on `switch`. And then write the likelihood as

```stan
target += log_sum_exp(bernoulli_lpmf(1 | p) + normal_lpdf(y | x, sigma),
                      bernoulli_lpmf(0 | p) + normal_lpdf(y | z, sigma));
```

(Note the indicators: since `p` is the probability that `switch` exceeds `some_variable`, the `normal(x, sigma)` branch, which fires when `some_variable < switch`, gets weight `p`.)
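For concreteness, here is a minimal sketch of how the full marginalized model might look. The data shapes are assumptions (a vector `y` of `N` observations with scalar `x`, `z`, and `sigma`), and the `beta(1, 1)` prior on `p` is only a placeholder; choose one that encodes the same information as your original prior on `switch`:

```stan
data {
  int<lower=1> N;
  vector[N] y;
  real x;
  real z;
  real<lower=0> sigma;
}
parameters {
  // p replaces switch: the probability that switch exceeds some_variable
  real<lower=0, upper=1> p;
}
model {
  p ~ beta(1, 1);  // placeholder prior; match your original prior on switch
  // Marginalize over which branch is active. A single indicator governs
  // all of y, so each branch sums the lpdf over the whole vector.
  target += log_sum_exp(bernoulli_lpmf(1 | p) + normal_lpdf(y | x, sigma),
                        bernoulli_lpmf(0 | p) + normal_lpdf(y | z, sigma));
}
```

Because the indicator is marginalized out rather than sampled, the target is smooth in `p` and HMC gets useful gradients everywhere.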

That is a very interesting alternative approach. I will try it, as `switch` is indeed only used in the described context and nowhere else. Thank you very much!