Reaching max_num_steps in the ODE solver

ADMIN EDIT: Created this as a new topic to discuss a subquestion that arose in Adjoint ODE Prototype - RFC - Please Test (@martinmodrak )

No. I would see it as a divergent transition thing. It should not happen, but when it happens one should follow up.

One strategy, if one is very worried about convergence coming out differently, could be to warm up the sampler and then restart with the same step size, mass matrix, and initial values, but with the different ODE solvers.

I disagree. Whenever you exhaust the max_num_steps, your inference may be severely compromised because you cannot explore part of your parameter space, irrespective of the curvature.

That region of parameter space may be unimportant or it may not be, but the correct metric for that cannot be how stressed the integrator is.

Are you sure, though, that it can compromise your inference? If max_num_steps is reached during sampling, sampling stops and you don't get any draws from that chain (at least this was the case with the most recent cmdstan). This should be a warning sign not to draw conclusions from the results of the other chains, or possibly rhat catches this problem.

If it occurs during warmup, though, the sampler doesn't seem to stop. I would really like to understand the rules for recovering from the domain and runtime errors that stan_math throws, in order to be able to answer the question whether a too-low max_num_steps can lead to biased inferences that the diagnostics don't catch.

By compromised inference I meant a case where the results are wrong and the diagnostics don't catch it. If they do catch it, then you can always sample again with better settings, so you don't make bad inferences.

I’m actually not sure whether this is what happens, or how this is handled in general. Still, the solver not finishing while we are in the typical set (and then what happens?) cannot be great.

It’s totally fine for MCMC samplers to reject draws. It’s still not good to do it, of course.

No, that’s not true to my knowledge. The attempted draw gets rejected, the sampler “recycles” the last draw, and then it tries again. That’s in principle safe for MCMC samplers to do. It’s just not a good thing to happen.
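To illustrate the mechanics, here is a toy random-walk Metropolis sketch in plain Python, not Stan's actual sampler: an exception thrown during the log-density evaluation is treated as log density -inf, so the proposal is rejected and the current draw is recycled into the chain. The `metropolis` and `fragile_logp` names and the failure threshold are invented for illustration.

```python
import math
import random

def metropolis(logp, x0, n_draws, scale=1.0, seed=0):
    """Toy random-walk Metropolis: an exception from logp is treated
    as log-density -inf, so the proposal is rejected and the current
    draw is 'recycled' into the chain."""
    rng = random.Random(seed)
    x, lp = x0, logp(x0)
    draws = []
    for _ in range(n_draws):
        prop = x + rng.gauss(0.0, scale)
        try:
            lp_prop = logp(prop)
        except Exception:          # e.g. solver exhausted max_num_steps
            lp_prop = -math.inf    # proposal is simply rejected
        if math.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        draws.append(x)            # last draw recycled on rejection
    return draws

def fragile_logp(x):
    # Hypothetical model: pretend the ODE solver fails whenever x > 2
    if x > 2.0:
        raise RuntimeError("max_num_steps exceeded")
    return -0.5 * x * x            # standard normal otherwise

draws = metropolis(fragile_logp, 0.0, 1000)
assert all(d <= 2.0 for d in draws)  # the failing region never appears
```

Note how the chain stays valid as a Markov chain but behaves as if the failing region had zero probability, which is exactly why the bias question is subtle.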

… and in all honesty: when I model an ODE problem which takes long to compute and there are occasional hiccups like these, then I would try to find the root cause, but I would certainly use the model fit. I would not use it for final decision making for the purpose the model was built for, ok… but saying that such a fit cannot be used for anything would be unreasonable in my eyes.

Turns out I’m actually guilty of this as well… 1e5 iterations were not enough…

Would it be possible to move this discussion into a new topic? I saw @martinmodrak do it here; I'm not sure whether I can do this, or only @wds15 can?

I believe hitting the max_num_steps limit behaves exactly like calling reject? Is that so?

It’s perfectly fine if the ODE solver throws a max-step error during warmup: the sampler is exploring and we allow hiccups. On the other hand, if the solver does this during sampling, the final draws we get are biased, because we fail to see outcomes from a certain part of the parameter space.

At the lower level, my understanding is that the max-step error (eventually) propagates as an exception to the sampler and is interpreted as a divergence there, which leads to the current level of trajectory expansion being abandoned and a sample being drawn from the previous trajectory expansion. In this context the max-step error behaves exactly like a reject statement.

Not really.

Under certain conditions rejecting a numerically imprecise state can maintain the target distribution and the asymptotic consistency of Markov chain Monte Carlo estimators, but it may easily frustrate the preasymptotic behavior. Under other conditions the rejections can compromise even that asymptotic consistency. The question isn’t so much whether there is a bias but rather how significant the bias is.

The problem is that the numerically unstable region can hide other important regions of the target distribution. Imagine for example two regions of high probability connected by a narrow bridge that causes numerical instabilities – when all attempts to pass through the bridge are rejected due to numerical instability issues then the sampler will only ever see one of those regions and be unable to fully quantify the target distribution. If all of the Markov chains end up on the same side then no diagnostics will ever indicate the missing probability on the other side.
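A toy sketch of that failure mode, in hypothetical Python with an invented "unstable bridge": a bimodal target whose bridge region throws, so a chain started in the left mode rejects every attempt to cross and never learns about the right mode.

```python
import math
import random

# Bimodal toy target: Gaussian modes at +/-4 joined by a "bridge"
# around zero.  Pretend the bridge (|x| < 2) is numerically unstable,
# so evaluating the log density there throws, like an exhausted solver.
def logp(x):
    if abs(x) < 2.0:
        raise RuntimeError("numerically unstable bridge")
    return math.log(math.exp(-0.5 * (x - 4.0) ** 2)
                    + math.exp(-0.5 * (x + 4.0) ** 2))

rng = random.Random(1)
x, lp = -4.0, logp(-4.0)          # chain starts in the left mode
draws = []
for _ in range(2000):
    prop = x + rng.gauss(0.0, 0.5)
    try:
        lp_prop = logp(prop)
    except RuntimeError:
        lp_prop = -math.inf       # rejected, current draw recycled
    if math.log(rng.random()) < lp_prop - lp:
        x, lp = prop, lp_prop
    draws.append(x)

# The chain never crosses the bridge, so the right-hand mode at +4 is
# invisible to the sampler, and this single chain gives no hint of it.
assert max(draws) < 0
```

With all chains initialized on the same side, no diagnostic computed from the draws would flag the missing mode, which is the point of the bridge example above.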

A lot of the intuition that has been discussed here is based on the assumption that the numerically unstable region is near the boundary of relevant behavior, in which case the inability to sample the unstable region only mildly biases the exploration. This is often true, but it’s not universally true.

A few heuristics that are more reliable than others:

  • If numerical problems arise in the warmup phase but never in the main sampling phase, then the unstable regions are likely far outside of the typical set and can safely be ignored.
  • If multiple chains are run from diffuse initial conditions, are run long enough, see numerical problems in the same neighborhood, and that neighborhood is in the tail of the other samples, then the unstable region is probably on the edge of the typical set, which somewhat limits its potential consequences.
    But these are only heuristics and they offer no guarantees.
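As a hypothetical illustration of the first heuristic, one could triage runs by tabulating where the solver failures occurred; the function name and return labels below are invented, not part of any Stan tooling.

```python
def classify_solver_failures(n_warmup_failures, n_sampling_failures):
    """Hypothetical triage of ODE-solver failure counts along the
    lines of the heuristics above: failures confined to warmup suggest
    the unstable region lies outside the typical set, while failures
    during the main sampling phase demand closer inspection."""
    if n_sampling_failures > 0:
        return "inspect: unstable region may overlap the typical set"
    if n_warmup_failures > 0:
        return "likely benign: failures only during warmup"
    return "clean: no solver failures observed"
```

For example, `classify_solver_failures(12, 0)` lands in the "likely benign" bucket, but as stated above this is only a heuristic, not a guarantee.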

Maybe there should be a better way for ODE models to reject parameters.

Take that orbital mechanics example. I’m not sure, but I think max_num_steps usually gets exhausted whenever two planets are on a collision course, or otherwise orbiting each other very closely.

Aside from a different parametrization which might exclude such cases, one solution which would not rely on hitting max_num_steps would be to check the current trajectories/states for such a thing. We know that the planets cannot actually collide, as we would then not have our future observations; i.e., the probability of this happening is literally zero.

However, currently we cannot check these things (easily). In principle we could check this in the specification of the ODE vector field (dydt), but the values that get plugged in there are not necessarily the final values, as the solvers may themselves “reject” some trajectories due to precision concerns.

What would help would be to be able to pass another UDF which only gets applied to “actual” trajectories (accepted/provided by the solver) and which weeds out problematic parameters/trajectories.
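A sketch of what such a post-solve check could look like, in toy Python rather than Stan (the integrator, the thresholds, and the `RejectSample` exception are all invented stand-ins): the collision test runs on the trajectory the solver actually returns, not inside the vector field.

```python
import math

class RejectSample(Exception):
    """Stand-in for Stan's reject(): signals the sampler to discard
    this parameter draw."""

def simulate_two_bodies(pos1, vel1, pos2, vel2, dt=0.01, n_steps=500):
    """Toy planar two-body integration (semi-implicit Euler, G*m = 1).
    Returns the trajectory of inter-body separations."""
    p1, v1 = list(pos1), list(vel1)
    p2, v2 = list(pos2), list(vel2)
    seps = []
    for _ in range(n_steps):
        dx = p2[0] - p1[0]
        dy = p2[1] - p1[1]
        r = math.hypot(dx, dy)
        seps.append(r)
        # acceleration magnitude 1/r^2, divided by r to project onto (dx, dy)
        a = 1.0 / r ** 3
        for i, d in enumerate((dx, dy)):
            v1[i] += a * d * dt
            v2[i] -= a * d * dt
            p1[i] += v1[i] * dt
            p2[i] += v2[i] * dt
    return seps

def check_no_collision(seps, min_sep=0.1):
    # Post-solve check on the trajectory the solver actually produced,
    # not inside dydt: a near-collision has prior probability zero, so
    # reject the parameter draw outright.
    if min(seps) < min_sep:
        raise RejectSample("bodies passed within %g" % min_sep)
    return seps
```

A UDF hook of this shape would only see accepted solver output, sidestepping the problem that intermediate dydt evaluations may be probed at states the solver later discards.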


One has to be careful with concepts like “reject” – in particular we have to differentiate between what is model and what is computation.

Right now many dynamical systems specified via ODEs contain extreme behavior that manifests at certain parameter values. This stresses the ODE solvers when they’re not configured appropriately, leading to large numerical errors in the final states and their sensitivities, which then induce problems with the computation. In other words, the computational problems are a manifestation of extreme behavior in the model.

The computational solution is to use, well, better computation. By putting more work into each ODE solve we can better resolve the extreme dynamics and accurately quantify their contributions to the posterior.

The modeling solution is to remove the extreme behavior from the model. The most direct approach is to use more informative prior models that suppress the extreme parameter configurations; this has the critical benefit of yielding a smooth target distribution that facilitates accurate computation of the resulting model.

Jury-rigging the “reject” function based on observed numerical solver behavior, on the other hand, only implicitly changes the model. Because the rejection is based on solver output, it can account only for extreme behavior that manifests in particular numerical behavior; for example, if some collision events integrate fine but some don’t, then the reject approach wouldn’t be removing all collision events, just some particular ones. This selection can have unintended effects on the final inferences. On top of that, the rejection introduces a discontinuity – the target density is non-zero on one side of the reject condition and zero on the other – which complicates the computation.

Ultimately removing extreme behavior can be extremely useful, but one has to do it in a principled way lest the final inferences be compromised. Almost always this involves careful prior modeling motivated by numerical pathologies instead of trying to define implicit models based on those numerical pathologies.

Yes, hence my proposal to provide users the opportunity to reject samples based on the behavior of the model, not implicitly based on the behavior of the solver.

If applicable/possible/reasonable, doing this via “better” priors may be the best way, but sometimes it may be necessary or desirable not to outright exclude parameter values that we do not know a priori will lead to problems down the road.

Rejecting samples for any reason implicitly excludes certain parameter values. Because the evaluation of the model for fixed parameter values is deterministic, if any behavior induces a rejection, it will always induce a rejection for the same parameter configuration.

Not sure what we are disagreeing on. Do we agree on the following, which is my only point?

Ranked by “badness of reason for rejecting a proposal” we get:

Divergence < Domain knowledge says state of ODE is ridiculous (e.g. collision event) <<<< Impatience (solver took too long)

Example: Say you are an astronomer in the 19th century, and something’s weird about the orbit of the most recently discovered planet. You suspect there is another planet somewhere, but all you know is that it is in a stable orbit around the sun. What do you do with proposals that would imply a collision with one of the other planets?