Case study: ODE-based models and multimodality

@andrewgelman and I just uploaded a new case study: https://mc-stan.org/users/documentation/case-studies/planetary_motion/planetary_motion.html. Comments and questions are welcome!

Our initial intent was to construct a simple textbook example of an ODE-based model, but the example (planetary motion using Hamiltonian mechanics) turned out to be a bit more complicated. There was unexpected multimodality, and in the tails of the posterior the ODE became very difficult to solve. The case study illustrates some advanced use of posterior predictive checks to understand how modes arise, even when there is no identifiability problem. It’s also a good reminder that the behavior of your ODE can vary greatly depending on where you are in the parameter space. Finally, the case study raises the question of what constitutes acceptable starting points for MCMC; this gets less attention (at least from what I can tell in the Stan community), but it’s another tuning parameter to worry about.

A summary of this case study can be found in Section 11 of our recent paper on the Bayesian workflow: https://arxiv.org/pdf/2011.01808.pdf.
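
On the point about starting points: for concreteness, here is a rough sketch of how you might pass explicit inits with cmdstanr, rather than relying on the default draws from uniform(-2, 2) on the unconstrained scale. This is not the code from the case study; the file path, the data list, and the parameter names (k, sigma_x, sigma_y) are placeholders.

library(cmdstanr)

mod <- cmdstan_model("planetary_motion.stan")  # placeholder path

# One init list per chain: jittered values in a physically plausible region.
init_list <- lapply(1:4, function(chain_id) {
  list(k       = exp(rnorm(1, 0, 0.1)),     # near k = 1
       sigma_x = abs(rnorm(1, 0.2, 0.05)),  # small measurement noise
       sigma_y = abs(rnorm(1, 0.2, 0.05)))
})

fit <- mod$sample(data = stan_data,  # data list assembled elsewhere
                  chains = 4,
                  init = init_list,
                  seed = 123)

The point is simply that the inits then become something you reason about explicitly, rather than another default you inherit.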

Nice! I was curious to reproduce this but am missing something; it’s kind of close, but my k is more like 0.01. My diff eq vector is:

'p1',
'p2',
'-(q1-qstar1) * k / (((q1-qstar1)^2+(q2-qstar2)^2)^(1.5))',
'-(q2-qstar2) * k / (((q1-qstar1)^2+(q2-qstar2)^2)^(1.5))'

pars: [image]

Any guesses as to what I missed would be welcome, but no big deal :)

@Charles_Driver I’m not exactly sure which part you’re trying to reproduce and I don’t quite follow your notation. If you share your Stan file I can take a look. The code used for the case study is in this repository: https://github.com/stan-dev/example-models/tree/master/knitr/planetary_motion.

It’s my own hierarchical SDE stuff (ctsem); you don’t want the underlying Stan code ;) I was trying to think about whether an SDE formulation would help with the modes and couldn’t work it out (vague intuition says yes, but perhaps not enough), so I thought I’d just try it. The notation is not so different: t0 = time zero, 1 = 1st vector element, 2 = 2nd vector element, qstar = s.

Great presentation, I like it. But I feel the example is somewhat artificial, because the first thing I’d constrain is the number of periods the data covers. In fact, I’d simply use the orbital period as a parameter instead of k (in the planetary_motion.stan model). To me this is more natural from a physics perspective, and it makes it straightforward to limit the ODE solution to within one period; otherwise the system is not identifiable.

This is an interesting example! Thank you so much for writing up the case study! I tried pathfinder for this example. It seems to work, but please let me know if there is anything strange. Thank you so much!

@yizhang I find writing the problem using Hamilton’s equations of motion to be fairly “natural” from a physics perspective (at least for a textbook example). This also breaks the degeneracy, since the orbital period is now a function of the star-planet distance, per Kepler’s laws. I expect the degeneracy you get if you use the orbital period as a parameter (or if you simply fit an ellipse) would be different from the soft degeneracy we observed. You can reason about how to encode, in your prior, the belief that the planet undergoes less than one orbit. But even then, you’d need either a truncated prior to eliminate wiggles in the posterior tail or inits consistent with the one-period hypothesis.

@Lu.Zhang This is a very nice result!

Following your presentation last week, I did think it would be neat to try the pathfinder on this example. There are two notable benefits: (i) the pathfinder can identify minor modes, and (ii) once the pathfinder seems to converge, you can simulate a planetary trajectory based on a single point in the parameter space and do a cheap posterior predictive check. Contrast this with waiting 2,000+ seconds for all the Markov chains to run. Alternatively, of course, you could run MCMC for fewer iterations, but that’s a bit hacky.
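
To make (ii) concrete, here is a rough sketch of the kind of check I have in mind, re-solving the equations of motion in R with deSolve for a single draw. It assumes a cmdstanr version that exposes the $pathfinder() method; the data list fields (t, q_obs), the initial state, and the star position are placeholders rather than the exact values from the case study.

library(cmdstanr)
library(deSolve)

mod <- cmdstan_model("planetary_motion.stan")  # placeholder path
pf  <- mod$pathfinder(data = stan_data, seed = 123)

# A single draw from the approximate posterior.
draw <- posterior::as_draws_df(pf$draws())[1, ]

# Equations of motion: dq/dt = p, dp/dt = -k (q - q*) / |q - q*|^3.
rhs <- function(t, state, pars) {
  q  <- state[1:2]
  p  <- state[3:4]
  r3 <- sum((q - pars$qstar)^2)^1.5
  list(c(p, -pars$k * (q - pars$qstar) / r3))
}

traj <- ode(y     = c(q1 = 1, q2 = 0, p1 = 0, p2 = 1),  # placeholder initial state
            times = c(0, stan_data$t),
            func  = rhs,
            parms = list(k = draw$k, qstar = c(0, 0)))

# Cheap graphical check: simulated orbit vs. observed positions.
plot(stan_data$q_obs[, 1], stan_data$q_obs[, 2], pch = 20)
lines(traj[, "q1"], traj[, "q2"], col = "red")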

P.S.: the cheap checks based on a single point in the high-probability-mass region would also come in handy for the monster model, where running the chains takes a long time, especially when the model misbehaves.

I should apologize for making a hasty comment without substantiating it.

It’s natural as a description of the system, but what I meant was making inference on the planet mass. In that context, it’s natural to make orbital observations first and use that information to calculate the mass. This is how the solar mass is calculated in practice.

I’m glad you mention Kepler’s laws; that’s what I meant: use the period to deduce the mass. Using the planetary_motion.stan model, this amounts to changing that single parameter:

parameters {
  real<lower=0.5, upper=0.8> Tr;     // ratio between observation time (4) and total period
}

transformed parameters {
  real a = 1.0;                 // semimajor axis, eyeballing data, could be param in full model
  real<lower = 4> T = 4.0 / Tr; // total period, hard-coding the observation time (4) from the data
  real<lower = 0> k = a * a * a * 4.0 * 9.86960440109 / (T * T); // Kepler's 3rd law
  real y[n, n_coord * 2]
  = pmx_integrate_ode_rk45(ode, y0, t0, t, {k}, {m}, x_i,
                           rel_tol, abs_tol, max_steps);
}

model {
  Tr ~ normal(0.6, 0.1);

  q_obs[, 1] ~ normal(y[, 1], sigma_x);
  q_obs[, 2] ~ normal(y[, 2], sigma_y);
}
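
For reference, the k line in the transformed parameters block is just Kepler’s third law, with k = GM playing the role of the gravitational parameter (9.86960440109 is π²):

$$T^2 = \frac{4\pi^2 a^3}{G M} \quad\Longleftrightarrow\quad k = G M = \frac{4\pi^2 a^3}{T^2}$$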

As you can see, the change I made is indeed based on a “truncated prior and a one-period hypothesis”. But that’s exactly my point about the example being artificial, because in reality that’s how observations are made (I doubt there are celestial records that log a planet’s orbit by skipping periods, one observation at a time). Moreover, what’s encoded in the new model merely reflects the nature of elliptical orbits, which follows from the inverse-square law in the ODE model anyway. If we make proper observations and have a proper physics framework, why shouldn’t we encode whatever those data and equations imply in our statistical model?

With the above change I’m getting

> fit$summary(c("k", "Tr"))[, c(1, 2, 4, 8, 9)]
# A tibble: 2 x 5
  variable  mean       sd  rhat ess_bulk
  <chr>    <dbl>    <dbl> <dbl>    <dbl>
1 k        1.00  0.000331  1.00     672.
2 Tr       0.637 0.000105  1.00     673.

> (ii) once the pathfinder seems to converge, you can simulate a planetary trajectory based on a single point in the parameter space and do a cheap posterior predictive check.
>
> P.S.: the cheap checks based on a single point in the high-probability-mass region would also come in handy for the monster model, where running the chains takes a long time, especially when the model misbehaves.

Interesting idea. Thank you for sharing!

@yizhang This is a very nice take on the problem and I think it would be a valuable addition to the case study.

I went for a “natural” formulation based on Newton’s laws, and I would say the theory of motion, rather than a “natural” formulation from an observational perspective. I think it’s worth discussing the difference between the two, and also emphasizing that in either case you need a strong, if natural, assumption (so a truncated prior on k, or one that states the planet undergoes less than one orbit with probability 1, as opposed to, say, probability 0.999). I’m still amazed at how unforgiving the wiggles in the tails are.

Now, if I were being cheeky, I could propose that the object we observe is orbiting another star (so exoplanet imaging), and that we have little a priori knowledge about its orbital time scale. So a wider prior on k and no truncation possible. But this would be a bit of a stretch.

Yeah, the approach I took above can by no means be generalized, as it relies on the orbit being elliptical and on the data clearly exhibiting the position of the aphelion or perihelion. I just feel this piece of information, though present in the data, has not been taken advantage of in the original approach, which on the other hand is much more universal.

May I revive this thread to ask whether it would be feasible to first weed out the uninteresting modes by sampling from an approximate model (higher tolerance for the ODE solver), and then use draws from these samples, weighted by the posterior, as initial points for a more accurate model (lower tolerance for the solver)?
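
In code, I am imagining something like the sketch below, assuming the solver tolerances are passed in as data (as in the model above) and, for simplicity, that k is the only parameter needing inits. I have left out the reweighting step, which would require evaluating the log densities of both models on the stage-one draws.

library(cmdstanr)

mod <- cmdstan_model("planetary_motion.stan")  # placeholder path

# Stage 1: loose solver tolerances give a cheap, approximate posterior.
fit_approx <- mod$sample(
  data = modifyList(stan_data, list(rel_tol = 1e-4, abs_tol = 1e-4)),
  chains = 4, seed = 1)

# Stage 2: draws from stage 1 serve as inits for the accurate model.
draws <- posterior::as_draws_df(fit_approx$draws("k"))
init_list <- lapply(sample(nrow(draws), 4), function(i) list(k = draws$k[i]))

fit_exact <- mod$sample(
  data = modifyList(stan_data, list(rel_tol = 1e-10, abs_tol = 1e-10)),
  chains = 4, init = init_list, seed = 2)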