Case study: ODE-based models and multimodality

@andrewgelman and I just uploaded a new case study: https://mc-stan.org/users/documentation/case-studies/planetary_motion/planetary_motion.html. Comments and questions are welcomed!

Our initial intent was to construct a simple textbook example for an ODE-based model, but the example (planetary motion using Hamiltonian mechanics) turned out to be a bit more complicated. There was unexpected multimodality, and in the tail of the posterior, the ODE became very difficult to solve. The case study illustrates some advanced use of posterior predictive checks to understand how modes arise, even when there is no identifiability problem. And itā€™s a good reminder that the behavior of your ODE can vary greatly depending on where you are in the parameter space. There is also a question being raised on what constitutes acceptable initial starting points for MCMC; this gets less attention (at least from what I can tell in the Stan community) but itā€™s another tuning parameter to worry about.

A summary of this case study can be found in Section 11 of our recent paper on the Bayesian workflow: https://arxiv.org/pdf/2011.01808.pdf.

10 Likes

nice! I was curious to reproduce but am missing something, itā€™s kind of close but my k is more like .01 ā€¦ my diff eq vector is:

ā€˜p1ā€™,
ā€˜p2ā€™,
ā€˜-(q1-qstar1) * k / (((q1-qstar1)^2+(q2-qstar2)^2)^(1.5))ā€™,
ā€˜-(q2-qstar2) * k / (((q1-qstar1)^2+(q2-qstar2)^2)^(1.5))ā€™

pars:
image

any guesses what I missed would be welcome, but no big deal :)

image

1 Like

@Charles_Driver Iā€™m not exactly sure which part youā€™re trying to reproduce and I donā€™t quite follow your notation. If you share your Stan file I can take a look. The code used for the case study is in this repository: https://github.com/stan-dev/example-models/tree/master/knitr/planetary_motion.

itā€™s my own hierarchical sde stuff, ctsem, you donā€™t want the underlying stan code ;) was trying to think about whether an sde formulation would help with the modes and couldnā€™t work it out (vague intuition says yes but perhaps not enough) so thought Iā€™d just try it. notation is not so different, t0 = time zero, 1 = 1st vector element, 2= 2nd vector element, qstar = s .

Great presentation, I like it. But I feel the example is somehow artificial because the first thing Iā€™d constrain is the number of periods the data covers. In fact, Iā€™d simply use orbiting period as parameter instead of k(in planetary_motion.stan model). To me this is more natural from physics perspective, and makes it straightforward to limit the ODE solution within one period because otherwise the system is not identifiable.

2 Likes

This is an interesting example! Thank you so much for writing up the case study! I tried pathfinder for this example. It seems to work, but please let me know if there is anything strange. Thank you so much!

2 Likes

@yizhang I find writing the problem using Hamiltonā€™s equations of motion to be fairly ā€œnaturalā€ from a physics perspective (at least for a textbook example). This also breaks the degeneracy, since orbital period is now a function of the star-planet distance, per Keplerā€™s laws. I expect the degeneracy you get if you use orbital period as a parameter (or you simply fit an ellipsis) would be different from the soft degeneracy we observed. You can reason about how to encode the belief the planet undergoes less than an orbit in your prior. But even then, youā€™d either need a truncated prior to eliminate wiggles in the posterior tail or inits consistent with the one period hypothesis.

@Lu.Zhang This is a very nice result!

Following your presentation last week, I did think it would be neat to try the pathfinder on this example. There are two notable benefits. (i) The pathfinder can identify minor modes and (ii) once the path-finder seems to converge, you can simulate a planetary trajectory based on a single point in the parameter space and do a cheap posterior predictive check. Contrast this with waiting 2,000+ seconds for all the Markov chains to run. Alternatively, of course, you could run MCMC for less iterations, but thatā€™s a bit hacky.

p.s.: the cheap checks based on a single point in the high prob mass region would also come in handy for the monster model, where running the chains takes a long time, especially when the model misbehaves.

1 Like

I should apologize for making a hasty comment without substantiating it.

Itā€™s natural as a description of the system, but what I meant was to make inference on planet mass. In that context, itā€™s natural to make orbital observation first and use that information to calculate mass. This is how solar mass is calculated in practice.

Iā€™m glad you mention Keplerā€™s laws, thatā€™s what I meant to use period to deduce mass. Using the planetary_motion.stan model, this means a change of that single parameter:

parameters {
  real<lower=0.5, upper=0.8> Tr;     // ratio between observation time(4) and total period
}

transformed parameters {
  real a = 1.0;                 // semimajor axis, eyeballing data, could be param in full model
  real<lower = 4> T = 4.0 / Tr; // total period, hard-coding observation time(4) from data.
  real<lower = 0> k = a * a * a * 4.0 * 9.86960440109 / (T * T); // Kepler's 3rd law
  real y[n, n_coord * 2]
  = pmx_integrate_ode_rk45(ode, y0, t0, t, {k}, {m}, x_i,
                           rel_tol, abs_tol, max_steps);
}

model {
  Tr ~ normal(0.6, 0.1);

  q_obs[, 1] ~ normal(y[, 1], sigma_x);
  q_obs[, 2] ~ normal(y[, 2], sigma_y);
}

As you can see the change I made is indeed based on ā€œtruncated prior and one period hypothesisā€. But thatā€™s exactly my point of example being artificial because in reality thatā€™s how observation is made(I doubt thereā€™s celestial recording logs planet orbits by skipping period, one observation a time). Moreover, whatā€™s encoded in the new model merely reflects the nature of elliptical orbits which is based on inverse-square law in the ODE model anyway. If we make proper observation, have a proper physics framework, why shouldnā€™t we encode whatever those data and equations imply in our statistical model?

With the above change Iā€™m getting

> fit$summary(c("k", "Tr"))[, c(1, 2, 4, 8, 9)]
# A tibble: 2 x 5
  variable  mean       sd  rhat ess_bulk
  <chr>    <dbl>    <dbl> <dbl>    <dbl>
1 k        1.00  0.000331  1.00     672.
2 Tr       0.637 0.000105  1.00     673.

and

4 Likes

(ii) once the path-finder seems to converge, you can simulate a planetary trajectory based on a single point in the parameter space and do a cheap posterior predictive check.
ā€¦
p.s.: the cheap checks based on a single point in the high prob mass region would also come in handy for the monster model, where running the chains takes a long time, especially when the model misbehaves.

Interesting idea. Thank you for sharing!

@yizhang This is a very nice take on the problem and I think it would be a valuable addition to the case study.

I went after a ā€œnaturalā€ formulation based on Newtonā€™s laws, and I would say the theory of motion, rather than a ā€œnaturalā€ formulation from on an observational perspective. I think itā€™s worth discussing the difference between the two. And also emphasizing that in either case, you need a strong, if natural, assumption (so truncated prior on k, or one that states the planet undergoes less than one orbit with probability 1, as opposed to, say, probability 0.999). Iā€™m still amazed at how unforgiving the wiggles in the tails are.

Now, if I were being cheeky, I could propose the object we observe is orbiting another star (so exoplanet imaging), and we have little a priori knowledge about what the orbiting time scale of the object would be. So a wider prior on k and no truncation possible. But this would be a bit of stretch.

Yeah the approach I took above can by no means generalized, as itā€™s based on the fact that the orbit is elliptic and the data clearly exhibits the position of aphelion or perihelion. I just feel this piece of information, though presented by data, has not been taken advantage of in the original approach, which on the other hand is much more universal.

May I revive this thread to ask whether it would be feasible to first weed out the uninteresting modes by sampling from an approximate model (higher tolerance for the solver), and using draws Form these samples weighted by the posterior to find initial points for a more accurate model (lower tolerance for the solver)?

1 Like