Divergences when trying to infer initial position and velocity from angular measurements

Hi All,

I was wondering if anyone could help us to understand the following. We are considering a set of angle and angle rate measurements of an object moving along an ellipse and we would like to recover the initial position (r_0) and velocity (\dot{r}_0) given that the object moves according to the ODE,

\ddot{\mathbf{r}} = -\frac{1}{r^3}\mathbf{r}

where r=(x,y), and the angle and angle rates are related to the position and velocity via

\alpha = \tan ^{-1} (x/y), \quad \mathrm{and} \quad \dot{\alpha} = \frac{x\dot{y} - y\dot{x}}{x^2 + y^2}.

We have considered the problem starting with the initial state (r_0,\dot{r}_0)=(1,0,0,1) and have generated the angle measurements as it moves using the Stan Model “StanPropagate” and performed the inference using the Stan Model “StanInference2d” as below.

StanPropagate = """
   functions {
      real[] ode(real t, real[] y, real[] theta, real[] x_r, int[] x_i) {
         real dydt[4];
         dydt[1] = y[3];
         dydt[2] = y[4];
         dydt[3] = -(theta[1])/(pow(sqrt(y[1]*y[1] + y[2]*y[2]),3))*y[1];
         dydt[4] = -(theta[1])/(pow(sqrt(y[1]*y[1] + y[2]*y[2]),3))*y[2];
         return dydt;
      }
      real[] h(real[] y) {
         real m[2];
         
         m[1] = atan2(y[2],y[1]);
         m[2] = (y[1]*y[4] - y[2]*y[3])/(y[1]*y[1] + y[2]*y[2]);
         return m;
      }
   }
   data {
      int<lower=1> T; //total number of observations including at t0
      real t0;
      real theta[1];
      real ts[T-1];
      real y0[4];
   }
   transformed data {
      real x_r[0];
      int x_i[0];
   }
   generated quantities {
      real y_hat[T,4];
      real z[T,2];
      y_hat[1] = y0;
      y_hat[2:T] = integrate_ode_rk45(ode, y0, t0, ts, theta, x_r, x_i);
      for (t in 1:T) {
         z[t] = h(y_hat[t]);
      }
   }
   """

StanInference2d = """
   functions {
      real[] ode(real t, real[] y, real[] theta, real[] x_r, int[] x_i) {
         real dydt[4];
         dydt[1] = y[3];
         dydt[2] = y[4];
         dydt[3] = -(theta[1])/(pow(sqrt(y[1]*y[1] + y[2]*y[2]),3))*y[1];
         dydt[4] = -(theta[1])/(pow(sqrt(y[1]*y[1] + y[2]*y[2]),3))*y[2];
         return dydt;
      }
      real[] h(real[] y, real alpha) {
         real m[2];
         m[1] = atan2(y[2],y[1]);
         m[2] = (y[1]*y[4] - y[2]*y[3])/(y[1]*y[1] + y[2]*y[2]);
         if ( (alpha < - pi()/2) && (m[1] > pi()/2)) {
            m[1] = m[1] - 2*pi();
         }
         if ( (alpha > pi()/2) && (m[1] < - pi()/2)) {
            m[1] = m[1] + 2*pi();
         }
         return m;
      }
   }
   data {
      int<lower=1> T; //total number of observations including at t0
      real t0;
      real theta[1];
      real ts[T-1];
      real z[T,2];
   }
   transformed data {
      real x_r[0];
      int x_i[0];
   }
   parameters {
      real y0[4];
      vector<lower=0>[2] sigma;
   }
   transformed parameters {
   real r;
   real v;
   real<lower=0.01, upper = 3> a;
   r = sqrt( y0[1]*y0[1] +  y0[2]*y0[2]);
   v = sqrt( y0[3]*y0[3] +  y0[4]*y0[4]);
   a = 1/((2/r) - (v*v));
   }
   model {
      real y_hat[T,4];
      y_hat[1] = y0;
      y_hat[2:T] = integrate_ode_rk45(ode, y0, t0, ts, theta, x_r, x_i);
      for (t in 1:T) {
         z[t] ~ normal(h(y_hat[t], z[t,1]),sigma);
      }
   }
   """

For three sets of measurements i.e. z =[(\alpha_1, \dot{\alpha_1}), (\alpha_2, \dot{\alpha_2}),(\alpha_3, \dot{\alpha_3})], the sampler is able to converge. However for two sets of measurements z =[(\alpha_1, \dot{\alpha_1}), (\alpha_2, \dot{\alpha_2})] the sampler is unable to converge and produces results such as those which can be seen in the attached figures (2Measurements-example1 and Measurements-example2, where the grey represents one focus of the ellipse, the black points represent the true positions of the object at given times, the red point is the initial point used by the sampler which was designed to produce the same initial measurements, and the blue points are the results from the sampler). Although it can be seen that there is a slightly denser region lying in the plane between the focus and the initial point, I anticipated that this would have been denser and the samples less spread out so far (similar to AnticipatedSamples as attached). Here is an example output from Stan warning that 92% of iterations resulted in a divergence despite using a high adapt_delta.

WARNING:pystan:Rhat above 1.1 or below 0.9 indicates that the chains very likely have not mixed
WARNING:pystan:919 of 1000 iterations ended with a divergence (91.9 %).
WARNING:pystan:Try running with adapt_delta larger than 0.9999999 to remove the divergences.

Inference for Stan model: anon_model_a09ba334c2a9641b0f4e0314a06d6912.
1 chains, each with iter=6000; warmup=5000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=1000.

mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
y0[1] 0.54 0.26 1.26 -2.46 0.07 0.67 1.12 3.08 24 1.0
y0[2] -0.4 0.26 0.9 -2.57 -0.86 -0.19 0.11 1.13 11 1.11
y0[3] 0.11 0.15 0.64 -1.17 -0.25 0.14 0.49 1.34 17 1.08
y0[4] 0.12 0.17 0.57 -1.18 -0.25 0.21 0.52 1.02 11 1.0
sigma[1] 36.15 23.44 315.47 5.0e-3 0.35 1.78 4.1 87.5 181 1.0
sigma[2] 3.13 0.64 6.92 0.03 0.46 1.23 2.85 20.86 117 1.05
r 1.48 0.19 0.82 0.44 0.83 1.23 1.96 3.41 19 1.18
v 0.78 0.09 0.39 0.16 0.45 0.74 1.05 1.58 18 1.14
a 1.37 0.08 0.73 0.27 0.78 1.27 1.89 2.86 78 1.0
lp__ -1.07 0.88 2.9 -5.93 -2.91 -1.64 0.37 5.77 11 1.08

Samples were drawn using NUTS at Thu May 21 07:29:16 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).

Is this behaviour normal, does Stan find it difficult to converge with such limited data? and is there a way that we could reparameterise the model to help avoid this?
Please also find attached the Python script used to generate these results.

Thanks

Gemma

2d-Unit.py (4.7 KB)

@Bob_Carpenter: this is the problem i described a few weeks ago and which, if i recall correctly, you thought to be reminscent of the problem encountered in 8-schools and other funnel-shaped pdfs. The challenge is that we can’t see how to parameterise, assuming that’s the way to go.

y0 and sigma both have improper priors, which means the prior distributions can’t be normalized to one.

I suspect priors, even relatively loose ones, will help here.

Put them on y0 and sigma both. It might seem weird to put priors on the estimates of initial conditions, but you should be doing that anyway. I doubt that your prior belief is actually that the initial conditions could be anywhere between -inf to inf!

1 Like

Thank you for your help, that seems to have fixed the problem, I’m now getting a result much more like what I expected. I did have have priors in when I wasn’t supplying the initial conditions and then I stupidly assumed I didn’t need them anymore!!

Thanks again :)

@bbbales2: thank you very much indeed. I feel like i should have spotted this one but clearly didn’t. I hereby owe you (at least) one (more?!) generic drink!

I am intrigued by this though; i had assumed it was something to do with leapfrog malfunctioning, though i don’t know why, and am now unsure as to why this is a problematic function (not an unnormalised pdf i guess) to sample from. Do you think the real issue is that the “posterior” (with the improper priors) is improper making something about applying mcmc invalid or do you think it is a specific issue related to leapfrog and the adaption used in Stan? Just keen to understand your intuition.

@gcook good to hear it worked out!

I’m not really sure about the cause.

The thinking was that there were six parameters that could go off to infinity and only four data points.

So not enough equations? Sort of?

I’m not quite sure what is making the output look nasty, but if the posterior without the priors was indeed improper, then it could be about anything, and maybe multiple things breaking down. Adaptation depends on MCMC, so it presumably breaks if MCMC breaks.

I don’t know if the improper-ness of the priors would effect the Hamiltonian in any way. Looks to me like H = p^2/m is a valid Hamiltonian system where everything just goes straight, so presumably those dynamics would be fine. I guess the u-turn would be broken though, but that’s more of a stats thing.

@bbbales2: Thanks for sight of your thinking. I’m reassured that it sounds like it’s not completely stupid to have thought that it was the adaption for leapfrog’s step-size that was at fault!

What also affects the posterior is that rk45 is not symplectic(actually none of explicit Runge-Kutta methods is symplectic), so the gravitational dynamics to be simulated here would have biased trajectories using rk45. Unfortunately none of Stan’s integrators is ideal for this kind of problems.