Hi All,
I was wondering if anyone could help us to understand the following. We are considering a set of angle and angle rate measurements of an object moving along an ellipse and we would like to recover the initial position (r_0) and velocity (\dot{r}_0) given that the object moves according to the ODE,
where r=(x,y), and the angle and angle rates are related to the position and velocity via
We have considered the problem starting with the initial state (r_0,\dot{r}_0)=(1,0,0,1) and have generated the angle measurements as it moves using the Stan Model “StanPropagate” and performed the inference using the Stan Model “StanInference2d” as below.
StanPropagate = """
functions {
real[] ode(real t, real[] y, real[] theta, real[] x_r, int[] x_i) {
real dydt[4];
dydt[1] = y[3];
dydt[2] = y[4];
dydt[3] = -(theta[1])/(pow(sqrt(y[1]*y[1] + y[2]*y[2]),3))*y[1];
dydt[4] = -(theta[1])/(pow(sqrt(y[1]*y[1] + y[2]*y[2]),3))*y[2];
return dydt;
}
real[] h(real[] y) {
real m[2];
m[1] = atan2(y[2],y[1]);
m[2] = (y[1]*y[4] - y[2]*y[3])/(y[1]*y[1] + y[2]*y[2]);
return m;
}
}
data {
int<lower=1> T; //total number of observations including at t0
real t0;
real theta[1];
real ts[T-1];
real y0[4];
}
transformed data {
real x_r[0];
int x_i[0];
}
generated quantities {
real y_hat[T,4];
real z[T,2];
y_hat[1] = y0;
y_hat[2:T] = integrate_ode_rk45(ode, y0, t0, ts, theta, x_r, x_i);
for (t in 1:T) {
z[t] = h(y_hat[t]);
}
}
"""
StanInference2d = """
functions {
real[] ode(real t, real[] y, real[] theta, real[] x_r, int[] x_i) {
real dydt[4];
dydt[1] = y[3];
dydt[2] = y[4];
dydt[3] = -(theta[1])/(pow(sqrt(y[1]*y[1] + y[2]*y[2]),3))*y[1];
dydt[4] = -(theta[1])/(pow(sqrt(y[1]*y[1] + y[2]*y[2]),3))*y[2];
return dydt;
}
real[] h(real[] y, real alpha) {
real m[2];
m[1] = atan2(y[2],y[1]);
m[2] = (y[1]*y[4] - y[2]*y[3])/(y[1]*y[1] + y[2]*y[2]);
if ( (alpha < - pi()/2) && (m[1] > pi()/2)) {
m[1] = m[1] - 2*pi();
}
if ( (alpha > pi()/2) && (m[1] < - pi()/2)) {
m[1] = m[1] + 2*pi();
}
return m;
}
}
data {
int<lower=1> T; //total number of observations including at t0
real t0;
real theta[1];
real ts[T-1];
real z[T,2];
}
transformed data {
real x_r[0];
int x_i[0];
}
parameters {
real y0[4];
vector<lower=0>[2] sigma;
}
transformed parameters {
real r;
real v;
real<lower=0.01, upper = 3> a;
r = sqrt( y0[1]*y0[1] + y0[2]*y0[2]);
v = sqrt( y0[3]*y0[3] + y0[4]*y0[4]);
a = 1/((2/r) - (v*v));
}
model {
real y_hat[T,4];
y_hat[1] = y0;
y_hat[2:T] = integrate_ode_rk45(ode, y0, t0, ts, theta, x_r, x_i);
for (t in 1:T) {
z[t] ~ normal(h(y_hat[t], z[t,1]),sigma);
}
}
"""
For three sets of measurements i.e. z =[(\alpha_1, \dot{\alpha_1}), (\alpha_2, \dot{\alpha_2}),(\alpha_3, \dot{\alpha_3})], the sampler is able to converge. However for two sets of measurements z =[(\alpha_1, \dot{\alpha_1}), (\alpha_2, \dot{\alpha_2})] the sampler is unable to converge and produces results such as those which can be seen in the attached figures (2Measurements-example1 and Measurements-example2, where the grey represents one focus of the ellipse, the black points represent the true positions of the object at given times, the red point is the initial point used by the sampler which was designed to produce the same initial measurements, and the blue points are the results from the sampler). Although it can be seen that there is a slightly denser region lying in the plane between the focus and the initial point, I anticipated that this would have been denser and the samples less spread out so far (similar to AnticipatedSamples as attached). Here is an example output from Stan warning that 92% of iterations resulted in a divergence despite using a high adapt_delta.
WARNING:pystan:Rhat above 1.1 or below 0.9 indicates that the chains very likely have not mixed
WARNING:pystan:919 of 1000 iterations ended with a divergence (91.9 %).
WARNING:pystan:Try running with adapt_delta larger than 0.9999999 to remove the divergences.
Inference for Stan model: anon_model_a09ba334c2a9641b0f4e0314a06d6912.
1 chains, each with iter=6000; warmup=5000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=1000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
y0[1] 0.54 0.26 1.26 -2.46 0.07 0.67 1.12 3.08 24 1.0
y0[2] -0.4 0.26 0.9 -2.57 -0.86 -0.19 0.11 1.13 11 1.11
y0[3] 0.11 0.15 0.64 -1.17 -0.25 0.14 0.49 1.34 17 1.08
y0[4] 0.12 0.17 0.57 -1.18 -0.25 0.21 0.52 1.02 11 1.0
sigma[1] 36.15 23.44 315.47 5.0e-3 0.35 1.78 4.1 87.5 181 1.0
sigma[2] 3.13 0.64 6.92 0.03 0.46 1.23 2.85 20.86 117 1.05
r 1.48 0.19 0.82 0.44 0.83 1.23 1.96 3.41 19 1.18
v 0.78 0.09 0.39 0.16 0.45 0.74 1.05 1.58 18 1.14
a 1.37 0.08 0.73 0.27 0.78 1.27 1.89 2.86 78 1.0
lp__ -1.07 0.88 2.9 -5.93 -2.91 -1.64 0.37 5.77 11 1.08
Samples were drawn using NUTS at Thu May 21 07:29:16 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
Is this behaviour normal, does Stan find it difficult to converge with such limited data? and is there a way that we could reparameterise the model to help avoid this?
Please also find attached the Python script used to generate these results.
Thanks
Gemma
2d-Unit.py (4.7 KB)