Toy ODE model problem -- posterior predictives not consistent with real values

OSX 10.11.6
Pystan 2.17.0.0 (I had compiler problems with Pystan 2.18)
Python 3.6.8, Anaconda
GCC 4.2.1 Compatible Clang 4.0.1

As practice for something more realistic (and to get the hang of Stan), I’ve been trying to estimate parameters from simulated data for an absurdly simple ODE model.

\frac{dx}{dt} = \alpha - \gamma_d x

with log-normal noise of fixed spread around the true value (to keep observations positive). Eventually I’d like to estimate the magnitude of noise, too, but I want to stay clear of possible funnels and other hierarchical issues until I can get this simpler model to work.

When I sample on this, I get a super-clean, stationary distribution around some reasonable values, in a reasonable amount of time. However, while the parameter values are reasonable, they are wrong. Both \alpha and \gamma_d are consistently under-estimated. Changing my ground-truth parameters does influence the posterior estimates, but only weakly – a ten-fold increase in actual parameters leads to a ~2-fold increase in posterior estimates.

The best theory I have now for why Stan isn’t finding the right values is that it’s running into some kind of identifiability problem… this system plateaus at a value that depends on \frac{\alpha}{\gamma_d}, so a lot of the information in a time-series for this system only informs that ratio. But! The system should still be completely identifiable – rise time is completely dependent on \gamma_d.
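Writing out the closed-form solution of this linear ODE makes that concrete:

x(t) = \frac{\alpha}{\gamma_d} + \left(x_0 - \frac{\alpha}{\gamma_d}\right) e^{-\gamma_d t}

The plateau level only pins down the ratio \frac{\alpha}{\gamma_d}, but the approach to it decays at rate \gamma_d, so in principle both parameters are recoverable from the transient.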

Any ideas what might be going wrong here?

Stan code is

// Generative model for a very simple constitutive expression ODE -- x is
// produced at some constant rate and degraded/diluted proportionally to
// its concentration. Detection noise is lognormal around the true value.

// params[1] = alpha (production rate)
// params[2] = gamma_d (degradation/dilution rate)

functions {
    real[] constitutive_production(real t,
                                   real[] xs,
                                   real[] params,
                                   real[] x_r,
                                   int[] x_i) {
        real dx_dt[1];

        dx_dt[1] = params[1] - xs[1]*params[2];
        return dx_dt;
    }
}

data {
    real<lower = 0> init[1];
    int<lower = 1> n_times;
    int<lower = 1> n_params;
    int<lower = 1> n_vars;
    real<lower = 0> t0;
    real<lower = 0> ts[n_times];
    real<lower = 0> xs[n_times];
}

transformed data {
    real x_r[0];
    int x_i[0];
}

parameters {
    real<lower=0> params[n_params];
    /*
    1: alpha
    2: gamma_d
    (3: sigma and 4: noise_floor are planned but not used yet; n_params = 2 for now)
    */

}

transformed parameters{
    real<lower = 0> x_hat[n_times, n_vars]; // Output from the ODE solver
    real<lower = 0> x0[n_vars]; // Initial conditions

    x0 = init;

    x_hat = integrate_ode_rk45(constitutive_production, x0, t0, ts, params,
                               x_r, x_i);
}

model {
    //print("Nothing added. logprob = ", target());
    params[1] ~ normal(8, 0.1); // alpha
    params[2] ~ normal(4, 0.1); // gamma_d

    for (t in 1:n_times)
        xs[t] ~ lognormal(log(x_hat[t, 1]), 0.05);
}

generated quantities {
    real<lower = 0> sim_x_hats[n_times, 1];
    real sim_xs[n_times, 1];
    real dummy_sim_xs[n_times, 1];
    real dummy_params[n_params];
    dummy_params[1] = 4.0;
    dummy_params[2] = 2.0;
    sim_x_hats = integrate_ode_rk45(constitutive_production, init, t0,
                                          ts, params, x_r, x_i);
    dummy_sim_xs = integrate_ode_rk45(constitutive_production, init, t0,
                                           ts, dummy_params, x_r, x_i);
    for (t in 1:n_times)
        sim_xs[t, 1] = lognormal_rng(log(sim_x_hats[t,1]), 0.05);
}

Data is generated with the following Python code:

import numpy as np
import scipy.integrate

init    = np.array([1e-4])
n_times = 100
t_min   = 1e-2
t_max   = 30
ts      = np.linspace(t_min, t_max, n_times)

real_alpha = 1
real_gamma = .5
real_sigma = 0.05
real_floor = 0

def ode_func(t, xs):
    return np.array([real_alpha - xs[0]*real_gamma])

ode_sol = scipy.integrate.solve_ivp(ode_func, (t_min, t_max), 
                                    init, dense_output = True)

x_hats = ode_sol.sol(ts)[0]
x_obs  = np.random.lognormal(np.log(x_hats + real_floor), real_sigma)
#x_obs  = np.abs(np.random.normal(x_hats, real_sigma))

Stan run with:

model = stan_utility.compile_model('simplest_with_lognormal_nosie.stan')
simu_data ={"init": init,
            "n_times": n_times,
            "n_params": 2,
            "n_times": len(ts),
            "n_vars": len(init),
            "t0": 0,
            "ts": ts,
            "xs": x_obs}

fit = model.sampling(data=simu_data, test_grad=False,
                     iter=6000, warmup=300, chains=5,
                     seed=4832)

Fit summary is

Inference for Stan model: anon_model_7369f701aac2396ae72ec48d47f20192.
5 chains, each with iter=6000; warmup=300; thin=1; 
post-warmup draws per chain=5700, total post-warmup draws=28500.

                     mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
params[0]            0.69  1.5e-4   0.01   0.67   0.69   0.69    0.7   0.72   6292    1.0
params[1]            0.34  8.4e-5 6.6e-3   0.33   0.33   0.34   0.34   0.35   6291    1.0

plus a bunch of not-useful information about the ODE solutions at each time.

stan_utility.check_all_diagnostics yields

n_eff / iter for parameter x0[0] is 0.00010526315789473685!
n_eff / iter for parameter dummy_sim_xs[0,0] is 0.00010526315789473685!
<etc, snip>
  n_eff / iter below 0.001 indicates that the effective sample size has likely been overestimated
Rhat for parameter x_hat[0,0] is nan!
Rhat for parameter x0[0] is nan!
Rhat for parameter sim_x_hats[0,0] is nan!
Rhat for parameter sim_xs[0,0] is nan!
Rhat for parameter dummy_sim_xs[0,0] is nan!
<etc, snip>
Rhat for parameter dummy_params[0] is nan!
Rhat for parameter dummy_params[1] is nan!
  Rhat above 1.1 indicates that the chains very likely have not mixed
0.0 of 28500 iterations ended with a divergence (0.0%)
0 of 28500 iterations saturated the maximum tree depth of 10 (0.0%)
E-BFMI indicated no pathological behavior

This reads to me like I’ve made some kind of dumb, simple beginner’s mistake, but… well, I don’t see it, and our lab doesn’t really have any Stan expertise I can call on. Any help would be much appreciated!

Could it be the really tight priors?

I swapped in:

params[1] ~ normal(1.0, 1.0); // alpha
params[2] ~ normal(1.0, 1.0); // gamma_d

And got this as a posterior:

Look at the posterior predictions with the tight priors vs. the loose priors and I think you’ll see the difference. The priors just override the data in the transient time.

I used this R code:

library(deSolve)
library(tidyverse)
library(ggplot2)
library(rstan)
library(GGally)

init = c(0.0)
n_times = 100
t_min = 0.0
t_max = 30
ts = seq(t_min, t_max, length = n_times)

real_alpha = 1
real_gamma = .5
real_sigma = 0.05
real_floor = 0

func = function(t, y, parms) {
  list(real_alpha - y[1] * real_gamma)
}

sol = ode(init, ts, func)

y = rlnorm(nrow(sol), meanlog = log(sol[, "1"]), sdlog = real_sigma)

plot(sol[, "time"], y)

model = stan_model("ode.stan")

fit = sampling(model, data = list(init = array(init, dim = c(1)),
                         n_times = length(ts) - 1,
                         n_params = 2,
                         n_vars = 1,
                         t0 = 0.0,
                         ts = ts[2:length(ts)],
                         xs = y[2:length(ts)]),
               cores = 4,
               chains = 1)

extract(fit, c("params"))$params %>% as_tibble() %>% ggpairs

plot(ts[2:length(ts)], extract(fit, "ys")$ys[100,])

Also you can use transformed parameters in generated quantities:

generated quantities {
  real ys[n_times];
  
  for(t in 1:n_times) {
    ys[t] = lognormal_rng(log(x_hat[t, 1]), 0.05);
  }
}

…huh. Something’s still wrong here. When I’ve run this before, I’ve consistently started with priors with means either equal to or higher than the true parameter values, and I’ve kept getting parameter estimates that were lower than the real value. I definitely can get the priors to wipe out the data, if I set them ~100 standard deviations above the true value, but I don’t think that’s what’s going on here.
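For reference, here’s what those original tight priors imply for the trajectory, using the closed form from my first post (a quick standalone sketch, not part of the model code):

import numpy as np
import matplotlib.pyplot as plt

# Prior-predictive sketch: trajectories implied by the original tight priors,
# using the closed form x(t) = (alpha/gamma_d) * (1 - exp(-gamma_d * t)) for x(0) ~ 0.
t = np.linspace(0, 30, 200)
for _ in range(50):
    a = np.random.normal(8, 0.1)   # alpha prior: normal(8, 0.1)
    g = np.random.normal(4, 0.1)   # gamma_d prior: normal(4, 0.1)
    plt.plot(t, a / g * (1 - np.exp(-g * t)), color="gray", alpha=0.3, linewidth=0.5)

# True curve from the generating script (alpha = 1, gamma_d = 0.5): the plateau
# (alpha/gamma_d = 2) matches the prior draws, but the rise is much slower.
plt.plot(t, 2 * (1 - np.exp(-0.5 * t)), color="red", linewidth=1.5)
plt.xlabel("Time")
plt.ylabel("[x]")
plt.show()

The prior draws hit the same plateau as the true curve (\alpha/\gamma_d = 2 in both cases) but rise about eight times faster, so any prior-data tension is confined to the transient.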

Ah, thanks for the tip on re-using the ODE solutions! That ought to save quite a bit of time in the long run. =)

Also: the initial condition of your ODE is actually known, since it is data, if I am not mistaken. You should really move the initial condition into the transformed data block and use that as the argument to integrate_ode_rk45.

Double check how you’re generating data. There’s probably an indexing weirdness or something a little funny in the generating process. Or take the R code and convince yourself your model will work.

The first check I’d do is plot x_hats, and x_obs to make sure the noise is getting added appropriately. If that looks fine, plot the replicated data from the generated quantities fit in your model and see what’s up.
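In your Python, that first check is just something like this (a minimal sketch, reusing ts, x_hats, and x_obs from your generating script):

import matplotlib.pyplot as plt

# The observations should scatter tightly around the noiseless solution;
# any systematic offset or index shift will be obvious by eye.
plt.plot(ts, x_hats, color="green", label="x_hats (true)")
plt.scatter(ts, x_obs, s=5, color="red", label="x_obs (observed)")
plt.xlabel("Time")
plt.ylabel("[x]")
plt.legend()
plt.show()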


Hey all, I’ve been distracted from this problem for a couple weeks, but I finally sat down to debug it again. I think I’ve figured out where the problem is and how to fix it, but I don’t understand why it doesn’t work the way I originally wrote it.

The key difference between that working R code above and my not-quite-working Python code seems to be that I use the first data point while bbbales2 throws it out. If I include the first data point, I have to start time at some very small positive value or Stan throws an error, and then I get a pretty bad fit. If I clip off the first timepoint, Stan gives a perfect fit and everything works.

Any idea what’s going wrong when I include t0?

Is it the data at time zero with noise added?

That does seem strange.

Just barely after time 0 (1e-2), yeah, with noise.

How did you know to clip off the first one in your R script?

Hmm, that’s interesting.

Remember the output of the ODE solver in Stan doesn’t include the initial conditions.
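Concretely, that means the data dict should line things up like this (a sketch in terms of your Python variables):

# integrate_ode_rk45 returns the state at each entry of ts only, never at t0,
# so row t of x_hat corresponds to ts[t]. Pass the first time as t0 and drop
# the matching observation so model and data stay aligned:
simu_data = {"init": init,
             "n_times": len(ts) - 1,
             "n_params": 2,
             "n_vars": len(init),
             "t0": ts[0],
             "ts": ts[1:],
             "xs": x_obs[1:]}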

So I added a time value close to zero and things still seem to work. The ts starts like:

> ts
  [1]  0.0000000  0.0000010  0.3030303  0.6060606  0.9090909 ...

The posterior still wasn’t too far off the generated parameters.

Here’s the code:

library(deSolve)
library(tidyverse)
library(ggplot2)
library(rstan)
library(GGally)

init = c(0.0)
n_times = 100
t_min = 0.0
t_max = 30
ts = seq(t_min, t_max, length = n_times)
ts[1] = 1e-6
ts = c(0.0, ts)
n_times = n_times + 1

real_alpha = 1
real_gamma = .5
real_sigma = 0.05
real_floor = 0

func = function(t, y, parms) {
  list(real_alpha - y[1] * real_gamma)
}

sol = ode(init, ts, func)

y = rlnorm(nrow(sol), meanlog = log(sol[, "1"]), sdlog = real_sigma)

plot(sol[, "time"], y)

model = stan_model("ode.stan")

fit = sampling(model, data = list(init = array(init, dim = c(1)),
                                  n_times = length(ts) - 1,
                                  n_params = 2,
                                  n_vars = 1,
                                  t0 = 0.0,
                                  ts = ts[2:length(ts)],
                                  xs = y[2:length(ts)]),
               cores = 4,
               chains = 1)

extract(fit, c("params"))$params %>% as_tibble() %>% ggpairs

plot(ts[2:length(ts)], extract(fit, "ys")$ys[100,])

This hints that the rk45 tolerances might need adjusting. Note that although Stan’s rk45 uses an adaptive step size, the adjustment is based on a default error estimate that is a function of the relative tolerance, the absolute tolerance, and the current step’s solution. Since Stan’s implementation fixes the first step size at 0.1, there’s a possibility that the auto-stepping fails to adjust this size properly.
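One quick way to probe this on the scipy side is to re-solve the generating ODE with default and much tighter tolerances and compare the earliest time points (a minimal sketch; Stan’s integrate_ode_rk45 likewise accepts optional rel_tol, abs_tol, and max_num_steps arguments after x_i):

import numpy as np
from scipy.integrate import solve_ivp

# Same ODE as the data generator; any tolerance sensitivity should show up
# as a gap between the default and the tight solution at early times.
alpha, gamma_d = 1.0, 0.5
f = lambda t, x: [alpha - gamma_d * x[0]]

t_eval = np.array([1e-6, 1e-2, 0.3, 1.0, 30.0])
loose = solve_ivp(f, (0.0, 30.0), [0.0], t_eval=t_eval)
tight = solve_ivp(f, (0.0, 30.0), [0.0], t_eval=t_eval, rtol=1e-10, atol=1e-12)
print(loose.y[0] - tight.y[0])  # near zero means tolerances are not the issue here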

Aha! I did not know that Stan’s ODE solver doesn’t include the initial conditions in its output. It sounds like my data was off by an index? I’m still surprised how far off Stan’s estimates were with only a single-index shift…

Another thing I’m a little weirded out by – it looks like Stan samples about 5-10x faster from R (using the code you sent) than it does from Python. I timed just the sampling statement in both versions with 2000 iterations; Python took ~170 seconds to sample, R took ~10-30 seconds. Any idea why that might be?

Oh interesting. Do you mind sending your latest working Python code? I’m happy to check this on my computer.

Sure. Here’s a script that ought to run. Not sure if it’s important, but my Pystan is a version behind due to some compiler problems.

(I did find one reason for run-time discrepancy – I was running RStan and Pystan on different stan models, and the Pystan model was a bit more complex. When I run on the same model, the difference in time drops down to more like 2-5x, with Pystan taking ~70s.)

import numpy as np
import scipy.stats as stats
import scipy.integrate
import math
import matplotlib.pyplot as plt

import pystan
import stan_utility

init    = np.array([1e-4])
n_times = 100
t_min   = 1e-2
t_max   = 30
ts      = np.linspace(t_min, t_max, n_times)

real_alpha = 2
real_gamma = 1
real_sigma = 0.05
real_floor = 0

def ode_func(t, xs):
    return np.array([real_alpha - xs[0]*real_gamma])

ode_sol = scipy.integrate.solve_ivp(ode_func, (t_min, t_max),
                                    init, dense_output = True)

x_hats = ode_sol.sol(ts)[0]
x_obs  = np.random.lognormal(np.log(x_hats + real_floor), real_sigma)

extra_compile_args = ['-pthread', '-DSTAN_THREADS', '-O3']

model = stan_utility.compile_model('simplest_ode.stan',
                                   extra_compile_args=extra_compile_args)

simu_data ={"init": init,
            "n_times": n_times,
            "n_params": 2,
            "n_times": len(ts)-1,
            "n_vars": len(init),
            "t0": 0,
            "ts": ts[1:],
            "xs": x_obs[1:]}

fit = model.sampling(data=simu_data, test_grad = False,
                     iter=2000, warmup=300, chains=1,
                     seed=4832, n_jobs = 1)

print(fit.stansummary())

stan_utility.check_all_diagnostics(fit)

sim_x_hats = fit.extract()['sim_x_hats']
sim_xs = fit.extract()['x_hat']  # note: these are the fitted ODE trajectories, not new simulated draws

for i in range(100):
    plt.plot(ts[1:], sim_xs[i,:,0], label = "%d" % i, color = 'black',
             linewidth=0.2, alpha = 0.5)
    plt.plot(ts[1:], sim_x_hats[i,:,0], linewidth = 0.2, alpha = 0.5, color = 'blue')
plt.plot(ts, x_obs, color = 'red', linewidth = 1.25, label = "Observed")
plt.plot(ts, x_hats, color = 'green', linewidth = 2, label = "True")
plt.xlabel("Time")
plt.ylabel("[x]")
plt.show()

names = ['alpha', 'gamma_d']
for var_idx in range(len(names)):
    samples = fit.extract()["params"][:,var_idx]
    plt.scatter(list(range(len(samples))), samples, s = 1, label = names[var_idx])
plt.xlabel("Sample")
plt.ylabel("Value")
plt.legend()
plt.show()

Latest Stan code:

// Generative model for a very simple constitutive expression ODE -- x is
// produced at some constant rate and degraded/diluted proportionally to
// its concentration. Detection noise is lognormal around the true value.

// params[1] = alpha (production rate)
// params[2] = gamma_d (degradation/dilution rate)

functions {
    real[] constitutive_production(real t,
                                   real[] xs,
                                   real[] params,
                                   real[] x_r,
                                   int[] x_i) {
        real dx_dt[1];

        dx_dt[1] = params[1] - xs[1]*params[2];
        return dx_dt;
    }
}

data {
    real<lower = 0> init[1];
    int<lower = 1> n_times;
    int<lower = 1> n_params;
    int<lower = 1> n_vars;
    real<lower = 0> t0;
    real<lower = 0> ts[n_times];
    real<lower = 0> xs[n_times];
}

transformed data {
    real x_r[0];
    int x_i[0];
}

parameters {
    real<lower=0> params[n_params];
    /*
    1: alpha
    2: gamma_d
    */

}

transformed parameters{
    real<lower = 0> x_hat[n_times, n_vars]; // Output from the ODE solver
    real<lower = 0> x0[n_vars]; // Initial conditions

    x0 = init;

    x_hat = integrate_ode_rk45(constitutive_production, x0, t0, ts, params,
                               x_r, x_i);
}

model {
    params[1] ~ normal(2, 2.5); // alpha
    params[2] ~ normal(4, 2.5); // gamma_d


    for (t in 1:n_times) {
        xs[t] ~ normal(x_hat[t, 1], 0.1) T[0,];
    }
}

generated quantities {
    real<lower = 0> sim_x_hats[n_times, 1];

    sim_x_hats = integrate_ode_rk45(constitutive_production, init, -0.1,
                                          ts, params, x_r, x_i);
}

I don’t get this. It looks like my PyStan chains take about 6-8 seconds and then about 9-10 for RStan.

I didn’t have the stan_utility package, so I didn’t have these extra compile flags:

extra_compile_args = ['-pthread', '-DSTAN_THREADS', '-O3']

I also switched to the default split of 1000 warmup and 1000 sampling iterations. It seems strange that 300 warmup iterations wouldn’t be enough to adapt for two parameters, though.

Here’s the Python file I ended up using so you can see the differences:
simplest_ode.py (922 Bytes)

Looks like the slowdown was a combination of a) using an old version of Pystan and b) running the Python and R versions with different numbers of chains. Fixed the chains and figured out how to upgrade Pystan without breaking everything (I have to manually set the ‘MACOSX_DEPLOYMENT_TARGET’ environment variable), and now both scripts run at the same speed.
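For anyone else who hits this: the variable has to be set before Pystan compiles anything, e.g. at the top of the script. The 10.11 value matches my OSX version, so adjust as needed:

import os

# Must be set before the model is compiled; use your own OS X version here.
os.environ["MACOSX_DEPLOYMENT_TARGET"] = "10.11"

import pystan  # imported after setting the environment variable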

Thanks for all your help! This would have been immensely more difficult without some working code to compare against.
