OSX 10.11.6
Pystan 2.17.0.0 (I had compiler problems with Pystan 2.18)
Python 3.6.8, Anaconda
GCC 4.2.1 Compatible Clang 4.0.1
As practice for something more realistic (and to get a hang of Stan), I’ve been trying to estimate parameters from simulated data for an absurdly simple ODE model.
\frac{dx}{dt} = \alpha - \gamma_d x
with log-normal noise of fixed spread around the true value (to keep observations positive). Eventually I’d like to estimate the magnitude of noise, too, but I want to stay clear of possible funnels and other hierarchical issues until I can get this simpler model to work.
When I sample on this, I get a super-clean, stationary distribution around some reasonable values, in a reasonable amount of time. However, while the parameter values are reasonable, they are wrong. Both \alpha and \gamma_d are consistently under-estimated. Changing my ground-truth parameters does influence the posterior estimates, but only weakly – a ten-fold increase in actual parameters leads to a ~2-fold increase in posterior estimates.
The best theory I have now for why Stan isn’t finding the right values is that it’s running into some kind of identifiability problem… this system plateaus at a value that depends on \frac{\alpha}{\gamma_d}, so a lot of the information in a time-series for this system only informs that ratio. But! The system should still be completely identifiable – rise time is completely dependent on \gamma.
Any ideas what might be going wrong here?
Stan code is
// Generative model for a very simple constitutive expression ODE -- x is
// produced at some constant rate and degraded/diluted proportionally to
// its concentration. Detection noise is lognormal around the true value.
// params[1] = alpha (production rate)
// params[2] = gamma_d (degradation/dilution rate)
functions {
real[] constitutive_production(real t,
real[] xs,
real[] params,
real[] x_r,
int[] x_i) {
real dx_dt[1];
dx_dt[1] = params[1] - xs[1]*params[2];
return dx_dt;
}
}
data {
real<lower = 0> init[1];
int<lower = 1> n_times;
int<lower = 1> n_params;
int<lower = 1> n_vars;
real<lower = 0> t0;
real<lower = 0> ts[n_times];
real<lower = 0> xs[n_times];
}
transformed data {
real x_r[0];
int x_i[0];
}
parameters {
real<lower=0> params[n_params];
/*
1: alpha
2: gamma_d
3: sigma
4: noise_floor
*/
}
transformed parameters{
real<lower = 0> x_hat[n_times, n_vars]; // Output from the ODE solver
real<lower = 0> x0[n_vars]; // Initial conditions
x0 = init;
x_hat = integrate_ode_rk45(constitutive_production, x0, t0, ts, params,
x_r, x_i);
}
model {
//print("Nothing added. logprob = ", target());
params[1] ~ normal(8, 0.1); // alpha
params[2] ~ normal(4, 0.1); // gamma_d
for (t in 1:n_times)
xs[t] ~ lognormal(log(x_hat[t, 1]), 0.05);
}
generated quantities {
real<lower = 0> sim_x_hats[n_times, 1];
real sim_xs[n_times, 1];
real dummy_sim_xs[n_times, 1];
real dummy_params[n_params];
dummy_params[1] = 4.0;
dummy_params[2] = 2.0;
sim_x_hats = integrate_ode_rk45(constitutive_production, init, t0,
ts, params, x_r, x_i);
dummy_sim_xs = integrate_ode_rk45(constitutive_production, init, t0,
ts, dummy_params, x_r, x_i);
for (t in 1:n_times)
sim_xs[t, 1] = lognormal_rng(log(sim_x_hats[t,1]), 0.05);
}
Data is generated with the following Python code:
init = np.array([1e-4])
n_times = 100
t_min = 1e-2
t_max = 30
ts = np.linspace(t_min, t_max, n_times)
real_alpha = 1
real_gamma = .5
real_sigma = 0.05
real_floor = 0
def ode_func(t, xs):
return np.array([real_alpha - xs[0]*real_gamma])
ode_sol = scipy.integrate.solve_ivp(ode_func, (t_min, t_max),
init, dense_output = True)
x_hats = ode_sol.sol(ts)[0]
x_obs = np.random.lognormal(np.log(x_hats + real_floor), real_sigma)
#x_obs = np.abs(np.random.normal(x_hats, real_sigma))
Stan run with:
model = stan_utility.compile_model('simplest_with_lognormal_nosie.stan')
simu_data ={"init": init,
"n_times": n_times,
"n_params": 2,
"n_times": len(ts),
"n_vars": len(init),
"t0": 0,
"ts": ts,
"xs": x_obs}
fit = model.sampling(data=simu_data, test_grad = False,
iter=6000, warmup=300, chains=5,
seed=4832)
Fit summary is
Inference for Stan model: anon_model_7369f701aac2396ae72ec48d47f20192.
5 chains, each with iter=6000; warmup=300; thin=1;
post-warmup draws per chain=5700, total post-warmup draws=28500.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
params[0] 0.69 1.5e-4 0.01 0.67 0.69 0.69 0.7 0.72 6292 1.0
params[1] 0.34 8.4e-5 6.6e-3 0.33 0.33 0.34 0.34 0.35 6291 1.0
plus a bunch of not-useful information about the ODE solutions at each time.
stan_utility.check_all_diagnostics yields
n_eff / iter for parameter x0[0] is 0.00010526315789473685!
n_eff / iter for parameter dummy_sim_xs[0,0] is 0.00010526315789473685!
<etc, snip>
n_eff / iter below 0.001 indicates that the effective sample size has likely been overestimated
Rhat for parameter x_hat[0,0] is nan!
Rhat for parameter x0[0] is nan!
Rhat for parameter sim_x_hats[0,0] is nan!
Rhat for parameter sim_xs[0,0] is nan!
Rhat for parameter dummy_sim_xs[0,0] is nan!
<etc, snip>
Rhat for parameter dummy_params[0] is nan!
Rhat for parameter dummy_params[1] is nan!
Rhat above 1.1 indicates that the chains very likely have not mixed
0.0 of 28500 iterations ended with a divergence (0.0%)
0 of 28500 iterations saturated the maximum tree depth of 10 (0.0%)
E-BFMI indicated no pathological behavior
This reads to me like I’ve made some kind of dumb, simple beginner’s mistake, but… well, I don’t see it, and our lab doesn’t really have any Stan expertise I can call on. Any help would be much appreciated!