Problem with "Rejecting initial value:..." error while fitting an ODE

Hello! I’m trying to fit a model to three time series that I generated stochastically from three different parameterisations of that same model. I am using priors wide enough to encompass all three parameterisations, though the initial population abundances differ between them.


[Screenshot: simulated time series under the three parameterisations]

The model is a within-host population dynamics model:
dP/dt = r*P - H*(O*P/(1 + O*h*P))
dH/dt = b + H*(c*(O*P/(1 + O*h*P)) - u)
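
(For reference, here is a minimal deterministic sketch of the model in Python with scipy; the parameter values and initial state are the ones noted in the comments of my fitting script below, i.e. just one illustrative parameterisation, and the time span is arbitrary.)

import numpy as np
from scipy.integrate import solve_ivp

def dz_dt(t, z, r, O, h, b, c, u):
    P, H = z
    resp = O * P / (1 + O * h * P)  # saturating functional response
    return [r * P - H * resp, b + H * (c * resp - u)]

# r, O, h, b, c, u and initial state, from the comments in the script below
params = (2.5, 0.008, 0.06, 35.0, 0.2, 0.2)
ts = np.linspace(0, 50, 501)
sol = solve_ivp(dz_dt, (0, 50), [80.0, 200.0], args=params, t_eval=ts)
P, H = sol.y  # deterministic trajectories to compare against the stochastic data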

Deterministically, the first parameterisation produces damped oscillations, while the second and third produce a stable limit cycle and divergent oscillations (termed an “unstable limit cycle” in the images above), respectively. Stochastically, the first two parameterisations produce oscillations, and the third produces a hard-to-describe, very unstable time series, which may be part of the problem.



When I try to fit the model to the data arising from the first parameterisation, things go okay (though I do get some “Rejecting initial value:…” warnings)!

When I try to fit the model to the data arising from the second parameterisation, things don’t go well, but they go (though I still get some “Rejecting initial value:…” warnings).

But when I try to fit the model to the data arising from the third parameterisation, the run fails and throws the “Rejecting initial value: Log probability evaluates to log(0), i.e. negative infinity.” error.

I suspect these differences come down, in part, to the dynamical differences between parameterisations, but there must be something else going on… I’d really appreciate any ideas people may have. Thank you! :)

Here’s my fitting script:

functions {
  real[] dz_dt(real t,       // time
               real[] z,     // system state {P, H}
               real[] theta, // parameters
               real[] x_r,   // unused real-valued data
               int[] x_i) {  // unused integer data
    real P = z[1];
    real H = z[2];
    
    real r = theta[1];  
    real O = theta[2];
    real h = theta[3];
    real b = theta[4];
    real c = theta[5];
    real u = theta[6];
    
    real dP_dt = P*r - H*(O*P/(1 + O*h*P));
    real dH_dt = b + H*(c*(O*P/(1 + O*h*P))-u);
    return { dP_dt, dH_dt };
  }
}
data {
  int<lower = 0> N;           // number of measurement times
  real ts[N];                 // measurement times
  real y_init[2];             // initial measured populations
  real<lower = 0> y[N, 2];    // measured populations at the times in ts
}
parameters {
  real<lower = 0> theta[6];   // model parameters { r, O, h, b, c, u }
  real<lower = 0> z_init[2];  // initial population states
  real<lower = 0> sigma[2];   // measurement error scales (lognormal)
}
transformed parameters {
  // ODE solution at the measurement times
  // (rel tol 1e-5, abs tol 1e-3, max 500 steps)
  real z[N, 2]
  = integrate_ode_rk45(dz_dt, z_init, 0, ts, theta,
                       rep_array(0.0, 0), rep_array(0, 0),
                       1e-5, 1e-3, 5e2);
}
model {
  theta[1] ~ normal(1, 3); // r = 2.5
  // NB: theta is declared with only a <lower = 0> constraint, so random
  // initial values can fall above these uniform upper bounds, which makes
  // the log probability -inf.
  theta[2] ~ uniform(0, 1); // O = 0.008
  theta[3] ~ uniform(0, 1); // h = 0.06
  theta[4] ~ uniform(0, 1000); // b = 35
  theta[5] ~ normal(0.5, 1); // c = 0.2
  theta[6] ~ normal(0.5, 1); // u = 0.2
  sigma ~ lognormal(-1, 1);
  z_init ~ lognormal(5, 1); // [80,200]
  for (k in 1:2) {
    y_init[k] ~ lognormal(log(z_init[k]), sigma[k]);
    y[ , k] ~ lognormal(log(z[, k]), sigma[k]);
  }
}
generated quantities {
  real y_init_rep[2];
  real y_rep[N, 2];
  for (k in 1:2) {
    y_init_rep[k] = lognormal_rng(log(z_init[k]), sigma[k]);
    for (n in 1:N)
      y_rep[n, k] = lognormal_rng(log(z[n, k]), sigma[k]);
  }
}

I believe you are right that the stochastic output is responsible for part of the issue; in fact, it’s probably responsible for most of it, since it can cause the model output to deviate considerably from the deterministic model. The most critical issue, with the third model, is that you end up with very low pathogen densities between the peaks. Deterministically, low density means just that, but in a stochastic (and maybe discrete) setting those troughs are prone to extinction. Since there’s a qualitative difference between the two versions here, there’s little hope of recovering the correct parameters, or even of producing samples whose probability is nonzero (or at least above the numerical limits, in practice).
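
To illustrate the extinction point, here is a toy discrete scheme with Poisson demographic noise; this is not your actual simulation method, and the parameter values are just the placeholders from the comments in your script:

import numpy as np

rng = np.random.default_rng(1)
r, O, h, b, c, u = 2.5, 0.008, 0.06, 35.0, 0.2, 0.2  # placeholder values
P, H, dt = 80, 200, 0.01

for step in range(200_000):
    resp = O * P / (1 + O * h * P)
    P += rng.poisson(r * P * dt) - rng.poisson(H * resp * dt)
    H += rng.poisson((b + c * H * resp) * dt) - rng.poisson(u * H * dt)
    P, H = max(P, 0), max(H, 0)  # clamp: counts cannot go negative
    if P == 0:  # absorbing: every pathogen rate is proportional to P
        print(f"pathogen extinct at t = {step * dt:.1f}")
        break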

There are some similarities with this work. Depending on how much the stochastic model deviates from the deterministic one, you may still be able to recover reasonable parameters (for a large enough population, the stochastic system will approach the deterministic one to some extent), but my experience is that these things can get messy pretty quickly.

Thanks so much for your reply! Yes, I was worried this would be the case, but it’s nice to have that thought confirmed! I’ll play around with the likelihood function a bit because I don’t think I’ve thought that out as well as I could have, but in terms of that third parameterisation in particular, this might be the end of the road. I’ll take a look at your pre-print and see if that gives me any ideas!

Do you happen to have any ideas regarding the “Rejecting initial value: Log probability evaluates to log(0), i.e. negative infinity.” error in my first and second parameterisations?

Thanks so much again for your reply!

The scales are barely legible, but looking closely, the population sizes are pretty small. Before giving up, I’d try to make sure that these numbers are realistic; if not, you can increase them (whether they are governed by the initial conditions only, or by some sort of carrying capacity set by the parameters) so that you get stochastic dynamics that don’t differ so much from the deterministic ones and are not as prone to extinctions.

These warnings are pretty common for large models, especially nonlinear ODEs, where the output can vary a lot with small parameter changes. If, for the initial (by default random) set of parameters, the output is far off from the data, the likelihood of some data point may be vanishingly small, essentially zero. That is where the log(0) comes from; it prevents the log probability from being evaluated and is a non-starter.
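
To see the mechanism numerically (made-up numbers, using scipy's lognormal as a stand-in for the likelihood terms in your model):

import numpy as np
from scipy.stats import lognorm

# Suppose the ODE solution for some random parameter draw gives z = 1e-9
# where the observed value is y = 100 (made-up numbers).
z, y, sigma = 1e-9, 100.0, 0.5
density = lognorm.pdf(y, s=sigma, scale=z)  # y ~ lognormal(log(z), sigma)
print(density)          # 0.0: the density underflows in double precision
print(np.log(density))  # -inf, the log(0) in the rejection message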

The simple solution is to fix the initial values of some or all of the parameters to make sure the output isn’t too far off (you can relax this later on, once you have a better idea of which values are generating zero-probability outputs).


Thanks so much for this response; it was really helpful! :)

Okay, this makes sense! I might be able to do that for the final parameterisation, so I’ll play around with it, and update the post if anything changes! Thanks!

Thank you for explaining the log(0) problem; this makes a lot of sense now. When you say I should try to fix some or all of the initial parameter values, what do you mean? That I should tell the model which values to start with instead of letting it have free rein?

Thanks so much, again!!

Exactly. I’m not sure which interface to Stan you are using, but the sample command in any of them should allow you to pass a dictionary through the inits argument (e.g. inits={"theta": 1, "sigma": 0.1} in Python; it’d be something similar in other languages).
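
For instance, with CmdStanPy it could look like the sketch below (the file name, toy data, and all numbers are placeholders):

from cmdstanpy import CmdStanModel

model = CmdStanModel(stan_file="host_pathogen.stan")  # placeholder file name

# Data dict matching the model's data block (toy values for illustration).
data_dict = {"N": 3,
             "ts": [1.0, 2.0, 3.0],
             "y_init": [80.0, 200.0],
             "y": [[82.0, 198.0], [95.0, 210.0], [88.0, 205.0]]}

# Fix the initial values of some or all parameters (placeholder numbers).
inits = {"theta": [2.5, 0.008, 0.06, 35.0, 0.2, 0.2],  # r, O, h, b, c, u
         "z_init": [80.0, 200.0],
         "sigma": [0.5, 0.5]}

fit = model.sample(data=data_dict, inits=inits, chains=4)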

Ideally you should let the initial points be random, but sometimes (particularly with ODEs) leaving it to Stan alone will lead to the problems you are seeing. An alternative is to manually choose random values around some initial value you know works, and to increase the variance so that you start from points in parameter space that are different enough that you can be more confident the chains are converging to the right place.
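
A sketch of that idea (again with placeholder values): jitter a known-good point multiplicatively, then pass one inits dict per chain:

import numpy as np

rng = np.random.default_rng(42)
base = {"theta": [2.5, 0.008, 0.06, 35.0, 0.2, 0.2],
        "z_init": [80.0, 200.0],
        "sigma": [0.5, 0.5]}

# Multiplicative (log-scale) jitter keeps every value positive, matching the
# <lower = 0> declarations; increase sd to spread the chains further apart.
def jitter(values, sd=0.3):
    return [v * float(np.exp(rng.normal(0.0, sd))) for v in values]

inits_per_chain = [{k: jitter(v) for k, v in base.items()} for _ in range(4)]
# CmdStanPy accepts one dict per chain:
# fit = model.sample(data=data_dict, inits=inits_per_chain, chains=4)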


Ahhh, okay, I see what you’re saying; this makes sense! I’ll try giving it some initial values and see if that improves the fit significantly; at the very least it should help answer my original question more definitively!

EDIT: I just tried adding initial values, and surprisingly, they didn’t really improve the fit in any meaningful way.

Thank you so much for your continued responses; you’ve been incredibly helpful! :)