Ensuring positivity of ODE solution


I am writing a Stan model which includes the numerical solution of a system of rate equations via an ODE solver. Analytically, the solutions to this ODE are always positive for the feasible parameter range. Numerically, however, once the solutions get close to zero, there is a good chance that a solution becomes ever so slightly negative.

Let y_hat be the ODE solution; I then use the model

log_obs ~ normal(log(y_hat), eps);

This clearly leads to problems whenever y_hat is negative. What I am doing at the moment is iterating through y_hat after solving the ODE and setting any negative entry to a small positive number, something like:

for (n in 1:n_obs) {
  if (y_hat[n] <= 0.0) y_hat[n] = 1e-12;
}
This feels somewhat like cheating, and I’m not sure whether it introduces numerical problems. Is there a better way to ensure the positivity of the ODE solution? I know from my Matlab days that it had a specialised solver for non-negative ODEs. But I guess that’s the problem here: a specialised solver would be necessary?
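As a toy illustration of what this truncation does to derivatives (a hypothetical one-parameter stand-in for the ODE solution, not my actual model): the truncated log-solution is flat wherever the solution is negative, so its true gradient there is zero, while a finite-difference probe straddling the kink can be wildly off.

```python
import math

def log_y_trunc(theta, floor=1e-12):
    # toy "ODE solution": y(theta) = theta, truncated at a small floor
    y = theta
    return math.log(max(y, floor))

h = 1e-6
theta = -h / 2            # the kink at 0 lies inside the finite-difference stencil
fd = (log_y_trunc(theta + h) - log_y_trunc(theta - h)) / (2 * h)
print(fd)                 # huge, although the truncated model's true gradient here is 0
```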

This should be a common problem in the modelling of chemical/biological rate ODEs. Has anyone else run into this problem and found a solution?


I think that if the near- and below-zero solutions have little posterior probability, you should be fine. If appropriate, you can also tighten your priors to avoid parameter values that cause the solutions to wander near zero.

I have little detailed knowledge of the ODE solver, so I cannot help you directly, but in some cases you may be able to partially solve the ODE analytically to avoid some of the numerical problems. For example, I work with an ODE of the form:

y'(t) = f(t) - d * y(t)

here f(t) is a non-negative function of the data and some parameters but not of y.

Since y(t) occurs only in the linear term, this can be solved as

y(t) = a * exp(-d * t) + exp(-d * t) * integral[variable:u,from:0,to:t]( f(u) * exp(d * u) )

where a is the initial condition. Since f(t) is non-negative, numerically integrating it can never produce a negative value.
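As a sanity check, here is a small Python sketch (f, d and a are illustrative choices, not my actual model) comparing the closed-form expression, with the integral evaluated by the trapezoid rule, against a direct forward-Euler integration of the original ODE:

```python
import math

d = 0.5
a = 1.0
f = lambda t: math.exp(-t)   # illustrative non-negative forcing function

def y_closed(t, n=2000):
    # y(t) = a*exp(-d*t) + exp(-d*t) * int_0^t f(u)*exp(d*u) du, via trapezoid rule
    h = t / n
    s = 0.5 * (f(0.0) + f(t) * math.exp(d * t))
    for i in range(1, n):
        u = i * h
        s += f(u) * math.exp(d * u)
    return a * math.exp(-d * t) + math.exp(-d * t) * s * h

def y_euler(t, n=20000):
    # forward-Euler integration of y'(t) = f(t) - d*y(t), y(0) = a
    h = t / n
    y = a
    for i in range(n):
        y += h * (f(i * h) - d * y)
    return y

print(y_closed(3.0), y_euler(3.0))   # the two agree closely, and stay positive
```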

Credit: The solution for the model is taken from:

Oh, nice! That’s a good trick, thanks for sharing!

One other possibility I found on StackOverflow was to formulate the ODE system in terms of the logarithms of the state variables, thus transforming them to an unconstrained range directly. This works out quite nicely in my simple examples. I am actually testing my code with an ODE of the same form as yours:

y1'(t) = -k1 y1(t)
y2'(t) = k1 y1(t) - k2 y2(t)

which, with the new variables ly1 = log(y1) and ly2 = log(y2), becomes

ly1'(t) = -k1
ly2'(t) = k1 exp(ly1 - ly2) - k2

Since the state variables are on a log scale now, it doesn’t matter whether they become positive or negative. The only caveat is that y2(0) = 0, which cannot be translated directly. I tried setting y2(0) = 1e-12 (i.e. ly2(0) = log(1e-12)), but that feels a bit ad hoc as well. My first actual observations are quite a bit away from 0, though, and considerably greater than 1e-12 (at the lowest around 1e-2), so I don’t think it should have much influence.
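To convince myself, here is a quick Python sketch of the log-scale system integrated with forward Euler (k1, k2, the step size and the small start time t0 are illustrative choices; the linear system's analytic solution is used for comparison, and the y2(0) = 0 problem is side-stepped by seeding with the short-time approximation y2(t0) ≈ k1*y1(0)*t0):

```python
import math

k1, k2, y1_0 = 1.0, 0.3, 1.0
t0, t_end, h = 1e-3, 5.0, 1e-4
n = int(round((t_end - t0) / h))

ly1 = math.log(y1_0) - k1 * t0     # exact: ly1(t) = log(y1_0) - k1*t
ly2 = math.log(k1 * y1_0 * t0)     # short-time seed instead of log(0)
for i in range(n):
    dly2 = k1 * math.exp(ly1 - ly2) - k2   # ly2' = k1*exp(ly1 - ly2) - k2
    ly1 += -k1 * h                          # ly1' = -k1
    ly2 += dly2 * h

# analytic solution of the original linear system for comparison
y2_exact = k1 * y1_0 / (k2 - k1) * (math.exp(-k1 * t_end) - math.exp(-k2 * t_end))
print(math.exp(ly2), y2_exact)   # close, and exp(ly2) is positive by construction
```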

However, my actual system is not as simple as the one above. I’ll consider both possible solutions and see where I get to.

Thanks so far!

BTW the full WIP code of my model is here:

It is not very well documented, but there might be something to see: I use splines to approximate a continuous gene expression profile (which is part of f(t)) and then numerically integrate the partially solved equation with the trapezoid rule. It is fast, and the output is almost indistinguishable from running ode45 in the R package deSolve on the original equation.

I don’t think there’s anything particularly wrong with truncating the solution (although check the diagnostics etc.). The transformation to an ODE on the log variables is OK, but you might get some numerical instability if your state is allowed to be near zero (as seems to be the case here). Truncation is probably better.

Thanks for your input, Daniel!

The strange thing is that when I run diagnose on the log-ODE model, most errors are at least below zero, but still, for some reason, quite far from the error threshold of 1e-7. For the truncation model, on the contrary, everything seems to go bananas when I run diagnose: there are loads of extremely large errors, which might come from the manual truncation.

I’m attaching my minimal example. Maybe I am doing something obviously wrong?

@martinmodrak: Thanks for your code! Looks very interesting.

exp.data.R (901 Bytes)
truncation.stan (1.4 KB)
log_ode.stan (1.3 KB)

Have you tried integrate_ode_bdf? Chemical systems frequently end up being stiff and can benefit from the implicit solvers.

I think it’s the absolute tolerance that will help you handle ODEs that go near zero (Google relative vs. absolute tolerance and there are some descriptions of this). Have you tried tightening it? Maybe just set it to 1e-12 and see if you still get negatives?

If you’re still getting negatives with very tight tolerances, I’d worry there might be a typo in the equations.

What if you replace hard truncation with a smooth function (something like an appropriately scaled arctanh)?
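For instance (just a sketch; the scaled softplus here is one possible choice of smooth function, and delta is an illustrative smoothing scale):

```python
import math

def smooth_pos(y, delta=1e-6):
    # delta * log1p(exp(y/delta)): a smooth stand-in for max(y, 0) that is
    # strictly positive, differentiable everywhere, and ~y for y >> delta
    x = y / delta
    if x > 30.0:             # avoid overflow; log1p(exp(x)) ~ x for large x
        return y
    return delta * math.log1p(math.exp(x))

print(smooth_pos(-1e-7))     # tiny but strictly positive
print(smooth_pos(0.5))       # essentially the identity far from zero
```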



From my experience a quick and dirty solution for your problem is

  1. Integrate with the absolute tolerance as low as you are willing to go, given the performance hit: the lower you set this eps, the more work for the ODE integrator. This eps will always be very small in comparison to your real y_hat values (at least two orders of magnitude).

  2. Just brutally write down the likelihood as log_obs ~ normal(log(y_hat + 10*eps), sigma)

Yes, this creates some bias… but the bias will be so small that you will not notice it.
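To get a feel for the size of that bias: a first-order Taylor expansion gives log(y_hat + 10*eps) - log(y_hat) ~ 10*eps/y_hat for y_hat >> eps. A quick sketch (eps chosen for illustration):

```python
import math

eps = 1e-10   # illustrative absolute tolerance
for y_hat in (1e-2, 1e-1, 1.0):
    shift = math.log(y_hat + 10 * eps) - math.log(y_hat)
    approx = 10 * eps / y_hat
    print(y_hat, shift, approx)   # the shift is tiny and well approximated
```

So as long as the observations sit two or more orders of magnitude above eps, the shift in the log-likelihood's location is negligible.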

Another solution is to use the Box-Cox transform with a shift parameter. That is much cleaner, and for the problems I care about (PK/PD, so also concentrations) these usually perform quite well.


There was a paper on non-negative solutions of ODEs. Essentially, it shows that truncating is not sufficient for some problems. A better solution is to impose a non-negativity constraint by redefining the RHS of the ODE system, including effective damping outside the feasible region.

@bbbales2, @anon75146577, @wds15, @yizhang Thank you all for your advice!

@wds15: When you use a Box-Cox transform, do you estimate the scale and shift parameters before you load the data into your Stan model, or do you estimate them in your full model?

For testing, I am using the crude method that wds15 suggested. It works well for the current purposes to keep the solution positive when the ODE solution gets very close to zero, which happens only for parameters in low-probability regions anyway.

However, one thing still bothers me: when I run diagnose test=gradient on my test model, I get some really large values for the finite-difference approximations for some parameters. The model is so simple that I cannot see where I could have made an error in my equations. Could this also be related to numbers getting really small in my ODE solution for low-probability parameters? I guess the finite differences approximating the sensitivities of the ODE, used to check the gradient of the parameters going into the ODE, could be very unstable once the solution gets close to zero.

For example, this is a result from a run of diagnose

 Log probability=-3358.23

 param idx           value           model     finite diff           error
         0         1.17108         6483.47         6483.47    1.54628e-007
         1         1.04201        -310.916         -242185          241874
         2        0.297269        -21.5027          3461.6         -3483.1
         3         0.24182        -416.358         -524502          524085
         4       -0.876616        -74.6664        -74.6666     0.000220595
         5        0.923798        -9.82334         3720.79        -3730.62
         6         0.52609        -548.188        -8787.05         8238.86
         7         1.39752         302.246          258035         -257732
         8       -0.546857         4.52945         41220.4        -41215.9
         9        0.831083         407.035          197269         -196862
        10         1.72614         71.2488         71.2495    -0.000730462
        11        -1.07626         5.26091         48026.4        -48021.2
        12         1.49623         534.252         -242825          243359
        13         1.47067        -3.12632          143452         -143455
        14        -1.34691        -706.859         27951.8        -28658.6
        15         1.13975        -3.43895          370977         -370981
        16        0.832852       -0.981895       -0.981967    7.12531e-005
        17        0.695118        -256.892        -14628.4         14371.6
        18        0.280268         -6.0746        -93763.3         93757.3

I noticed that the model derivatives via AD and the finite differences agree much better for parameter values with higher log probability. Maybe I am using diagnose wrong?

Data and model are attached in case I am doing something obvious wrong.

exp.data.R (901 Bytes)
low_tol.stan (1.4 KB)

EDIT: Out of curiosity, I also tested @Bob_Carpenter’s model, which was posted here. Running diagnose test=gradient on that model also leads to large differences between model and finite_diff on my machine. So maybe it’s not my model after all?

A typical run from diagnose looked like


 Log probability=-1334.85

 param idx           value           model     finite diff           error
         0        0.196752         726.335        -2873.67         3600.01
         1        -1.01118        -456.268         3212.47        -3668.74
         2        0.131759        -755.859        -3677.38         2921.52
         3        0.363325        -437.119         2108.33        -2545.45
         4        -1.71535         366.054         2707.01        -2340.96
         5         0.45776         270.602          2972.9         -2702.3
         6         1.96192        -16.8809        -16.8821       0.0011476
         7        -1.35977         1777.21         1777.56        -0.35014

Yes, I do fit both parameters. This means you formulate the log-likelihood on the untransformed scale and essentially define a Box-Cox density using the transformation method: on the transformed scale things are normally distributed, and you need to work out the Jacobian of the transform. I worked this out a while ago and it was very smooth to fit. Digging around on my computer didn’t turn it up, but should I find it again and you are interested, I am happy to share.

Regarding your diagnose issue… maybe bump the ODE tolerances down to smaller values and see if things improve? I am not suggesting that you need to run your program at very low ODE tolerances as long as you are happy with the results (they make sense / you don’t have divergences / etc.).

Lots of great advice already, but let me clarify two things.

The problem you run into with the truncation in Stan is that it cuts ties to the gradients.

Finite differences are super unstable near zero.

I’m definitely interested, in case you dig it up at some point.

That’s what I was guessing at some point. Thanks for confirming it!

Thank you all for your time and help!

@Cyianor: I am having a similar problem to yours and was wondering if you solved it by using
log_obs ~ normal(log(y_hat + 10*eps), sigma)? What does eps mean here? Is it a variable in your model or something in the Stan language?


I don’t have a lot of time… the fundamental issue is that the Stan ODE solver only guarantees positivity up to the accuracy of the absolute tolerance at which you run the ODE integrator (the default is 1e-9 or so). Thus, the integrator can return negative values on that order. You have two solutions here:

  • easy: just add 1e-6 or so to the ODE solution (so log(y_ode + 1e-6))… the 1e-6 needs to be reasonably above the absolute error at which you run the integrator… this is an easy solution, but more of a hack

  • better, but also has issues: right now you integrate dy/dt = f(t, y) and you get y(t), which is positive in your case… you can make the integrator return log(y(t)) directly by integrating d(log(y))/dt = (1/y) dy/dt = f(t, exp(log(y))) / exp(log(y))

The second approach is more elegant and more stable if your y varies smoothly in log space.


@wds15: Thanks for the suggestions.
I used the first of the two suggested methods and the code now works.