Dear Stan community, @yizhang , @billg, and I released a preprint this week demonstrating the use of Stan and Metrum’s homebrewed extension, Torsten, for pharmacometrics modeling.
This is the first part of a two-parts tutorial, targeted at pharmacometricians. It includes detailed examples and code. We wrote this paper as Torsten and Stan developers, and also instructors who have taught workshops on Bayesian pharmacometrics modeling: we devoted a lot of effort to clarifying various Bayesian concepts and emphasize some of the more subtle points which sometimes get lost. Part II of the tutorial is in progress and will demonstrate more advanced features of Stan and Torsten.
I’m always excited to write about Torsten because this is the project that introduced me to Bayesian statistics and ODE-based models!
So how would eg for the last nonlinear example compare NONMEM, Torstan and Stan without Torsten? Let’s say in terms of usability, accuracy of the uncertainty quantification and efficiency?
The stan/torsten vs nonmem benchmark is in progress. On the hand, Stan’s rich diagnostics is definitely missing in nonmem’s BAYES.
The last example is actually easier to code using just Stan, because it involves only a single patient. It’s there as a footstep to the part 2 of the tutorial, where population models with various likelihood structures will be explored. Generally when it comes to population model, coding various dosing events becomes less pleasant in Stan, and that’s one of the major motivations of torsten.
We choose to follow nonmem events schedule nomenclature and this has both
pros
it’s friendly to PMX practitioners due to nonmem prevalence.
it’s easier to be integrated into existing workflows
and cons:
ODE definition must comply with old Stan style.
function signature could be complicated (but we are working on simplified versions)
Another issue of using nonmem-style events is that when some of the events are Stan parameters we have to define the entire array as parameter, e.g on p.26 of the paper
real<lower = 0> parms[9];
parms = {CL , Q, V1, V2, ka, mtt , circ0 , gamma , alpha };
becomes inefficient when, say only clearance CL is a parameter. This can be resolved when tuple becomes available.
Even though there’s only a single patient, you still need to handle multiple doses. What’s more, Torsten supports a mixed solver, meaning you can solve the PK analytically, while numerically solving the remaining PD equations. Coding this directly in Stan is possible, but long and error prone. Not only do you need to write the analytical solution to the PK system, it also takes a bit of book-keeping to then write the ODE which gets solved numerically.
So inspired by this thread on the adjoint ODE solver, what are your experiences with the adjoint solver? Or with the BDF solver? BTW, out of curiosity I’ve reimplemented the nonlinear model from section 4 and it looks like this implementation using Stan’s regular rk45 is quite a bit (~30%) faster than the specialized pmx_solve_twocpt_rk45, is this to be expected? Also, don’t you guys have some kind of adaptive (stiff->non-stiff) and parallel (combining draws across chains for cov-estimation) warm-up. How much faster does this make the problem?
I haven’t used the adj solver, BDF on the other hand on a regular basis. It’d be great to see your implementation as I do notice that sometimes torsten’s version of rk45 performs differently for better or worse. We’ve discussed stiffness/nonstiffness detection solvers but in general the technique is both involved and not very robust. Torsten does have cross-chain implemented and we plan to discuss that in the coming part 2. Recently I did a benchmark for the ACoP12 conference and can post it here next month.
I would actually have thought that the BDF solver could work better for the linked nonlinear problem, as the stepsize of rk45 is bounded by the first compartment due to stability concerns and for the other compartments not much is really going on. But I tried to drop it in, and it just did nothing… 🤷
Hey guys cool paper! Looking forward to part two. I’ve been using Torsten every day as of late. I had a couple of questions.
Does Torsten currently support the adjoint solver in Stan?
Are the group integrator functions wrappers for parallelization? Is parallelization only over a cluster or is it possible to parallelize across cores in a multicore machine? I have access to a 48-core machine and my naive plan was to just wrapping the pmx_solve_bdf() function so that I can call it using Stan’s reduce_sum functions. Would that be possible in Torsten?
Hey @arya,
Sorry it took me so long to get back to you on this, this slipped through the cracks.
No (unless this was recently added…). As far as I can tell, the size of typical PKPD ODEs doesn’t warrant using the adjoint method but maybe you have some counterexamples in mind. If so, we can open an issue on GitHub. The algebraic solvers, on the other hand, all use adjoint methods, and that’s something you get in Torsten.
You can certainly use reduce_sum, but Torsten supports a population ODE solver, which handles the parallelization more seamlessly (see General ODE-based Population Model Function - Torsten). So my recommendation is to use the pmx_solver_group function.
The upcoming part II of our Torsten Tutorial should cover these topics (Part I just appeared in CPT: pharmacometrics)
@yizhang can probably provide additional insights.
Thanks @charlesm93! Yeah I’ve since played around with both pmx_solver_group and MPI and also just wrapping pmx_solve_twocpt in a reduce_sum. You’re right that the former technique is seamless in terms of cleaner Stan code, but the latter can be useful at times. For example, I ran into a bug that only shows up in the MPI version, but I made an issue on Github. Also the MPI installation can be a little daunting, although I do have a Bash script I can share that I use to install it now on fresh Metworx instances.