Speeding a bioenergetic ODE model

Dear all,
Thanks for your further comments. For me is ok to keep the topic open. Hope that this may help other stan users…and hope this may help to reduce computation time in my specific case!

Some comments:

  1. I am happy to report that a new run with 30 fish takes “only” 5.5 hours. The settings are not exactly the same (for example, I halved the number of iterations; 500+500). In spite of 4 divergent transitions, accuracy and precision was very good. I am adding the fish level comparison between the true (I am using simulated data) and the estimated parameters. Between-fish means and variances, and measurement errors were very well estimated too.

As suggested by you, perhaps one of the bottlenecks is estimating between-fish variances, and increasing the number of fish is helping.

  1. Pairs plots

Pair plot for the global paramters are fine. The blue lines are the true values.
imagen

HOWEVER, pairs plot at fish level are not so clean (posteriors of v and kappa are correlated).

imagen

I systematically compare priors and posteriors. I expanded the priors when are too close to the posteriors. But priors are informative. Remember that all this topic is a quality control for the code. Thus, provided that I know the true values (=simulated data), I can play in favor of the code, at least till solving the technicalities. Related with that,

the fish level initial values are always set to the mean values, thus there are no differences in initialization.

  1. The zeroes problem possibly deserves special atention: Animals start to reproduce after puberty. In my case, most of the fish are mature in the second year, but a few fish are mature from the first year or from the third year. The bioenergetic (DEB) model deals with that. Fecundity of a juvenile is zero eggs. I referred these zeros as structural. According to the DEB model, puberty depend on a state variable. When this state variable (an energy level) reaches a threshold, the fish matures and starts to reproduce. The inequality (or the alternative step function suggested by some of you) in the ODE equations is the gate between juvenile and adult. As I commented in some place above, I am not happy with the trick of adding a small constant for being able to assume a log-normal error distribution for fecundity. Any alternative will be very welcome. I cannot see how to consider fecundity as censored variable. Note that the age at maturity is fish specific and unknown. I neither can ignore these zeros when summing likelihood.

Again, I am very grateful with your suggestions.

2 Likes

Yeah, I thought about this as well and have so far not come up with a satisfying solution… With the trick of adding small constants you have a locally flat contribution of the fecundity measurements to the log likelihood for years for which the simulated fish is not mature yet, irrespective of whether you observed eggs in this year or not. But really you want to leave this parameter regime as quickly as possible! I guess you’re still leaving this regime due to the other measurements, but I think ideally you would want to introduce a very steep gradient away from the current fish parameters if the simulated fish is not mature yet, but eggs were observed. But how to do this smoothly…

(The other direction, mature simulated fish without observed eggs, is less problematic)

I have some ideas, most would probably not work :-) in any case I would test all of those in a single fish model as it is highly likely that if the modifications would help in the hierarchical model they would also help for a single fish and testing single fish is going to be much faster (I would run all tests a few times with different fish though to avoid being misled by specifics of a single fish) Here “help” doesn’t necessarily mean shorter runtime, but could also be higher ESS (ESS/second is usually a better measure of speed than just time) which should be accompanied by lower average treedepth (via nuts_params).

This looks important. First: if what you actually observe is egg counts a discrete distribution like negative binomial might be a better fit than lognormal and handle zeroes seamlessly whilemaintaining that the logarithmofthemean is proportional to the latent value.

Second (much more speculatively) if the DEB model ensures that all fish eventually reach puberty and then never cease to be fecund (not obvious- maybe it doesn’t?), you may actually be able to solve for “time of puberty” (analytically or numerically) and then solve the ODE in two parts (fecund/non fecund) and thus avoid the step function.

That might suggest a possible reparametrization of the ODE. I don’t understand it well enough, but do you see why increasing both v and kappa could lead to a very similar likelihood (presumably because of similar trajectory of the system)? It can generally be illuminating to just play with the system and see how the trajectories of the state variables respond to changes in your parameters. When very different param values do not result in very different trajectory, you often have a problem.

It would also be good to check the plot with “local” and "global " params at the same time (e.g. all 6 params you’ve shown in a single plot).

Sorry I missed/forgot that you are already using simulated data, agree that in that case prior x data conflict is almost certainly not an issue.

Alright so I’ve found the time to fiddle with the model. First, my results for fitting the provided posterior:

Using 6 parallel chains with 6 cores I get on my local notebook within 5m+15s (warmup+sampling) a minimal N_Eff of ~147 with iter_sampling=84. Diagnostics are fine (max(rhat) < 1.01, E-BFMI=.9, no divergences). During sampling the mean number of leapfrog steps is 16.

I believe the main performance gain is due to a custom warm-up procedure which not only generally performs much better than Stan’s regular warm-up, but crucially allows me to continuously adapt the “centeredness” of the parametrization, which leads to very efficient sampling. Sadly, this procedure is not yet ready for public consumption and also currently I don’t have the time to make it so.

Still, I have some other comments:

  • Change your model such that positivity of all physically positive parameters is ensured. This includes the fishwise pAm_i and v_i parameters, which currently may become negative.
  • Similarly, take care that the ODE solution only yields physically reasonable (positive) values.
  • Change your ODE such that instead of states {E,V,H,UR} you have states {E,V,H+UR}. H and UR can be recovered from H+UR via H=min(Hp,H+UR) etc.
  • Think about whether this fixed threshold Hp really makes sense (I don’t believe it does, even if it were a fishwise unknown parameter).
  • Zero-fecundity observations can be handled via something like 0~bernoulli_logit(H+UR-Hp). This will encourage simulated fish to stay immature. Personally, I would change the model to get rid of the fixed threshold and incorporate something like the above.
  • Zero-fecundity simulations can also be handled using H+UR-Hp, appropriately mapped from R to R+.
  • Finally, it looks like you will have to play with the centeredness of your parametrization.

Edit: Updating to add that for 32 fish everything took 12m+1m.

Wow! This is the range of computation time In am looking for playing with the model details.

I noticed that the first warmup iterations were very slow, but after a while the speed increases. I suppose I’ll have to settle for the custom warmup.

I will work with all the other comments you are suggesting.

Thank you for your time.
m

No, thank you for the great example!

I would be very interested (though probably of limited actual use, assuming this is written in C++ rather than Stan or R/Python) in helping you to get a publicly consumable version of this off the ground.

1 Like

Thanks, most of the code is actually Python, with just a little C+±addition to Cmd/Stan which is not strictly necessary.

I’ll come back to your offer!