I’m having issues when I include t0 in my ts list of observation times fed into Stan’s ODE solvers, i.e. Chain 1 Exception: ode_rk45: initial time is 0, but must be less than 0.000000. The issue goes away when I drop t0 from the list of observation times. I take this to mean that if I want to put an observation model around t0 for likelihood evaluation, I need to create an empty vector/matrix, populate some_vector[2:] with the non-t0 ODE solutions, and then add the t0 state back in at some_vector[1] so I can evaluate the likelihood over some_vector. Is that right? I’ll probably drop the t0 term from the likelihood since it doesn’t matter much for my research in the end and I’d rather avoid the extra container trouble, but I wanted some clarification.
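For concreteness, the container shuffle I have in mind would be something like this (a rough, untested sketch; rhs, theta, D, and the rest of the names are just placeholders):

```stan
transformed parameters {
  array[N] vector[D] x;       // full trajectory, including the t0 state
  // solve only for the observation times after t0
  array[N - 1] vector[D] x_post = ode_rk45(rhs, x0, t0, ts[2:N], theta);
  x[1] = x0;                  // put the known t0 state back in slot 1
  for (n in 2:N)
    x[n] = x_post[n - 1];
}
```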
What this error means is that the solver needs to start at a time strictly before the time of the first observation.
I also think this is a somewhat odd requirement of Stan, since that shouldn’t be strictly necessary, but maybe it has in mind the situation where initial conditions are fixed, and therefore the likelihood at that point would be meaningless for inference (conversely, if initial conditions are estimated it is equivalent to starting earlier).
The simple, straightforward solution is just to use a t0 argument that is earlier than your first observation. The actual time you choose is arbitrary; computing the solution for times without observations is just wasted computation, although it can be useful for extrapolating a “prediction” outside the range of observations, with all the limitations that has.
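Something along these lines, for example (just a sketch; x_init, rhs, theta, and D are placeholders for your own state, system, and parameters):

```stan
data {
  int<lower=1> N;
  real t0;                   // any value strictly less than ts[1]
  array[N] real ts;          // observation times (can now include the time you previously used as t0)
}
transformed parameters {
  // one solution point per entry of ts, so the state at your original t0
  // comes back like any other time point
  array[N] vector[D] x = ode_rk45(rhs, x_init, t0, ts, theta);
}
```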
Yeah, I am using Stan as a check on results from a customized inference method I’m experimenting with in which x0 is sampled from a density. For various reasons relating to the empirical process I’m modeling, I can’t shift t0 earlier.
I think I could cobble together a way to sample from an x0 density, initialize the ODE solver with the x0 sample, and then incorporate the x0 likelihood into the likelihood evaluation block, but handling that is less straightforward than in the customized inference method I’m testing, where x0 uncertainty is built in.
Since the managers of the project won’t be too concerned about establishing uncertainty for the initial condition in the end and just want a general benchmarking of the custom inference method, I think I will also fix x0 in the custom implementation and, for ease, condition the latent-state posteriors only on observations after t0.
Since the shift is arbitrary, you can also make it arbitrarily small, so the difference between x_t at t0 and at the first time point is essentially negligible.
If you are estimating x0 as a parameter of the Stan model, you are indeed sampling it from a density; but instead of an arbitrary fixed density, you are sampling from its posterior (if you have enough knowledge of the system that your fixed density closely resembles the posterior, the two will be approximately the same). The prior and constraints on the range of those parameters can ensure that you end up sampling from a more constrained density, like the fixed distribution in your custom method.
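In Stan terms that is just something like the following (a sketch; x0_guess and x0_sd are placeholder names, and the positivity constraint is only an example of the kind of range restriction you might use):

```stan
parameters {
  vector<lower=0>[D] x0;          // range constraint on the initial state
}
model {
  x0 ~ normal(x0_guess, x0_sd);   // prior chosen to mirror your fixed density
}
```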
I don’t know the details of your method, but Bayesian inference is general enough that it should be possible to set up a comparable inference framework (although there are indeed exceptions).
Thanks for following up. This is just for curiosity and personal knowledge at this point, since I’m proceeding with fixed x0 in both Stan and the custom VI scheme I’m testing. I’m getting too close to deadlines and don’t want to risk introducing bugs with unnecessary modifications that are not required in the project scope. However, I’m curious about how things could be handled in Stan pseudocode.
The custom VI scheme originally required a p(x_0) for log likelihood computation. An x_0 is sampled, then the system is solved to obtain an \hat{x}. We then observe \hat{y} from \hat{x}.
By the way, I understand it’s the practical approach, but I didn’t opt for the arbitrarily small difference between t_0 and the first t_{\textrm{obs}}, because even if the two states are numerically similar and the difference negligible, it would not be an exact comparison to the custom scheme, where the x_0 prior and likelihood evaluation are defined and executed exactly at t_0. I would rather sacrifice conditioning at t_0 than complicate keeping the implementations equal (except for the scheme itself, of course, for the A/B test). I am also reluctant to implement an equivalent arbitrarily small t_{\textrm{obs}, 1} - t_0 difference in the custom VI code, because that code currently requires observations on a regular time grid and it would be tedious to change things across all of its dependencies. So pardon me if I’m coming across as lazily and illogically dogmatic in my effort to keep things equivalent in as simple a manner as possible!
So, attempting to replicate that approach exactly in Stan and assuming Gaussian densities for everything, would one do:
First sample xhat0 from xhat0 ~ N(x0, x0_sd). (This would have to happen in the transformed parameters block?)
Then compute and observe a yhat[2:] from the ODE solver initialized at xhat0, and then assign/observe a yhat0 as yhat[1] with something like yhat0 ~ N(x0, y0_sd).
Then evaluate and sum the likelihood terms x0 ~ N(xhat0, x0_sd) and y ~ N(yhat, obs_sd) (let’s say obs_sd is scalar here and y0_sd = obs_sd) in the model block?
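In rough, untested sketch form (placeholder names everywhere, and I’m not sure I have the blocks right), I’m picturing something like:

```stan
parameters {
  vector[D] xhat0;                        // the x0 draw, treated as an unknown
}
transformed parameters {
  array[N] vector[D] xhat;                // full trajectory, including t0
  array[N - 1] vector[D] xhat_post = ode_rk45(rhs, xhat0, t0, ts[2:N], theta);
  xhat[1] = xhat0;                        // t0 state
  for (n in 2:N)
    xhat[n] = xhat_post[n - 1];
}
model {
  xhat0 ~ normal(x0, x0_sd);              // or does this part belong in transformed parameters?
  for (n in 1:N)
    y[n] ~ normal(xhat[n], obs_sd);       // y[1] is the t0 observation, with y0_sd = obs_sd
}
```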
I’m curious about what the model block would look like under current best Stan practices, if you have spare time to share pseudocode for that.
Edits: Some corrections to rough notation that I typed very quickly.
You don’t actually have to rely on an approximation of the parameter: if the solver starts at an earlier time point t_{-\delta}, the solution will include x_0 itself (the state at that specific time point can be extracted in the transformed parameters block), so the parameters will be identical, although the implementation will still require the initial t to be earlier than the first observation. This won’t affect the regularity of the observation intervals, and the fact that the solver starts at an earlier time point should be of no consequence to the inference process.
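Sketched out (placeholder names again; t_start plays the role of t_{-\delta}, x_init is the state handed to the solver at t_start, and ts[1] is your t_0):

```stan
transformed parameters {
  // the solver starts at t_start < ts[1] = t0, and ts itself contains t0,
  // so the state exactly at t0 is just the first element of the solution
  array[N] vector[D] x = ode_rk45(rhs, x_init, t_start, ts, theta);
  vector[D] x_at_t0 = x[1];   // available here for the prior/likelihood defined exactly at t0
}
```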
Sampling statements (~ or target +=) go in the model block, as far as I know.
I don’t think you need all this trouble. If you provide a real t0 value and an array of observation times, the solver will give you the solution at those time points; from there, a statement of the kind observation ~ normal(solution, standard_deviation) is indeed what you need, assuming a continuous variable and a Gaussian likelihood.
That’s generally it: you have the prior on x0 and the likelihood of the observations y. You’d have additional priors on any other parameters, and these ‘~’ sampling statements go in the ‘model’ block. You can also transform parameters inside the model block, but those quantities won’t be stored in the traces, whereas anything you define in the transformed parameters block will be stored in the output; that’s essentially the difference between the two blocks.
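A minimal sketch of the whole thing, assuming Gaussian densities and made-up names throughout (the rhs function here is just a toy one-parameter decay system standing in for your actual model):

```stan
functions {
  // toy right-hand side standing in for your actual system
  vector rhs(real t, vector x, vector theta) {
    return -theta[1] * x;
  }
}
data {
  int<lower=1> N;               // number of observations
  int<lower=1> D;               // state dimension
  real t0;                      // solver start time, earlier than ts[1]
  array[N] real ts;             // observation times
  array[N] vector[D] y;         // observations
  vector[D] x0_mean;            // centre of the prior on the initial state
  real<lower=0> x0_sd;
  real<lower=0> obs_sd;
}
parameters {
  vector[D] x0;                 // initial state, estimated
  vector<lower=0>[1] theta;     // ODE parameter(s)
}
transformed parameters {
  // declared here so the solution is stored in the output
  array[N] vector[D] x = ode_rk45(rhs, x0, t0, ts, theta);
}
model {
  x0 ~ normal(x0_mean, x0_sd);      // prior on the initial state
  theta ~ lognormal(0, 1);          // prior on the ODE parameter(s)
  for (n in 1:N)
    y[n] ~ normal(x[n], obs_sd);    // Gaussian observation model
}
```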
Check out the case study for a general idea, though best practices vary a lot depending on the particular model.