Try to lower your absolute tolerance if your problem allows that. Also try the Adams solver for the backward solve. That helped in a few of my problems.
I tried reducing both absolute and relative tolerance to 10^{-3}, and changed to the Adams solver as you suggested. I am still getting no sampling and the following error:
'CVODES: CVode At t = 80 and h = -1.73212e-151, the error test failed repeatedly or with |h| = hmin. Error code: -3'
I am starting to think that the problem might be too ill-posed with the data I am trying to fit it with.
How about simulated data?
@wds15 Yes. I think that would also be a good step.
The difficulty of what I am trying to do is that I don’t know what is the underlying process. More specifically, I don’t know which variables affect which, and I would like to, through some variable selection process, choose those variables and interactions that best describe the process of each variable.
Now, in order to build such simulated data, I think I probably have to come up with a (sparse) matrix of coefficients, simulate the ode’s forward and then try to infer that matrix.
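That forward-simulation step is straightforward to prototype outside Stan. A minimal Python sketch (all dimensions, sparsity patterns, and coefficients here are made up for illustration, and the interaction and exogenous terms are omitted for brevity):

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)
N = 4  # toy dimension; the real problem has N = 14

# Sparse "true" coefficient matrix: a few nonzero entries, everything else zero
theta = np.zeros((N, N))
theta.flat[rng.choice(N * N, size=3, replace=False)] = rng.normal(size=3)
theta -= np.eye(N)  # damping on the diagonal keeps the toy system stable

def rhs(t, y):
    # Linear part only; the interaction (eta) and exogenous (gamma) terms
    # of the full model are left out to keep the sketch short
    return theta @ y

sol = solve_ivp(rhs, (0.0, 5.0), rng.normal(size=N),
                t_eval=np.linspace(0.0, 5.0, 50))
```

The simulated trajectories in `sol.y` (plus noise) would then be the data you try to recover `theta` from.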
What do you know about the underlying process?
I know the class of functions that might have generated the process. Specifically, I know that the only possible function that could have generated any particular variable is the following:
\frac{dy_{i}}{dt}=\theta'y+\eta \hat{y} + \gamma x
Where \hat{y} are the interaction terms, and x are some “exogenous” variables. I also know that the process is sparse. So most of the elements of \theta, \eta, and \gamma are 0.
Hm, so just to clarify because I’m a little bit confused by the single index on the left hand side which does not appear anywhere else:
You have an N-dimensional (N=14 is fixed?) state vector y \in \mathbb{R}^N which (using Einstein’s summation convention) evolves according to the ODE
\dot{y}_i(t)=\theta_{ij}y_j(t)+\eta_{ijk}y_j(t)y_k(t)+\gamma_{i,\mu}x_\mu
where the indices i,j,k run from 1 to N, \mu runs from 1 to some other natural number M, \theta is a (sparse) constant NxN matrix, \eta is a (sparse) constant NxNxN “tensor”, \gamma is a (sparse) constant NxM matrix and x is a (constant?) M-dimensional vector?
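For what it’s worth, that right-hand side is compact to express with `np.einsum`. A hedged Python sketch, with made-up dimensions and random coefficients just to show the contractions:

```python
import numpy as np

N, M = 14, 3  # N from the problem; M is made up here
rng = np.random.default_rng(1)
theta = rng.normal(size=(N, N))   # N x N matrix
eta = rng.normal(size=(N, N, N))  # N x N x N tensor
gamma = rng.normal(size=(N, M))   # N x M matrix
y = rng.normal(size=N)
x = rng.normal(size=M)

# dy_i/dt = theta_ij y_j + eta_ijk y_j y_k + gamma_imu x_mu
dydt = (np.einsum('ij,j->i', theta, y)
        + np.einsum('ijk,j,k->i', eta, y, y)
        + np.einsum('im,m->i', gamma, x))
```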
That is correct. I am sorry, I was a bit too informal in my explanation. x is not necessarily constant, but we don’t care about its dynamics. In fact we can control x, and whenever we act on x there will be an (unknown) effect on y. We do know, though, that all the matrices (and the tensor) of the dynamics are sparse. So it might be the case that only 3 or 4 of the \eta_{ijk} are non-zero out of all 14^{2} possible parameters for each \dot{y}_{i}(t).
So x(t) is given/prescribed?
The ODE does sound simple enough, but fitting it seems quite difficult. Carrying along all potentially (effectively) zero parameters will be very expensive, but a priori you also cannot just set anything to zero, or can you? Maybe you could start with just a few non-zero entries and then incrementally add more? But this also becomes a combinatorial nightmare, doesn’t it…
There’s nothing else you know about the matrices and tensors? Symmetry, antisymmetry or positivity?
Yes, x(t) is prescribed. And yes, you hit the nail on the head: if I were to try the problem with only a few combinations at a time, it becomes a nightmare quite fast. Unfortunately, I can’t set anything to zero a priori.
No additional information about the matrices and tensors. I do have more than one measurement. So for the system at hand I measured the process with different starting points and different x(t) 20 times.
Huh, how horrible.
For the ODE part you will have to somehow get the adjoint solver to work, but even then it will probably take forever…
I guess in principle it should be possible to adaptively remove non-zero entries, but none of the interfaces is equipped to handle this…
Thank you very much for your answer! (no sarcasm) It really tells me that maybe I am not looking in the right place, or that I have to somehow try to reduce the problem to several smaller problems.
If you are interested in how this problem has been tackled in the past, there is an interesting paper about it: “Learning stable and predictive structures in kinetic systems: Benefits of a causal approach” (https://arxiv.org/pdf/1810.11776.pdf). The thing is that they use a frequentist approach, and they do, in fact, try out several small ODEs. I wanted to tackle the problem from a Bayesian perspective.
I guess there is also the possibility of using a different inference method (such as VI) that might be less precise, but considerably faster. Then I would have to get out of the Stan world.
In any case, thank you very much for taking the time, discussing with me, and suggesting different approaches. I have learned a lot these last couple of days.
This might actually be the case. I think Stan also lacks support for sparse matrices, which would probably become crucial if you want to extend the method to e.g. metabolic networks as in the paper you linked.
Hm, Stan’s ADVI would not be applicable?
Well, I am looking for challenging models to fit. I guess your model would qualify ;)
I didn’t know ADVI was still supported. I am using PyStan, where that support stopped in version 3.2 (see “Upgrading to Newer Releases” in the pystan 3.2.0 documentation). Maybe I should take a look at CmdStan.
With respect to challenging problems. The data and all the rest of the details are available here: https://learningbydoingcompetition.github.io/
Interesting. But the problem specification suffers from the same index confusion as your formulation above, doesn’t it?
I think equation 7 of the whitepaper is more or less clear. It has a formulation similar to yours, without taking into account the control variables (what I called x but in their case is U).
Well yes, it’s just that the index i again only appears on the left hand side, which of course wouldn’t make a lot of sense.
Just digging this thread up to tell you that this is not correct, which might help you solve the problem (though probably not using Stan).
In particular, because the ODE is derived from mass-action kinetics, the following holds:
- \theta_{i,j} \ge 0 if i \ne j
- \eta_{i,j,k} \ge 0 if i \ne j and i \ne k.
Furthermore, if you do not allow species to “magically” introduce mass into the system you have at least
- \theta_{i,i} \le 0,
but I think magic is allowed.
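Those sign constraints are simple to encode as a check on candidate coefficients. A minimal Python sketch (the function name and shapes are illustrative, not from any library):

```python
import numpy as np

def satisfies_mass_action_signs(theta, eta):
    """Check the sign constraints implied by mass-action kinetics:
    off-diagonal theta >= 0, diagonal theta <= 0, and
    eta[i, j, k] >= 0 whenever i != j and i != k."""
    N = theta.shape[0]
    off_diagonal = ~np.eye(N, dtype=bool)
    if np.any(theta[off_diagonal] < 0):
        return False
    if np.any(np.diag(theta) > 0):
        return False
    i, j, k = np.indices(eta.shape)
    constrained = (i != j) & (i != k)
    return not np.any(eta[constrained] < 0)
```

In a Stan model one could instead build the constraints in directly, e.g. by declaring the constrained entries with a `lower=0` bound.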
@Funko_Unko Thank you for the information! Are you participating too?
Hm, I don’t know. Looks like I would need some serious computational resources, but currently I’m restricted to a sad little notebook.
The problem is of course massively scalable, both the fitting with 12x20 independent ODEs and then the optimization with 12x50 independent ODEs.
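Since the trajectories are independent, the solves parallelize trivially. A hedged Python sketch of that structure (toy sizes, made-up shared dynamics; the real problem would use the 12x20 and 12x50 counts above):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)
N = 4        # toy state dimension
n_traj = 6   # toy number of independent trajectories

# Made-up shared linear dynamics, damped so the toy system stays stable
theta = -np.eye(N) + 0.1 * rng.normal(size=(N, N))

def solve_one(y0):
    # Each trajectory is an independent ODE solve, so they can run concurrently
    return solve_ivp(lambda t, y: theta @ y, (0.0, 5.0), y0,
                     t_eval=np.linspace(0.0, 5.0, 20))

y0s = rng.normal(size=(n_traj, N))
with ThreadPoolExecutor(max_workers=4) as pool:
    sols = list(pool.map(solve_one, y0s))
```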