Ito process as numerical solution of stochastic differential equation

Just pushed the first cut of an Euler scheme for SDE

Unit test is based on simple model dX = \mu X dt + \sigma X dW.
sde_euler_1d.pdf (8.7 KB)

The proposed signature of the function ito_process_euler is

matrix ito_process_euler(F1 drift, F2 diffusion, vector y0, 
                         matrix wiener_normal, 
                         real[] theta1, real[] theta2, real t)

where F1 and F2 are functions for drift and diffusion with theta1 and theta2 being their parameters, respectively. y0 is initial condition at t=0, wiener_normal is a matrix of iid standard normals for Weiner process steps. With t being the final time, the step size equals to t divided by the number of columns of wiener_normal.


The 2nd example/test is based on a stochastic SIR model, with parameter vector \theta=(a,b,c,d), the SDE is characterized by

f_{\text{drift}}(x) = (-ax_1x_2 + d, ax_1x_2 -(c+d)x_2, cx_2-dx_3)^t
f_{\text{diffusion}}(x) = (-bx_1x_2 , bx_1x_2, 0)^t

numerical results from ito_process_euler below show Ito processes that range from stable to unstable.

1 Like

I don’t know anything about SDEs, so I’m not going to be much help here.

Is there an issue defining what you want the behavior to be? Any chance we can get @charlesm93 or @bbbales2 or @wds15 to help review the code? I’m also looping in @syclik because he’s in charge of the math lib and also has a strong enough math background for this kind of work. @betanalpha would be another possibility for someone with enough C++ and math to handle this.

I’m about to submit an issue but since I’m getting zero feedback from discourse Performance of stochastic volatility model I’m hesitating to submit a PR. Currently my main concern is

  • This implementation could be completely written in Stan lang, is it still worth to add it as a function.
  • The Wiener process generation asks for a potentially large matrix/vector of iid normals, how’s this going to affect the performance in MCMC?

Everyone’s overwhelmed by volume and providing only spotty attention. You’ll need to tap some specific people to review this, and I think we only have a few core devs who are qualified.

The place to start is with an issue, not with a PR. Then when there’s agreement the issue is something we want to implement, then create the PR. @syclik is in the charge of the math lib and the one to contact if the process gets stuck.

That’s true of almost all the functions we write. By adding to Stan, we get something much easier to use and hopefully more reliable and efficient. But it’s a judgement call, which is why I’m suggesting starting with an issue and with pinging @syclik to review the intent.

You mean part of the implementation requires generating iid normals? We’d have to work out how to pass in an RNG in that case. They’re only in scope in transformed data and generated quantities as things are now.

Issue here

I’ve considered this option but decided to let user pass in a real array that is supposed to be of iid normal distribution.

Can that be reused? One way to generate iid normals if they’re fixed is to use transformed data—that allows an RNG.

If one’s fitting a SDE strong solution with given trajectory this iid normal array is going to be parameter.

The 3rd example solves a stochastic volatility model, in particular, the Heston model with truncated volatility

dy_1 = \mu y_1dt + \sqrt{\max(y_2,0)}y_1dW_1 \\ dy_2 = \kappa(\theta - \max(y_2,0)) dt + \xi\sqrt{\max(y_2,0)}dW_2

This model can be seen as an extension of the volatility model in Stan’s user guide. Unlike previous two examples, here we have two correlated Wiener processes dW_1 & dW_2 driving the evolution, therefore the return of f_2 functor should be a 2-by-2 matrix(in this model it’s diagonal).

With asset spot price 100.0 and volatility 0.02, figures below show the solution processes in 1-year period, with 1-day time step. The two driving Wiener processes have correlation -0.7 for leverage effect.

1 Like

@Charles_Driver, this looks like something that you’d know about. If you have an interest in chiming in, please do :D.

I’m not familiar with SDE solvers, so I can’t review a PR on the subject, at least not in a short time span. I suggest bringing these results up during the Stan meeting and maybe adding “reviewer for SDE solver” as an item on the agenda.

What are some motivating problems for this feature? Anything in pharmacometrics I might be familiar with?

@bbbales2 Ok I can try :) it seems nice to have to me, though two points confuse me a little:

Since the output per sample seems to be stochastic, how does this work with an mcmc type sampling procedure wrapped around it? I’m more familiar with the situation where we integrate over the stochastic nature of the SDE to obtain expectations and covariances, so in a sense we understand the complete picture of the SDE implied by the parameters given the sampled values of the parameters.

What’s the point of passing in the Wiener random noise elements, does this add some flexibility? It seems like an unnecessary specification for the user.

1 Like

You can do a similar implementation of an integrated ou process. In a pure c++ version the iid normals are a detail much like the internal parameterization of the simplex type. When implementing in Stan code you have to pass in the iid normals just like you do when you implement the non centered parameterization in Stan

The output X per sample is the same as the other parameters: the sample of (an array of) random variables. When the iid normal for the wiener process is parameter, per sample we get one path of the wiener process, then use it to solve the SDE.

This is along the line of weak approximation, as one seek \tilde{X} to approximate X in the sense of reaching small error ||E(f(\tilde{X}))-E(f({X}))||. For this purpose \tilde{X} doesn’t have to solve SDE with the given wiener process but just some process. This is different from what’s proposed here: we seek a strong approximation for a given wiener process.

As @sakrejda points out this is just to present a finite points(of size n) of a wiener process W_n as a reparameterization as an array of iid normals N_n, as the W_n=LN_n, with L being a lower triangular matrix filled with square root time steps. As @Bob_Carpenter mentioned that when the wiener process is data we can do this as RNG. However, we need wiener process to be parameters when we want to fit a given Ito process path. Also it does add flexibility, for example, user can specify the relationship of multiple wiener processes: in the 3rd example above, W_1 and W_2 are correlated to reflect that asset price often falls when volatility rises. In Stan lang this can be achieved by passing in a matrix N_n\in R^{2\times n} created by multiplying the cholesky factor to an R^{2\times n} matrix of iid normals.

Yes ok all makes more sense if you’re talking about the case where the Wiener process is made up of parameters. Some discussion above re random number generation had me thinking that wasn’t the general intention…

There has definitely been non-zero interest in SDE models over the past few years, with applications spanning physics and biology and epidemiology amongst others, so having a self-contained implementation isn’t a bad idea.

The Euler-Mayorama discretization is very simple but for a stochastic process without much additional structure (like the structure one would get in a Langevin process) it’s the typical tool for which people seem to reach in practice.

One potential issue is with the interface of the function – as most of us don’t use SDEs we’re not particularly well suited to comment on the interface design. In particular should it return a single realization, an ensemble of realizations, moments of the realizations, etc, and then in what format should those outputs be given. I think it would help to have some documentation that more carefully defines the output of the function along with the proposed implementation and a few hypothetical Stan programs using that implementation to help us get a handle on the appropriateness of the interface. Comments from users would also be super helpful.

Finally the biggest concern is the differentiability of the function, which is somewhat related to the interface question above. In general a single realization of a stochastic process won’t be differentiable with respect to the input parameters, hence an ensemble of realizations won’t be either. Conditioning on the stochasticity (i.e. passing in precomputed PRNGs) offers one workaround, as the conditional paths might then be smoothly related to the input parameters. As would returning moments when the discontinuous behavior averages out.

Ultimately because of lack of expertise on these topics within the typical dev community I think we would all benefit from a bit more doc in order to better understand the approach instead of having to deconvolve the method from the code itself. It would certainly help me verify the proposal.

Sounds like you are referring the pathwise sensitivies when calculating greeks in finance, and the rationale can be found, for example, section 2.

Good point, I’ll write more doc on this.

NONMEN had SDE + Kalman filter for a while, though I haven’t used it.

There are numerous approaches and seemingly even more numerous notations/terminologies. At the end of the day whatever solution the solver returns just has to be locally smooth with respect to the parameters.

Can you point me to some other notations/terminologies? Calculating greeks is the only context/application that I’m familiar with.