In the middle of designing general SDE solvers, I wonder if anyone has experience with applying volatility models in practice. In particular, since the model samples a Wiener-process increment at every time step, the number of normal step-increment parameters (h_t) easily reaches \sim 10^3 when the modeled period is on the scale of a year, assuming each step corresponds to a day. How does such a model fare in terms of performance? Does the performance meet practical expectations?
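For concreteness, here is a minimal sketch of the setup being described, assuming a basic discretized stochastic volatility model (the dynamics and parameter values are purely illustrative): one Wiener-driven latent increment per trading day yields a few hundred latent states per simulated year.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 252                          # one year of daily steps
dt = 1.0
mu, kappa, theta, sigma_v = 0.0, 0.02, -1.0, 0.1   # illustrative parameters

h = np.empty(T)                  # latent log-volatility states (one per day)
y = np.empty(T)                  # observed log-returns
h[0] = theta
for t in range(T):
    if t > 0:
        # discretized mean-reverting dynamics driven by a Wiener increment
        h[t] = h[t - 1] + kappa * (theta - h[t - 1]) * dt \
               + sigma_v * np.sqrt(dt) * rng.standard_normal()
    y[t] = mu + np.exp(h[t] / 2) * rng.standard_normal()

print(len(h))                    # 252 latent increments for a single year
```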

I do. I usually use particle MCMC. You don’t have to hold onto all the state samples throughout time, so it’s relatively memory-efficient, but it’s still generally slow and hard to tune. This approach also has problems with the large-T situation, but for a different reason.

Thanks.

Care to elaborate?

Sure. Particle MCMC is similar to regular Metropolis-Hastings targeting the marginal posterior (parameters but not states/volatilities). The difference is that the likelihood isn’t available, so it approximates the likelihood with an unbiased estimate from a particle filter. So at every iteration of the MCMC sampler, you run a particle filter through your time series data. Running through the data is done recursively, so at any given time you’re holding onto samples that approximate a “filtering distribution” at one time point.
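As a sketch of what one pass of that recursion looks like, here is a minimal bootstrap particle filter for an illustrative stochastic volatility model (all names and parameter values are assumptions for demonstration). At each time point it holds only the current particle cloud and accumulates a log-likelihood estimate:

```python
import numpy as np

def bootstrap_filter_loglik(y, kappa, theta, sigma_v, n_particles=500, seed=1):
    """Estimate log p(y | kappa, theta, sigma_v) for an illustrative
    stochastic volatility model: h_t mean-reverting, y_t ~ N(0, exp(h_t))."""
    rng = np.random.default_rng(seed)
    # rough stationary initialization of the particle cloud
    x = rng.normal(theta, sigma_v / np.sqrt(2 * kappa), n_particles)
    loglik = 0.0
    for t in range(len(y)):
        if t > 0:
            # propagate particles through the state transition (bootstrap proposal)
            x = x + kappa * (theta - x) + sigma_v * rng.standard_normal(n_particles)
        # weight particles by the observation density y_t ~ N(0, exp(x))
        logw = -0.5 * (np.log(2 * np.pi) + x + y[t] ** 2 * np.exp(-x))
        m = logw.max()
        w = np.exp(logw - m)
        loglik += m + np.log(w.mean())      # accumulate the likelihood estimate
        x = rng.choice(x, n_particles, p=w / w.sum())   # multinomial resampling
    return loglik

# Toy usage on synthetic "returns"
y_sim = np.random.default_rng(0).standard_normal(50) * 0.5
ll = bootstrap_filter_loglik(y_sim, kappa=0.05, theta=-1.0, sigma_v=0.2)
```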

The more accurate the likelihood estimates, the better the mixing, and this is where the tuning gets difficult. Using more particles is generally the way to improve accuracy, but that also means more computation time at each iteration. You can also be clever about what kind of particle filter you use, and this will make a difference. Finally, the variance of the log-likelihood estimate grows at best O(T), so for longer time periods it gets tough.

I’d be interested in knowing how well an approach like this fits within Stan because it’s decidedly not Hamiltonian-y. Not only do you not have gradients, you don’t even have likelihoods. If there’s any interest, I do have quite a bit of relatively battle-tested C++ code for this sort of thing.

Just a warning that there are a whole lot of techniques called “particle MCMC”, ranging from sequential Monte Carlo (SMC) filtering-type algorithms to ensemble methods like differential evolution. I’m not sure what kind of particle methods could proceed without a likelihood. Do you have a reference?

For algorithms, Stan exposes the model class, which can provide log densities and derivatives. Given that Stan is based on writing log densities and providing derivatives, density-free or even derivative-free methods aren’t particularly attractive. The only dimensionally scalable MCMC algorithm (in the computational sense of the complexity of drawing a sample) is HMC, and it requires derivatives.

The treatment for the lack of likelihood here is just like the treatment in an ODE model: a sophisticated ODE doesn’t have a closed-form likelihood w.r.t. its parameters, so we simply resort to the *approximate likelihood* given by the numerical solution. Here, given a sample path of the process, the approximate likelihood would be based on the path data, the numerical discretization, and the fact that the driving process is Brownian.
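A minimal sketch of that idea, assuming an Euler discretization: because the driving process is Brownian, each discretized increment is approximately Gaussian, so the approximate path log-likelihood is a sum of Gaussian transition log-densities (the function names and the Ornstein-Uhlenbeck-style example are illustrative):

```python
import numpy as np

def euler_path_loglik(x, dt, drift, diffusion):
    """Approximate log-likelihood of a discretely observed path x[0..T] of
    dX = drift(X) dt + diffusion(X) dW via the Euler transition density:
    X_{t+dt} | X_t ~ Normal(X_t + drift(X_t) dt, diffusion(X_t)^2 dt)."""
    mean = x[:-1] + drift(x[:-1]) * dt
    var = diffusion(x[:-1]) ** 2 * dt
    return float(np.sum(-0.5 * (np.log(2 * np.pi * var)
                                + (x[1:] - mean) ** 2 / var)))

# Toy usage with assumed drift and diffusion functions
x = np.cumsum(np.random.default_rng(2).standard_normal(100)) * 0.1
ll = euler_path_loglik(x, dt=0.1,
                       drift=lambda s: -0.5 * s,
                       diffusion=lambda s: 0.3 + 0.0 * s)
```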

> Just a warning that there are a whole lot of techniques called “particle MCMC”, ranging from sequential Monte Carlo (SMC) filtering-type algorithms to ensemble methods like differential evolution. I’m not sure what kind of particle methods could proceed without a likelihood. Do you have a reference?

I’m not familiar with differential evolution, but I am referring to using SMC/particle filtering algorithms **within** a Metropolis-Hastings algorithm. I wrote some slides from earlier in the summer here, and those have some references in them. Algorithmically it’s pretty simple: you just replace the “real” likelihood in the acceptance ratio with an estimate of it, and it’s still asymptotically exact in the same way regular MH is. Usually the big requirement is that the likelihood estimate is nonnegative and unbiased.
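A minimal sketch of that replacement, with all names illustrative: the sampler below is ordinary random-walk Metropolis-Hastings except that `loglik_hat` returns a fresh stochastic estimate on each call, and the current estimate is held fixed (not recomputed) until a proposal is accepted, which is what keeps the scheme asymptotically exact:

```python
import numpy as np

def pseudo_marginal_mh(loglik_hat, log_prior, theta0, n_iter=2000, step=0.1, seed=3):
    """Random-walk Metropolis-Hastings where the log-likelihood in the
    acceptance ratio is a stochastic estimate (e.g. from a particle filter).
    Crucially, the current estimate is reused until a move is accepted."""
    rng = np.random.default_rng(seed)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    cur = loglik_hat(theta) + log_prior(theta)
    chain = []
    for _ in range(n_iter):
        prop = theta + step * rng.standard_normal(theta.shape)
        cand = loglik_hat(prop) + log_prior(prop)
        if np.log(rng.uniform()) < cand - cur:
            theta, cur = prop, cand          # accept: keep the new estimate
        chain.append(theta.copy())
    return np.array(chain)

# Toy demo with a deliberately noisy log-likelihood and a flat prior
rng_demo = np.random.default_rng(0)
noisy_loglik = lambda th: -0.5 * th[0] ** 2 + rng_demo.normal(0.0, 0.1)
chain = pseudo_marginal_mh(noisy_loglik, lambda th: 0.0, [2.0], n_iter=500)
```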

> For algorithms, Stan exposes the model class, which can provide log densities and derivatives. Given that Stan is based on writing log densities and providing derivatives, density-free or even derivative-free methods aren’t particularly attractive. The only dimensionally scalable MCMC algorithm (in the computational sense of the complexity of drawing a sample) is HMC, and it requires derivatives.

Understood. I have a particle filtering library that asks the user to specify densities, and lets the user choose which particle filter he/she wants to subclass. Each base class provides a method to give you approximate log-likelihoods, but if scalability is a requirement, then yes, that might be a problem right now.

> The treatment for the lack of likelihood here is just like the treatment in an ODE model: a sophisticated ODE doesn’t have a closed-form likelihood w.r.t. its parameters, so we simply resort to the *approximate likelihood* given by the numerical solution. Here, given a sample path of the process, the approximate likelihood would be based on the path data, the numerical discretization, and the fact that the driving process is Brownian.

Do you have a reference I could take a look at? I’m not familiar with ODE models at all, really, but this sounds pretty interesting. So you do this approximation at every iteration of an MCMC algorithm?

The most important aspect of the method you are referencing is not that it uses particles but rather that it uses a pseudo-marginal updating scheme. To avoid confusion I recommend referring to the method as a pseudo-marginal scheme with particle-based proposals, which should distinguish it from most other methods and make it easier for others to find relevant references.

From now on I’ll use both instead of guessing on one :) However, in general, particle MCMC isn’t a subset of the pseudo-marginal approach.

I was abusing the term *approximate likelihood*; what I should have said is “conditioning on the numerical solution”, so that with ODE parameter \theta and observed data y_{\text{obs}}, we have p(y_{\text{obs}}|\theta) = p(y_{\text{obs}}|y_{\text{num}}(\theta)). In a sense one can read the numerical integrator as a complicated regressor, and most often its solution y_{\text{num}} serves as an unbiased mean, as in the example in https://www.sciencedirect.com/science/article/pii/S030439750800501X
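As a sketch of “conditioning on the numerical solution”, assuming a toy linear ODE dy/dt = -theta * y and a Gaussian observation model (both illustrative): the numerical solution y_num(theta) plays the role of the regression mean in p(y_obs | y_num(theta)):

```python
import numpy as np

def rk4_solve(f, y0, t_grid):
    """Tiny fixed-step RK4 integrator (a stand-in for a real ODE solver)."""
    y = np.empty(len(t_grid))
    y[0] = y0
    for i in range(1, len(t_grid)):
        h = t_grid[i] - t_grid[i - 1]
        t, yi = t_grid[i - 1], y[i - 1]
        k1 = f(t, yi)
        k2 = f(t + h / 2, yi + h / 2 * k1)
        k3 = f(t + h / 2, yi + h / 2 * k2)
        k4 = f(t + h, yi + h * k3)
        y[i] = yi + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return y

def ode_cond_loglik(theta, t_obs, y_obs, sigma=0.1):
    """Evaluate p(y_obs | theta) as p(y_obs | y_num(theta)): the numerical
    solution of dy/dt = -theta * y (illustrative) is the regression mean."""
    y_num = rk4_solve(lambda t, y: -theta * y, 1.0, t_obs)
    return float(np.sum(-0.5 * (np.log(2 * np.pi * sigma ** 2)
                                + (y_obs - y_num) ** 2 / sigma ** 2)))

# Toy usage: noise-free data generated from theta = 0.7
t_obs = np.linspace(0.0, 2.0, 21)
y_obs = np.exp(-0.7 * t_obs)
ll_true = ode_cond_loglik(0.7, t_obs, y_obs)
ll_off = ode_cond_loglik(2.0, t_obs, y_obs)
```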

Also I should be more careful using “closed form” to indicate “analytical form”, since in general “closed form” indicates something closer to “tractable”.

Pretty cool, this looks like another example of the pseudo-marginal approach. It predates the paper that introduced it, so that explains why that name doesn’t pop up a lot.

Coming back to Stan, off the top of my head there are only a few approaches that handle this sort of model with intractable likelihoods: 1.) sample the joint posterior with the currently available options, 2.) implement some sort of pseudo-marginal scheme, perhaps one that handles particle MCMC as a special case, and 3.) I’ve heard approximate Bayesian computation has been used for this sort of thing. Regarding 1.), where are we at with HMC being able to handle discrete targets?

I think @betanalpha was just saying the choice of approximate vs. full likelihood is independent of the choice of SMC.