Hello, I have a quick question. I need to fit 25 parameters within a set of PDEs. (Partial Differential Equation).
The way to fit is abstracted as follow:

Random sampling the 25 parameters
Plug into the PDEs then do numerical analysis(iteration)
Calculate a set of result which can be considered as probabilities of binomial distribution
Calculate the likelihood along with the collected data

I’m asking if it’s possible to use STAN and the HMC algo to sample the 25 parameters till the likelihood reaches the maximum (MLE) ?
Since the PDE solving(also log-posterior) includes iterative numerical methods, I don’t know if the STAN auto-differentiation algorithm will work or not.

Because the log-posterior includes a bunch of codes, I come to ask if it’s doable before I start to translate the original MATLAB code into the STAN’s functions.

I’m asking if it’s possible to use STAN and the HMC algo to sample the 25 parameters till the likelihood reaches the maximum (MLE)

So HMC is gonna generate samples from your posterior. If it’s an MLE you want, then you can just run an optimizer on that.

Since the PDE solving(also log-posterior) includes iterative numerical methods, I don’t know if the STAN auto-differentiation algorithm will work or not.

It’s possible it can, but the autodiff for something like this is probably a little slow. If you want to generate a couple thousand samples from your posterior and NUTS needs to take 32 HMC steps for each one (which is pretty efficient sampling), then you’re looking at 64000 model evaluations/gradient calculations. With a PDE or something you probably want to think about custom autodiff.

That said there are numerous pretty complicated functions in the Stan math library that are autodiffed. The eigenvalue solver is one.

Random sampling the 25 parameters

Hmm, Stan doesn’t really directly draw samples from it’s posterior distribution. It’s less of a guess-and-check process and more of an exploring process. Check out the Michael Betancourt or Radford Neal Intro HMC papers. There are some cool animations here: https://chi-feng.github.io/mcmc-demo/app.html#HamiltonianMC,standard

Plug into the PDEs then do numerical analysis(iteration)

Is this like an iterative sparse solver?

Calculate a set of result which can be considered as probabilities of binomial distribution

The results of the PDE are a distribution? Or get fed into a distribution as data? Can you write out what a hypothetical Stan model would look like? Do you have one that has maybe just a few parameters – 3-4 maybe instead of 25?

For inverse problem like this, for each HMC step the computing effort is in the inversion of the Hessian of the Tikhonov functional. If your parameter-to-observable mapping is linear and using a Gussian prior as well as likelihood, this is equivalent to the posterior cov matrix. I think stan can handle this. On the other hand, I’d be concerned about the amount of computing. Since you opt an iterative solver, I assume the linear system from PDE discretization is large. This could make it impractical to use MCMC. Another possible issue is the convergence tolerance of your Krylov solver, as if the solution is not accurate enough the number of effective samples can decrease.

So maybe you can profile it by hard coding a low dimensional matrix inversion in stan and examining its performance.

This is a problem where parallel could really help.

As Ben said, you need to use optimization, not sampling. Sampling finds typical parameter values, not ones that maximize the log density.

Yes, this has worked where we tried it before, like differentiating a 1D integrator or an ODE solver.

The only concern is that differentiating the iterative algorithm (which is an approximation to the value) may not give a good approximation to the derivative everywhere.

Sorry, I just wanna ask, I remember the non-formative prior(uniform) will result in the same result as MLE, is that correct? I’m not sure the difference between optimization and sampling.

Thanks, I will try as long as there’s some possibility that Stan may work.

Computation intensity is not my major concern now. I will try hundreds samples first to see the time cost. Adaptive MH usually takes 1.5h to generate 2500 samples on my laptop. LOL

Sorry for the incorrect word, I have read the paper about HMC, and the animation is amazing!!!

I just learned this term, I’ll post the details about the model I’m working on with respect to sparse systems iterative solver later.

Actually the PDE solver will give a vector, and by some linear transformation, the vector’s elements could be deemed as the probability of binomial distribution. The form is as below:
X~binomial(N,p)
Based on the collect data of X, N (both are vectors of same size), we calculate the likelihood along with the p(vector) from the PDE solver.
Thus, the goal is to sample the parameters to derive their mean, sd with respect to the maxim likelihood.
Also, it’s highly possible there might be some sloppy parameters.

I suspect iterative solver is going to generate a large expression graph, which poses a serious stress to memory. Would love to be proved wrong though.

You’ll probably be better off doing your gradients manually (instead of them being autodiffed) for something like this.

This paper: https://people.maths.ox.ac.uk/gilesm/files/NA-08-01.pdf shows how to derive forward and reverse mode autodiff eqs. for a bunch of linear algebra stuff. For sparse matrices it’d just be a matter of what algos you use to compute stuff.

This is the paper that describes how the Stan Math library works: https://arxiv.org/pdf/1509.07164.pdf (which’d explain what it means to implement custom gradients for an operation)

Also if you’re doing linear algebra in Stan and you want your stuff to be easily used, that more or less means you’re using Eigen. There’s ways to link in other libraries, but it gets hacky real quick.

We’ve autodiffed through iterative solvers before. The first version of the ODE solver was done this way with me, @bob-carpenter, and @wds15. If I recall correctly, the bottleneck was the speed, not the memory. (Yes, they go hand-in-hand, but what I mean is that the amount of memory taken wasn’t ever the issue.)

in principle there is no memory issue with autodiffing through the solver, BUT sometimes stan tries out hard parameter configurations which lead to excessive iterations. This triggers huge amounts of memory to be allocated as the AD is rigorously following all those terms which are added up. This was the reason for a number of crashes on our cluster as the memory got exhausted for those “bad” chains which explored very stiff parameter ranges. Back then we did not have a max_num_steps option which would probably solve this issue.