A big stochastic model in Stan

I’m working with a complex stochastic model and would like some guidelines on how to tackle it.

1. Model description

The model consists of a set of PDFs with unknown parameters. The PDFs are sampled in a controlled way, producing a collection of molecules. These molecules are then used to calculate some mixture properties that can be compared to data.

The goal is to sample the PDFs’ parameters using Stan.

2. Likelihood derivatives

Can HMC handle stochastic likelihoods?

3. External C++ code to calculate likelihood mean

I know there are some guidelines in the documentation, but I couldn’t get my head around them (I’m not really a programming expert).

Is there a step-by-step procedure to do this? Or is there anyone who could help me?

I’ve written the model in C++ and it is VERY BIG; it would be very difficult to translate to the Stan language.

I appreciate any help.

Do you have a gradient coded in C++ for this very big model?

Actually, no. I’m currently using a stochastic optimization algorithm (particle swarm optimization, PSO) for the parameter estimation (MLE).

Hm, well, for HMC you need gradients so one way or another if you want to use HMC you’re going to need to code the gradient of your model.

Well, my model would calculate the mean of a known distribution, the normal for example, and I would let Stan handle the gradients. My major concern was the numerical stability, since the mean itself might be random.

Sounds like you want to sample A using your big C++ model and then use Stan to generate B using P(B|A). That’s a procedure you could use and if P(A|B) and P(B|A) are two steps in a valid Gibbs sampler then you might get somewhere. You don’t describe the whole procedure so I can’t tell, but you can certainly use Stan as part of a Gibbs sampler (when you would want to I’m not sure but that’s ok).

Hm, here is some Stan code to illustrate:

functions {
  real f(real theta); // declared here, defined in external C++ (see below)
}
data {
  // here goes some data
  int<lower=0> N;
  vector[N] y;
}
parameters {
  // some parameters
  real theta;
  real<lower=0> sigma;
}
transformed parameters {
  real mu = f(theta); // external C++ function
}
model {
  y ~ normal(mu, sigma);
}

real mu = f(theta); // external C++ function

If f were written in the Stan language it would calculate both f and df/dtheta. If your C++ version of f does that, then you can proceed in Stan; if it doesn’t, you can’t use HMC regardless of the programming environment.
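To make that concrete, here is roughly what the C++ side could look like if you hand-code the derivative and hand it to Stan via stan::math::precomputed_gradients. This is only a sketch: the function name f, the single scalar argument, and the toy body are placeholders for your model, and the trailing std::ostream* argument follows the convention used for externally defined Stan functions.

// my_external_f.hpp  (hypothetical file name)
// Assumes Stan's math headers and std::vector/std::ostream are already in
// scope where this file gets included, as in the rstan external-C++ setup.
//
// On the Stan side, f is only declared in the functions block, e.g.
//   functions { real f(real theta); }
// and the model is compiled with this header supplied externally (the
// vignette linked further down shows the exact mechanics).

// double overload: used when only the value of f is needed
double f(const double& theta, std::ostream* pstream__) {
  return theta * theta; // stand-in for the big C++ model
}

// var overload: returns the value together with a hand-coded df/dtheta so
// Stan's reverse-mode autodiff can propagate the gradient through f
stan::math::var f(const stan::math::var& theta, std::ostream* pstream__) {
  double t = theta.val();
  double value = t * t;   // f(theta), computed with plain doubles
  double deriv = 2.0 * t; // df/dtheta, hand-coded
  std::vector<stan::math::var> operands{theta};
  std::vector<double> gradients{deriv};
  return stan::math::precomputed_gradients(value, operands, gradients);
}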

Hm, thanks. I thought I didn’t need to define df/dtheta. That’s going to be a problem…

Anyway, could you guide me to some examples or tutorials about linking external functions to Stan code?

@bgoodri wrote some doc for it but I can’t find it now. Maybe search Discourse or wait for him to reply.

Yeah, I think people miss the fact that the joy of the Stan language is that you write the posterior density and you get the posterior gradient w.r.t. the parameters for free. It removes a huge source of bugs in the use of HMC.

https://cran.r-project.org/web/packages/rstan/vignettes/external.html

Thanks guys!!!

My major concern was the numerical stability, since the mean itself might be random.

This sounds like you’re in intractable likelihood territory.

f here should be deterministic. If that’s true, you should be okay, but if this is estimated with some Monte Carlo or something, that’s dubious. Even if the expectation exists, I doubt it’d be easy to get enough precision to keep the HMC happy.

transformed parameters {
  real mu = f(theta); // external C++ function
}

But ignore this if your function is actually deterministic :D. Good luck!

@bbbales2, you’re actually right!!! f is calculated through a Monte Carlo process. For my MLE estimation I need to use stochastic optimization algorithms. So, no HMC (or Stan) for me.

I was wondering if a Gibbs sampler would do the trick… What do you guys think?

For Gibbs sampling you still have to know something about your conditional distributions as well.

I think as far as Bayesian inference goes for something like this, it’s ABC (approximate Bayesian computation). Check out http://elfi.readthedocs.io/en/latest/

https://arxiv.org/abs/1708.00707 is the paper attached to this. Probably has some other stuff in the references you might find interesting.
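If it helps to see the shape of the idea, here is a tiny rejection-ABC sketch in C++. This is only an illustration of the general recipe, not ELFI’s API; the prior, the simulator, the summary statistic, the tolerance, and all the numbers are made up.

#include <cmath>
#include <random>
#include <vector>

int main() {
  std::mt19937 rng(1234);
  std::normal_distribution<double> prior(0.0, 10.0); // made-up prior on theta
  std::normal_distribution<double> noise(0.0, 1.0);  // made-up simulator noise

  const double observed_summary = 3.2; // summary statistic of the real data
  const double tolerance = 0.1;        // ABC acceptance tolerance
  const int n_sim = 100;               // simulator draws per proposed theta

  std::vector<double> posterior_draws;
  while (posterior_draws.size() < 1000) {
    double theta = prior(rng); // 1. propose theta from the prior
    double sum = 0.0;
    for (int i = 0; i < n_sim; ++i)
      sum += theta + noise(rng);            // 2. run the stochastic simulator
    double simulated_summary = sum / n_sim; // 3. summarize the simulated data
    if (std::abs(simulated_summary - observed_summary) < tolerance)
      posterior_draws.push_back(theta);     // 4. keep theta if it reproduces the data
  }
  // posterior_draws now approximates the posterior on theta without
  // ever evaluating the likelihood
  return 0;
}

ELFI (and the methods in that paper) are much smarter than plain rejection, but the ingredients are the same: a prior you can sample from, a simulator you can run, and a distance between simulated and observed summaries.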

@bbbales2

Well, I can evaluate the likelihood. What may happen is that for a specific point in the parameter space I have multiple likelihood values due to its stochastic nature.

Clearly this is very bad for derivatives, but I thought it might work in a derivative-free algorithm (I’m basically doing that when using MLE and particle swarm optimization). What do you think?

Thanks for the tip on ABC; it sounds pretty interesting, I’ll definitely check it out.

What may happen is that for a specific point in the parameter space I have multiple likelihood values due to its stochastic nature.

Hmm, I don’t think this is the right way to think about this. Probability densities aren’t random.

If you need to know some probabilities to evaluate other probabilities, that’s the doubly intractable likelihood stuff (mlg.eng.cam.ac.uk/zoubin/papers/doubly_intractable.pdf) (which is something ABC is suited for).


How do I sticky this?

While we don’t have a way for you to do that built into Stan (there’s no way to do randomization during the likelihood calculation in Stan), you can do it in an outside function. We use techniques like this for ADVI and in the max marginal mode optimization we’re doing, too. But you’d have to build it on the outside. If you use a mean of a Monte Carlo process, then you can calculate the gradients piecewise (see the sketch below).

It is a lot of work, though!
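One way to read “calculate the gradients piecewise” (schematically, with f standing in for whatever the simulator computes on each draw): if you hold the Monte Carlo draws eps_1, …, eps_N fixed, then

d/dtheta [ (1/N) * sum_n f(theta, eps_n) ] = (1/N) * sum_n df(theta, eps_n)/dtheta

so the gradient of the Monte Carlo mean is just the mean of the per-draw gradients.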

No, but they can require randomization to evaluate. For example, they may be stated in terms of an integral that requires some randomization in the form of (MC)MC methods to evaluate in practice. This is what happens when we marginalize.

p(a) = ∫ p(a, b) db

To evaluate p(a), we have to compute an integral. To evaluate the integral, we have to use Monte Carlo methods. That makes our evaluation of p(a) itself stochastic, even though the density itself isn’t random.
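Concretely, a Monte Carlo evaluation of that integral looks something like

p(a) ≈ (1/N) * sum_n p(a | b_n),   with b_n drawn from p(b),

which gives a different number every time you redraw the b_n, even though p(a) itself is a fixed quantity.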


Right, but in that case the joint density you’re specifying still isn’t random; it’s just the implementation of the marginalization that’s random due to the use of a stochastic algorithm like MCMC. I think @bbbales2’s point is the one we should be reinforcing.