I’m working with a complex stochastic model and would like some guidelines on how to tackle it.
1. Model description
The model consists of a set of pdfs with unknown parameters. The pdfs are sampled in a controlled way, producing a collection of molecules. These molecules are then used to calculate some mixture properties that can be compared to data.
The goal is to sample the pdfs’ parameters using Stan.
2. Likelihood derivatives
Can HMC handle stochastic likelihoods?
3. External C++ code to calculate likelihood mean
I know there are some guidelines in the documentation, but I couldn’t get my head around them. (I’m not really a programming expert.)
Is there a step-by-step procedure for doing this? Or is there anyone who could help me?
I’ve written the model in C++ and it is VERY BIG; it would be very difficult to translate into the Stan language.
Well, my model would calculate the mean of a known distribution, the normal for example, and I would let Stan handle the gradients. My major concern was the numerical stability, since the mean itself might be random.
Sounds like you want to sample A using your big C++ model and then use Stan to generate B using P(B|A). That’s a procedure you could use, and if P(A|B) and P(B|A) are two steps in a valid Gibbs sampler then you might get somewhere. You don’t describe the whole procedure so I can’t tell, but you can certainly use Stan as part of a Gibbs sampler (when you would want to, I’m not sure, but that’s OK).
data {
  // here goes some data
}
parameters {
  // some parameters
}
transformed parameters {
  real mu = f(theta);  // external C++ function of the parameters
}
model {
  y ~ normal(mu, sigma);
}
If f were written in the Stan language it would calculate both f and df/d(parameters). If your C++ version of f does that, then you can proceed in Stan; if it doesn’t, you can’t use HMC regardless of the programming environment.
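To make that concrete, here is a rough sketch of what the C++ side could look like, loosely following the “using external C++ code” docs. The function name f, the placeholder computations, and the exact signature are hypothetical (the overloads Stan expects depend on the interface and version), but the key point is that the reverse-mode overload has to supply the analytic df/d(theta) itself, e.g. with stan::math::precomputed_gradients:

// my_f.hpp -- hypothetical header supplied alongside the Stan model, matching a
// declaration like  real f(vector theta);  in the functions block and compiled
// with the allow-undefined option.
#include <stan/math.hpp>
#include <ostream>
#include <vector>

// double overload: used when only the value of f is needed
double f(const Eigen::VectorXd& theta, std::ostream* pstream__) {
  // ... run the big deterministic C++ model here ...
  return theta.sum();  // placeholder
}

// reverse-mode overload: must also supply the analytic df/d(theta)
stan::math::var f(const Eigen::Matrix<stan::math::var, Eigen::Dynamic, 1>& theta,
                  std::ostream* pstream__) {
  Eigen::VectorXd theta_val = stan::math::value_of(theta);
  double val = theta_val.sum();                   // value of f (placeholder)
  std::vector<double> grad(theta.size(), 1.0);    // analytic df/d(theta_i) (placeholder)
  std::vector<stan::math::var> operands(theta.data(), theta.data() + theta.size());
  return stan::math::precomputed_gradients(val, operands, grad);
}

How you point the compiler at the header differs between interfaces (I believe rstan has allow_undefined/includes arguments and CmdStan uses a USER_HEADER plus --allow-undefined), so check the external C++ documentation for your interface.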
@bgoodri wrote some doc for it but I can’t find it now. Maybe search Discourse or wait for him to reply.
Yeah, I think people miss the fact that the joy of the Stan language is that you write the posterior function and you get the posterior gradient w.r.t. the parameters for free. It removes a huge source of bugs in the use of HMC.
My major concern was the numerical stability, since the mean itself might be random.
This sounds like you’re in intractable likelihood territory.
f here should be deterministic. If that’s true, you should be okay, but if this is estimated with some Monte Carlo or something, that’s dubious. Even if the expectation exists, I doubt it’d be easy to get enough precision to keep the HMC happy.
transformed parameters {
  real mu = f(theta);  // external C++ function of the parameters
}
But ignore this if your function is actually deterministic :D. Good luck!
@bbbales2, you’re actually right! f is calculated through a Monte Carlo process. For my MLE estimation I need to use stochastic optimization algorithms. So, no HMC (or Stan) for me.
I was wondering if a Gibbs sampler would do the trick… what do you guys think?
Well, I can evaluate the likelihood. What may happen is that for a specific point in the parameter space I get multiple likelihood values due to its stochastic nature.
Clearly this is very bad for derivatives, but I thought it might work in a derivative-free algorithm. (I’m basically doing that when using MLE with particle swarm optimization.) What do you think?
Thanks for the tip on ABC; it sounds pretty interesting, I’ll definitely check it out.
While we don’t have a way for you to do that built into Stan (there’s no way to do randomization during the likelihood calculation in Stan), you can do it in an outside function. We use techniques like this for ADVI and in the max marginal mode optimization we’re doing, too. But you’d have to build it on the outside. If you use a mean of a Monte Carlo process, then you can calculate the gradients piecewise.
It is a lot of work, though!
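For example (a toy sketch; the function g below is made up): if the mean is a Monte Carlo average of g(theta, u_i) with the draws u_i held fixed, its gradient is just the average of the per-draw gradients dg/d(theta):

// Toy illustration: the gradient of a Monte Carlo average is the average of the
// per-draw gradients, provided the draws u_i are held fixed.
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

// made-up per-draw term g(theta, u) and its derivative in theta
double g(double theta, double u)         { return std::exp(theta * u); }
double dg_dtheta(double theta, double u) { return u * std::exp(theta * u); }

int main() {
  std::mt19937 rng(1234);                        // fixed seed: the same u_i every call
  std::normal_distribution<double> unit(0.0, 1.0);
  std::vector<double> u(1000);
  for (double& ui : u) ui = unit(rng);

  double theta = 0.3, mu = 0.0, dmu = 0.0;
  for (double ui : u) {                          // "piecewise": value and gradient per draw
    mu  += g(theta, ui) / u.size();
    dmu += dg_dtheta(theta, ui) / u.size();
  }
  std::printf("mu(theta) ~ %.4f   dmu/dtheta ~ %.4f\n", mu, dmu);
  return 0;
}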
No, but they can require randomization to evaluate. For example, they may be stated in terms of an integral that requires some randomization in the form of (MC)MC methods to evaluate in practice. This is what happens when we marginalize.
p(a) = ∫_B p(a, b) db
To evaluate p(a), we have to compute an integral. To evaluate the integral, we have to use Monte Carlo methods. That makes our evaluation of p(a) itself stochastic, even though the density itself isn’t random.
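A minimal sketch of that situation, with made-up densities (b ~ normal(0, 1), a | b ~ normal(b, 1)): the marginal p(a) is estimated by averaging p(a | b_i) over draws b_i ~ p(b), so evaluating it twice at the same a gives two slightly different numbers:

// Monte Carlo evaluation of a marginal: p(a) = INT p(a | b) p(b) db is estimated
// by (1/N) * sum_i p(a | b_i) with b_i ~ p(b). Densities are for illustration only.
#include <cmath>
#include <cstdio>
#include <random>

double normal_pdf(double x, double m, double s) {
  const double two_pi = 6.283185307179586;
  double z = (x - m) / s;
  return std::exp(-0.5 * z * z) / (s * std::sqrt(two_pi));
}

double p_a_hat(double a, int n, std::mt19937& rng) {
  std::normal_distribution<double> p_b(0.0, 1.0);                    // p(b)
  double sum = 0.0;
  for (int i = 0; i < n; ++i) sum += normal_pdf(a, p_b(rng), 1.0);   // p(a | b_i)
  return sum / n;
}

int main() {
  std::mt19937 rng(std::random_device{}());
  // Two evaluations at the same a give slightly different numbers: the estimate
  // of p(a) is random even though p(a) itself is a fixed function of a.
  std::printf("p_hat(1.0) = %.5f\n", p_a_hat(1.0, 1000, rng));
  std::printf("p_hat(1.0) = %.5f\n", p_a_hat(1.0, 1000, rng));
  return 0;
}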
Right, but in that case the joint density that you’re specifying still isn’t random; it’s just the implementation of the marginalization that’s random, due to the use of a stochastic algorithm like MCMC. I think that @bbbales2’s point is the one that we should be reinforcing.