PDA in Stan?

I was wondering if anyone has been able to do probability density approximation (PDA) work in Stan, i.e. models where the likelihood function has no known closed form? I have some problems which require numerical integration in the likelihood, so I don’t strictly need PDA, but I think there are some extensions of the model that would need it.

A quick summary of what I think I mean by PDA: you simulate from the generative model at the current proposal and use a density-fitting algorithm to estimate the densities needed to compute likelihoods.

Has this been done with Stan, or is it theoretically possible to implement yourself?

Not currently possible, and the kernel needs to be deterministic to work with NUTS. @randommm has done some work on numerical integration, and we have talked about implementing things like empirically tilted likelihoods. But none of that is actually in Stan.

If you just need to integrate some function, then you may want to have a look at this post:

There I am abusing the ODE integrator to do numerical integration. Not the fastest, not the most elegant solution, but it works with Stan as it is now.
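The gist of the trick, as a minimal sketch (the integrand here is made up purely for illustration): the integral of f from a to b equals y(b), where y solves the ODE dy/dx = f(x) with y(a) = 0, so you wrap the integrand as an ODE right-hand side and read off the final state:

functions {
  // ODE right-hand side: the single state y is the running integral of f
  real[] rhs(real x, real[] y, real[] theta, real[] x_r, int[] x_i) {
    real dydx[1];
    dydx[1] = exp(-theta[1] * x);  // hypothetical integrand, illustration only
    return dydx;
  }
}

Then integrate_ode_rk45(rhs, {0.0}, a, {b}, theta, x_r, x_i)[1, 1] gives the integral over [a, b] (with a and b data, and x_r, x_i empty data arrays).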

If you can generate those draws and fit the density outside of Stan (or even using Stan) in a first pass, then you can just plug the result into a second pass. Otherwise, Stan isn’t really set up to use approximate densities internally, nor are we sure how they’d interact with HMC.

People have been wanting to do this to simulate the cut function in BUGS—for example, fitting a pharmacometric (PK) model then pushing the posterior to the pharmacodynamic (PD) model, without allowing full Bayesian info flow in the reverse direction.
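A minimal sketch of that two-pass idea (names hypothetical, and assuming the pass-one posterior for the shared parameter is summarized as a normal): feed the first model’s posterior summary into the second model as a fixed prior, so no information flows back:

data {
  real mu_hat;            // pass-one posterior mean of the shared parameter
  real<lower=0> sd_hat;   // pass-one posterior sd
  int<lower=1> N;
  vector[N] y;            // data for the second-pass (e.g., PD) model
}
parameters {
  real theta;             // shared parameter, informed only by pass one
  real<lower=0> sigma;
}
model {
  theta ~ normal(mu_hat, sd_hat);  // plug-in approximation of the BUGS cut
  sigma ~ cauchy(0, 2.5);
  y ~ normal(theta, sigma);
}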

We should also be adding a numerical integrator to Stan soon.

I recently received a private email regarding numerical integration [so it wasn’t just @Bob_Carpenter; my monthly private email count has increased ∞%: from 0 to 1, but with such high overdispersion I’m unsure there’s a model to decide whether it was caused by Discourse]. Since it’s a recurrent topic, I think it’s worth sharing a summary of the points (I was going to forward the email some days ago, but Discourse messed up the inline threads and I gave up), even if it might not be useful to the OP:

If you want to use the implementation I wrote (and you know how to compile a custom version of Stan for the interface you use), it’s in the branch https://github.com/stan-dev/stan/tree/feature/integrate-function-parser-doc for stan-dev/stan and the branch https://github.com/stan-dev/math/tree/feature/issue-210-numerical-integration for stan-dev/math. However, there are two problems with this implementation. First, the parser doesn’t allow the function at all, and I don’t know why, since I know very little about the parser. The code generator works, though, so all you need to do is force the parser to let invalid code proceed to the compiler. The code in the branch already does that for you, but be careful, because other invalid code that you mistakenly write will also pass. Second, it lacks relative tolerance (only absolute tolerance).

On the other hand, one potential advantage of this implementation is letting the user set the gradient function manually; I believe this might make a difference in some cases. I don’t know about the difference between its algorithm and the ODE solver’s, though (would love opinions from experts here).

However, I recently discovered, thanks to @wds15, that it’s possible to do one-dimensional integration using Stan’s ODE solver, and I have since used it in a Python package I’m working on. Here’s the relevant part:

functions {
  // integrand: the exponential of a sine/cosine series evaluated at x
  real lkk(real x, real[] beta) {
    real result_;
    result_ = 0;
    for (i in 1:num_elements(beta)) {
      if (i % 2 == 0)
        result_ = result_ + beta[i] * cos(i * pi() * x);
      else
        result_ = result_ + beta[i] * sin((i + 1) * pi() * x);
    }
    return exp(result_ * sqrt2());
  }
  // ODE right-hand side: the single state accumulates the integral of lkk,
  // so the signature and return type must match what the ODE solver expects
  real[] integ(real x, real[] f, real[] beta, real[] x_r, int[] x_i) {
    real dfdx[1];
    dfdx[1] = lkk(x, beta);
    return dfdx;
  }
}
data {
  int<lower=1> nmaxcomp;
}
transformed data {
  real x_r[0];  // empty data arrays required by the ODE solver signature
  int x_i[0];
}
parameters {
  vector[nmaxcomp] beta1;
}
transformed parameters {
  real lognormconst1;
  // the ODE state at x = 1 is the integral of lkk over [0, 1], so its log
  // is the log normalizing constant
  lognormconst1 =
    log(integrate_ode_rk45(integ, {0.0}, 0, {1.0},
                           to_array_1d(beta1),
                           x_r, x_i, 1.49e-08, 1.49e-08, 10000000)[1, 1]);
}

If you want to use the implementation I wrote, then here’s an example:

functions {
  // toy integrand with a quadratic and a linear parameter term
  real ftft(real x, vector param) {
    return exp(x) + param[1]^2 + 2 * param[2];
  }
  // gradient of ftft with respect to param[indic]
  real gradff(real x, vector param, int indic) {
    if (indic == 2)
      return 2;
    else
      return 2 * param[1];
  }
}
parameters {
  vector[2] p;
}
model {
  real v1;
  real v2;

  p ~ normal(0, 1);

  // assign the results so the calls are valid statements; here the two
  // variants are just exercised for demonstration
  v1 = integrate_1d(ftft, 0, 1, p);
  v2 = integrate_1d_grad(ftft, gradff, 0, 1, p);
}

You should be able to compile and use it with the master (not the develop!) branch of RStan and PyStan.

integrate_1d is very slow and should be used only for preliminary testing; after that you should stick with integrate_1d_grad (you can use Stan’s test_grad to check that you set the gradient function correctly).


I am curious… have you compared performance of the ODE integrator to your custom function?

BTW, another strategy to evaluate integrals is to fix the number of support points a priori. Then you can also do the integration directly in Stan. The downside is that you won’t be able to control the accuracy, so you must ensure that your integration technique gives reliable results over the parameter space of interest. This is dangerous, of course, as the sampler has to explore parameter values outside the domain you expect or that makes sense, and it’s not clear what happens then. Still, your problem may be well enough behaved for this to work out. If I recall right, I have already posted the tanh integration scheme suggested by Ben, along with a brute-force method in Stan (using only Stan functions).
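For concreteness, here is a minimal sketch of the fixed-support-point idea, using a plain trapezoidal rule rather than the tanh scheme, with a made-up integrand:

functions {
  // made-up integrand for illustration; substitute your own
  real f(real x, real theta) {
    return exp(-theta * square(x));
  }
}
data {
  int<lower=2> N;  // number of fixed support points
}
transformed data {
  real h;
  real xs[N];
  h = 1.0 / (N - 1);  // integrate over [0, 1]
  for (n in 1:N)
    xs[n] = (n - 1) * h;
}
parameters {
  real theta;
}
transformed parameters {
  real integral;
  integral = 0;
  // trapezoidal rule on a fixed grid: accuracy is not adaptive, so it must
  // be checked over the whole region the sampler can reach
  for (n in 1:(N - 1))
    integral = integral + 0.5 * h * (f(xs[n], theta) + f(xs[n + 1], theta));
}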

Sebastian

I haven’t made any formal benchmarks, but they seem to attain similar results for my specific problem. However, I’m unsure how this would play out with more complex gradients or when relative tolerance is needed (my algorithm lacks it in its current form).

Also, will the relative/absolute tolerances one sets on the ODE solver be the actual tolerances for the numerical integration problem? (Maybe it’s a stupid question, but better to be sure in order to have a fair comparison.)

I believe @aaronjg might have worked with the “fixing the number of support points a priori” strategy, and that it at first played out better than the ODE solver [but for yet another problem].

Yes - I was looking into this for the Ho approach to the Schwartz-Yeh approximation of a sum of log-normal variables. I found that the ODE solver had similar performance but very low accuracy in some regions of the sampling space, whereas quadrature was much more consistent. I plan to revisit this at some point and try @randommm’s function, but quadrature is working well enough for now.

Stan’s ODE solver had low accuracy? You should be able to control that through arguments. Is it not returning accurate results within specified tolerances? If you have a reproducible example for that, we’d like to see it in order to diagnose possible bugs in tolerance controls.

Let me clarify - the ODE solver worked fine in terms of returning results within the specified tolerances. However, the Schwartz-Yeh approximation requires recursive computation of integrals, and in doing this the small errors from the ODE solver compounded into some very large errors, whereas doing the integration by quadrature did not lead to this compounding issue.

I’m not sure what the root cause of the issue was. Hopefully I can dig into it and provide a minimal example at some point.