High dimensional interpolation of a high dimensional integral

I’m trying to figure out a clever way to deal with a nasty integral inside my posterior.

I’m building a population model which has a Poisson-like term in the posterior to account for the estimated number of objects that should appear in a survey.

i.e.

Pr(\psi, \phi_{1:N}, z_{1:N} \vert N, d_{1:N}, \text{obs}, I) \propto Pr(\psi \vert I)\, e^{-\bar{N}(\psi, \text{obs})} \prod_{i=1}^{N} n(z_i, \psi)\, Pr(\phi_i \vert z_i, \psi)\, C^{\text{DV}}(z_i)\, Pr(d_i \vert \phi_i, z_i)

Here, \psi are hyperparameters, \phi are object parameters, and

\bar{N}(\psi, \text{obs}) = \int d\Omega_{\phi} \int dz\; n(z, \psi)\, Pr(\phi \vert z, \psi)\, C^{\text{DV}}(z)\, Pr(S \vert \phi, z, \text{obs}).

The details aren’t so important, but I could not figure out a nice way to do this integral (~6 dimensions) numerically inside Stan. I thought that perhaps I could precompute the integral outside Stan and then do a ~6D interpolation inside Stan to estimate \bar{N} for each draw of the hyperparameters, but interpolation seems to be tricky in Stan. Additionally, I can imagine all kinds of problems with going outside the interpolation boundaries, because I define my \psi as being normally distributed.

Unfortunately, these integrals are far from analytic, thanks to physics and to taking care of selection effects (Pr(S \vert \phi, z, \text{obs})).

Anyone run across some clever ways to handle this type of problem before?

Thanks!

Hi @grburgess

When I do numerical integration inside Stan, I normally start by treating some random draws (uniform or standard normal) as data. Then, inside the Stan program, I transform these into the variables that need to be integrated over (for instance, you can go from standard normal to multivariate normal via the Matt trick, or from uniform(0, 1) to whatever distribution via the inverse transform). Then just integrate as you would.
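
For concreteness, here is a minimal sketch of that idea (a toy model with hypothetical names, not your integrand): uniform draws are generated outside Stan and passed in as data, the inverse transform maps them to draws from the distribution of interest given the current parameter value, and the integral becomes a plain average that is a deterministic, smooth function of the parameters.

data {
  int<lower=1> M;                   // number of Monte Carlo points drawn outside Stan
  vector<lower=0, upper=1>[M] u;    // uniform(0, 1) draws, passed in as data
  int<lower=0> y;                   // an observed count, just to close the toy model
}
parameters {
  real<lower=0> lambda;             // rate of the exponential we integrate against
}
transformed parameters {
  // Monte Carlo estimate of E[x^2] with x ~ exponential(lambda):
  // the inverse transform maps u to x = -log(1 - u) / lambda, then we average.
  real integral = 0;
  for (m in 1:M) {
    integral += square(-log1m(u[m]) / lambda);
  }
  integral = integral / M;
}
model {
  lambda ~ lognormal(0, 1);
  y ~ poisson(integral);            // the estimated expectation feeds a Poisson rate
}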

Stan does not have a built-in Poisson inverse CDF, but you could just define one in the functions block. A warning: the Poisson CDF is not one-to-one, and I’d guess you’ll get discontinuities in the likelihood with respect to the Poisson parameters, divergent transitions, etc.
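
A minimal sketch of what such a function could look like in the functions block (my own illustration, not an existing Stan function): it walks the CDF upward until it passes u. The integer output jumps as the rate varies, which is exactly where the discontinuities mentioned above come from.

functions {
  // Hypothetical Poisson inverse CDF: smallest k with Pr(K <= k) >= u.
  int poisson_icdf(real u, real lambda) {
    int k = 0;
    real pmf = exp(-lambda);
    real cdf = pmf;
    while (cdf < u) {
      k += 1;
      pmf *= lambda / k;   // recurrence: Pr(k) = Pr(k - 1) * lambda / k
      cdf += pmf;
    }
    return k;
  }
}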

Make sense?

Maybe we are discussing different things. I have to do numerical integration for each draw in the chain.

But I may not get your idea.

High-dimensional interpolation doesn’t work for the same reason that high-dimensional numerical integration doesn’t work, so your strategy is unfortunately doomed to failure. Fortunately, MCMC is pretty good at doing these integrals for us!

In problems like this you want to make the integration variables latent (nuisance/systematic) parameters with the appropriate prior distributions. Selection can then be handled as a part of the generative model.

See, for example, Maggie’s blog post https://maggielieu.com/2018/01/25/selection-function-in-stan/ and some of Will’s examples https://github.com/farr/GWDataAnalysisSummerSchool/blob/master/lecture3/lecture3.ipynb and https://github.com/farr/SelectionExample.
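
To make that concrete, here is a minimal sketch in the same spirit (the dN/dz form, the bounds, and the priors are purely illustrative, not taken from those examples): each detected object's true redshift becomes a latent parameter whose prior is the population density, with the noisy measurement modelled on top of it. The exp(-Nex) selection normalisation is left out here and would be handled as in the linked examples.

functions {
  // Purely illustrative population density dN/dz.
  real dNdz(real z, real N0, real alpha, real beta) {
    return N0 * pow(z, alpha) * exp(-beta * z);
  }
}
data {
  int<lower=0> nobs;
  vector<lower=0>[nobs] zobs;        // noisy measured redshifts
  vector<lower=0>[nobs] sigmaobs;    // per-object measurement scatter
}
parameters {
  real<lower=0> N0;
  real<lower=0> alpha;
  real<lower=0> beta;
  vector<lower=0, upper=10>[nobs] ztrue;   // the latent "integration variables"
}
model {
  for (i in 1:nobs) {
    target += log(dNdz(ztrue[i], N0, alpha, beta));    // population prior on ztrue
    zobs[i] ~ lognormal(log(ztrue[i]), sigmaobs[i]);   // measurement model
  }
  N0 ~ lognormal(log(10.0), 1.0);
  alpha ~ lognormal(log(2.0), 1.0);
  beta ~ lognormal(log(2.0), 1.0);
}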


Will’s example seems to be the direction I want to go. The selection is deterministic, so it drops out of the posterior, except that in my case the “Nex” integral has many integration variables, and my knowledge of recasting the integral as an ODE is limited.

The other approach is nice but won’t work, because I need to know the number of unseen objects, which is (or appears to be) neglected in that case.

I had originally tried to do Gaussian quadrature, but it was incredibly inaccurate with this many variables.

The ODE approach will work only in 1 dimension, as will the new numerical integrator that should be out in the next release.

If you can’t integrate the tail of the selection function, then all you can do is approximate it with a latent, finite population of possible observations and count the number of those latent individuals that fall above or below the observation threshold. For large enough populations this approach is expensive but reasonably accurate.

That’s what I thought… Thanks, physics…

So, if I understand you, I can sample objects from the population distributions (just like I do for my forward simulations at this point), sampling up to the total number of objects that would be seen if there were no selection. Then I look at the number that passes the selection, and this will be close to the number I would expect. I can see that being really expensive.

Will Stan stay stable with this type of randomness being added to the log posterior?

It’s really more of the experiment’s fault, but there’s nothing productive about blaming the poor experiments.

Yes. The problem is that you have a model that depends on the expectation of some other model, and if you can’t compute that expectation then you can’t specify the model. So instead we have to define a bigger model, but then that introduces these nasty discrete elements. That said, the strategy of modeling a latent population has worked well in the past in all kinds of applications.

Incidentally, this is a special case of what statisticians call “intractable likelihood problems”, which are at the frontier of what we can solve. So be happy that you even have some expensive strategies to try!

In most cases, yes. HMC is great because of its robustness to adding lots of new parameters.

I won’t be too hard on the experiment… I helped design it. Just hard on myself!

I’ve tried to implement this idea starting from Will’s example. I pretend I cannot integrate the population distribution convolved with the selection.

First I just integrate the population over z to get the total number of objects that should exist (N_{\text{possible}} = \int dz\; dN/dz (z)).
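
One way this 1D piece could be evaluated inside Stan (just a rough sketch; the grid size nz and upper limit zmax would be extra data, and these names are hypothetical) is a simple trapezoidal sum over a fixed redshift grid, assuming dNdz(...) is defined in the functions block:

transformed parameters {
  // Trapezoidal estimate of N_possible = integral of dN/dz from 0 to zmax.
  real Npossible = 0;
  {
    real dz = zmax / (nz - 1);
    for (k in 1:(nz - 1)) {
      real zlo = (k - 1) * dz;
      Npossible += 0.5 * dz * (dNdz(zlo, N0, alpha, beta)
                               + dNdz(zlo + dz, N0, alpha, beta));
    }
  }
}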

Then, the total number of objects should be a Poisson draw: N_{\text{tot}} \sim \text{Poisson}(N_{\text{possible}}).
(I think I’m wrong about this part though.)

Then I used rejection sampling, with a function implemented in the functions block, to draw a latent population of redshifts:

real drawz_rng(real N0, real alpha, real beta, real ymax) {

   real y;
   real zsample;
   int accept = 0;

   while (accept == 0) {

     // propose a redshift and a height under the envelope
     y = uniform_rng(0, ymax);
     zsample = uniform_rng(0., 10.);

     if (y < dNdz(zsample, N0, alpha, beta)) {
       // accept the draw
       accept = 1;
     }
   }

   return zsample;
 }

in the model block:

  for (i in 1:Ntot) {

    zsample = drawz_rng(N0, alpha, beta, ymax);

    // add measurement error to the sample to get zsample_obs

Then I count the number of zsamples that pass the threshold and normalize the posterior in the same way as the original example:



    if (zsample_obs < zth) {
      // update the selected expectation
      Nex += 1;
    }
  }

  for (i in 1:nobs) {

    zobs[i] ~ lognormal(log(ztrue[i]), sigmaobs[i]);

    target += log(dNdz(ztrue[i], N0, alpha, beta));
  }
  /* And we normalise by exp(-<N>). */
  target += -Nex;

  /* Broad priors, consistent with true values.  */
  N0 ~ lognormal(log(10.0), 1.0);
  alpha ~ lognormal(log(2.0), 1.0);
  beta ~ lognormal(log(2.0), 1.0);

But these random number generations for obtaining Nex would seem to break the differentiability of the posterior… no?

I’m still confused about where to define certain quantities, but I’m just trying to get a simple example for now.

No, there shouldn’t be any random number generation in the code. You’re not trying to sample a selection but rather approximate the density of the selection. In order to maintain a differentiable density approximation you need to elevate any random numbers to parameters (and let Stan handle their variation) or sample them externally and pass them in as data.
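
A minimal sketch of the second option (the names zmc, M, zth, and the noise scale sigmasel are my own, hypothetical): the redshift draws are made once outside Stan, uniformly on (0, zmax), and passed in as data. Inside the program, the expected detected count becomes a smooth, deterministic Monte Carlo average over those fixed points, with the hard threshold replaced by the probability that a lognormal measurement falls below zth.

transformed parameters {
  // Nex ~ (zmax / M) * sum_m dNdz(zmc[m]) * Pr(zobs < zth | ztrue = zmc[m]),
  // where zmc are M uniform(0, zmax) draws passed in as data, so Nex is a
  // smooth function of N0, alpha, beta and no rng call appears in the model.
  real Nex = 0;
  for (m in 1:M) {
    Nex += dNdz(zmc[m], N0, alpha, beta)
           * Phi((log(zth) - log(zmc[m])) / sigmasel);
  }
  Nex = Nex * zmax / M;   // importance weight 1/q(z) = zmax for a uniform proposal
}

Since zmc never changes between iterations, target += -Nex then stays differentiable in the hyperparameters.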

This is where I am getting confused. The latent population is a collection of object parameters, in this case the redshifts of these objects. But the number of those objects is a function of the hyperparameters, so the number is changing on each draw.

So the latent parameters are an “array” of redshifts sampled from the “true” model. The size of this array is changing though.

I’m probably missing something fundamental, and maybe it is better to take this offline. But some kind of zero-order example would be useful. I think working from Will’s example would be nice.