Bi-annual question on discrete parameters

So I recognize that discrete parameters have been asked for many times, and I’m fully convinced that most models would be computed much more efficiently by marginalization/Rao-Blackwellization, and that many models that can’t be marginalized are intractable anyway. However, I’m seeing a number of other Bayesian inference packages offer hybrid methods for this problem. For example, Turing.jl offers “compositional inference,” which mixes Gibbs samplers with NUTS to sample both discrete and continuous variables, and PyMC3 lets you do something similar.
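For concreteness, here’s the sort of marginalization I have in mind: a minimal sketch (the model and priors are just placeholders) of a two-component normal mixture in which the per-observation discrete component indicator is summed out with `log_mix`, so HMC only ever sees continuous parameters.

```stan
// Minimal sketch: the discrete component indicator for each observation is
// marginalized out of a two-component normal mixture, so nothing discrete
// is ever sampled.
data {
  int<lower=1> N;
  vector[N] y;
}
parameters {
  real<lower=0, upper=1> theta;   // mixing proportion
  ordered[2] mu;                  // component means (ordered for identifiability)
  vector<lower=0>[2] sigma;       // component scales
}
model {
  mu ~ normal(0, 5);              // placeholder priors
  sigma ~ normal(0, 2);
  for (n in 1:N)
    target += log_mix(theta,
                      normal_lpdf(y[n] | mu[1], sigma[1]),
                      normal_lpdf(y[n] | mu[2], sigma[2]));
}
```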

Given the quality of governance I’ve seen on this project, I’m assuming there’s a reason it’s not in Stan. As someone with a passable understanding of HMC and essentially no understanding of how Gibbs works, is there a reason I shouldn’t trust these methods?

8 Likes

Good question. I was going to try to respond, but you’d be better served with a response from someone who has thought more deeply about this particular issue than I have. Tagging @betanalpha and @Bob_Carpenter in case they have time to respond.

1 Like

I’m at a three-letter agency about to make a decision where a million lives hang in the balance, and the decision hinges on an analysis that blindly uses Turing.jl’s compositional inference to sample missing counts in a Poisson regression! Quick, I swear to God I’m gonna push the inference button!

Okay, none of that is true, but bump? Am I allowed to do that here?

3 Likes

There are two problems with adding discrete parameters:

  1. If you add them, there is a 100% chance that people will start using them even when they really should be marginalizing those parameters out.
  2. Adding features is hard! It takes dev time and effort that we could put into other features, or that comes out of our own research time (since most of us are academics or students). PyMC3 and Turing have an advantage here in that they’re written in much simpler programming languages (Python and Julia rather than C++). Turing has a further advantage from being written in Julia: it’s much more modular, thanks to Julia’s multiple dispatch. Adding an extra sampler to Turing is as easy as making a new package and loading it alongside Turing. As long as the package implements a couple of methods defined by the AbstractMCMC API, the Turing devs don’t have to do anything to support it. We can just leave the package on its own to mature before we accept or reject it (pun fully intended). We’re doing something like that with Annealed Importance Sampling right now: the package started off as its own thing, but now that we’ve seen it and like it, we have plans to integrate it into Turing directly.
4 Likes

We are actually starting to build a Stan-compatible sampler that includes discrete variables and conditionally continuous variables, such that we can, for example, run NUTS-within-Gibbs and NUTS-within-RJMCMC with Stan. We aren’t quite there yet, but I would be interested in hearing about models involving discrete variables that people would like to sample from.

2 Likes

@s.maskell I think models with multiple changepoints would be something that people would find interesting/useful.

1 Like

@dmuck: are you imagining that the number of changepoints would be discrete but the other parameters of the model would be continuous? If so, that fits within the set of things we are thinking about.

@s.maskell yes, that’s right

@s.maskell @dmuck Note that even if the changepoint locations are continuous and the number of changepoints is known, the likelihood is discontinuous in those locations (it jumps whenever a changepoint moves past one of the data points), so it can be difficult for gradient-based samplers like HMC to handle. Presumably in part for this reason, the Stan User’s Guide (SUG) introduces changepoint models in a setting where the changepoint locations themselves are discrete and then marginalizes over them.
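For reference, here is a minimal sketch along the lines of the SUG’s single-changepoint Poisson model (names and priors are illustrative): the discrete changepoint location s is never sampled; instead lp[s] accumulates the log joint for every candidate location and log_sum_exp marginalizes over it, leaving only the continuous rates for NUTS.

```stan
// Minimal sketch of a marginalized single-changepoint Poisson model,
// roughly following the SUG treatment.
data {
  int<lower=1> T;                // number of time periods
  array[T] int<lower=0> D;       // event counts per period
  real<lower=0> r_e;             // prior rates for the early/late regimes
  real<lower=0> r_l;
}
parameters {
  real<lower=0> e;               // early event rate
  real<lower=0> l;               // late event rate
}
transformed parameters {
  // lp[s] = log p(s) + log p(D | s, e, l) for each candidate changepoint s
  vector[T] lp = rep_vector(-log(T), T);   // uniform prior over s
  for (s in 1:T)
    for (t in 1:T)
      lp[s] += poisson_lpmf(D[t] | t < s ? e : l);
}
model {
  e ~ exponential(r_e);
  l ~ exponential(r_l);
  target += log_sum_exp(lp);     // marginalize over the discrete changepoint
}
```

If the changepoint itself is of interest, its posterior can be recovered afterwards, e.g. via softmax(lp) in generated quantities.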

2 Likes