Inspired by a talk at NIPS yesterday on including discrete parameters in a neural network while still allowing derivatives to be calculated with backpropagation, I set out to see whether I could model discrete parameters in Stan using the same approach.

The method uses the CONCRETE (“CONtinuous RElaxation of discreTE random variables”) distribution approach (see http://www.stats.ox.ac.uk/~cmaddis/pubs/concrete.pdf ) and REBAR, an improved version (get it?) found here: https://arxiv.org/pdf/1703.07370.pdf

These relaxations produce a continuous variable on the simplex that, as the temperature parameter approaches zero, comes arbitrarily close to a one-hot encoding of a discrete parameter.
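
To make the "arbitrarily close to one-hot" behaviour concrete, here is a minimal Python sketch of the Gumbel-softmax sampling scheme described in the Concrete paper. It is only an illustration and is not the toy example mentioned in this post: the helper name sample_concrete, the weights in alpha, and the temperatures are all my own assumptions.

```python
import math
import random


def sample_concrete(alpha, temperature, rng):
    """Draw one relaxed one-hot vector (hypothetical helper, not from the toy code)."""
    scaled = []
    for a in alpha:
        u = rng.random()
        while u == 0.0:  # avoid log(0); random() can return exactly 0.0
            u = rng.random()
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise by inverse CDF
        scaled.append((math.log(a) + gumbel) / temperature)
    # Softmax, shifted by the maximum for numerical stability.
    top = max(scaled)
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(1)
alpha = [1.0, 3.0]  # assumed unnormalized category weights
for temperature in (1.0, 0.1, 0.01):
    x = sample_concrete(alpha, temperature, rng)
    print(temperature, [round(v, 4) for v in x])
```

Each draw sums to one. At a low temperature almost all of the mass sits on a single component, and that component is chosen with probability proportional to its entry in alpha.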

The method avoids having to marginalize over a hidden category allocation probability, as is the currently recommended approach when dealing with discrete parameters in a graph.

As an example, suppose you are modeling a distribution with two modes, mu[1] and mu[2], where mu = [mu[1], mu[2]], and X is an N x 2 matrix in which each row is a one-hot encoding of the unknown category. It would then be nice to model this in Stan using the line:

y ~ normal( X * mu, sigma);

Modeling each row of X as a CONCRETE or REBAR random variable allows this model to be constructed without resorting to marginalizing over the alpha parameter.
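
To treat a row x of X as a continuous parameter, the sampler needs that row's log density. As I recall it from Definition 1 of the Concrete paper, for a point x on the simplex with n categories, location alpha and temperature lambda, the density and its log are as follows. Please check this against the paper before relying on it; the symbols n, \alpha and \lambda follow the paper, not the toy example.

```latex
p_{\alpha,\lambda}(x)
  = (n-1)!\,\lambda^{n-1}
    \prod_{k=1}^{n}
    \frac{\alpha_k\, x_k^{-\lambda-1}}{\sum_{i=1}^{n} \alpha_i\, x_i^{-\lambda}}

\log p_{\alpha,\lambda}(x)
  = \log (n-1)! + (n-1)\log\lambda
    + \sum_{k=1}^{n} \bigl[\log\alpha_k - (\lambda+1)\log x_k\bigr]
    - n \log \sum_{i=1}^{n} \alpha_i\, x_i^{-\lambda}
```

With n = 2, as in the two-mode example, the first term vanishes and the log density depends only on alpha, lambda and the two entries of the row.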

I produced a toy example with Stan and R code which can be found here:

Please let me know if you find this useful, if you have any ideas to enhance it, or have useful applications. This method produces a neat and more intuitive alternative to the current marginalization approach.