How to model a choice between 2 distributions

The traditional way to do this is to marginalize out the discrete parameter that selects which distribution applies. See the section on “Change Point Models” in the Stan manual.
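To show what "marginalizing out" means here, the following is a minimal Python sketch of the likelihood computation, not Stan code. It assumes a uniform prior over which of the D elements is the signal, normal noise and signal components, and no exclusion of zero-valued data. The names `marginal_lpdf` and `posterior_k` are hypothetical. Each candidate signal position k contributes one log term, and the terms are combined with log-sum-exp (Stan provides a `log_sum_exp` function for the same purpose).

```python
import math
from typing import List, Sequence


def normal_lpdf(y: float, mu: float, sigma: float) -> float:
    """Log density of Normal(mu, sigma) evaluated at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def log_sum_exp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def _terms(y: Sequence[float], mu: Sequence[float], sigma: Sequence[float]) -> List[float]:
    """One log term per candidate signal position k: log(1/D) plus the noise
    log-densities of all elements, with element k's noise term swapped for its
    signal term. Index 0 = noise, index 1 = signal (hypothetical convention)."""
    D = len(y)
    noise = [normal_lpdf(v, mu[0], sigma[0]) for v in y]
    signal = [normal_lpdf(v, mu[1], sigma[1]) for v in y]
    total_noise = sum(noise)
    return [-math.log(D) + total_noise - noise[k] + signal[k] for k in range(D)]


def marginal_lpdf(y: Sequence[float], mu: Sequence[float], sigma: Sequence[float]) -> float:
    """log p(y) with the discrete signal position summed out."""
    return log_sum_exp(_terms(y, mu, sigma))


def posterior_k(y: Sequence[float], mu: Sequence[float], sigma: Sequence[float]) -> List[float]:
    """Posterior probability that each position k is the signal."""
    terms = _terms(y, mu, sigma)
    lse = log_sum_exp(terms)
    return [math.exp(t - lse) for t in terms]


# Usage: the third element sits near the signal mean.
y = [0.1, -0.3, 5.2, 0.4]
print(marginal_lpdf(y, (0.0, 5.0), (1.0, 1.0)))
print(posterior_k(y, (0.0, 5.0), (1.0, 1.0)))  # mass concentrates on index 2
```

Because k is summed out, the sampler only sees continuous parameters, and the posterior over k can be recovered afterwards from the same terms.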

I have coincidentally just written an alternative method that uses the Rebar distribution to produce an approximate one-hot encoding. See “Finally A Way to Model Discrete Parameters in Stan” and https://github.com/howardnewyork/rebar/blob/master/README.md
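For intuition about what an "approximate one-hot" vector looks like, here is a sketch of a generic Gumbel-softmax (Concrete) relaxation in Python. This is a swapped-in, well-known technique used only for illustration; it is not taken from the linked Rebar repository, and the function name `gumbel_softmax` is hypothetical. Gumbel noise is added to the logits, the result is divided by a temperature, and a softmax is applied; lower temperatures push the draw closer to an exact one-hot vector.

```python
import math
import random
from typing import List, Sequence


def gumbel_softmax(logits: Sequence[float], temperature: float,
                   rng: random.Random) -> List[float]:
    """Draw a relaxed one-hot vector (non-negative entries summing to 1)."""
    noisy = []
    for logit in logits:
        u = rng.random()
        while u == 0.0:  # guard against log(0)
            u = rng.random()
        # Gumbel(0, 1) noise via the inverse CDF: g = -log(-log(u)).
        noisy.append((logit - math.log(-math.log(u))) / temperature)
    m = max(noisy)  # subtract the max for a numerically stable softmax
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Usage: 70 elements, with element 5 strongly favoured by its logit.
logits = [0.0] * 70
logits[5] = 20.0
x = gumbel_softmax(logits, temperature=0.5, rng=random.Random(2))
print(round(x[5], 4), round(sum(x), 4))
```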

I do not think Stan has a built-in triangular distribution, so that would have to be coded manually.
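Coding it manually mostly means writing out the piecewise log density. Below is a Python sketch of that formula, assuming a lower limit a, mode c and upper limit b; the function name is hypothetical. In Stan the same logic could go in a user-defined function in the `functions` block, and naming it with an `_lpdf` suffix lets it be used with `~`.

```python
import math


def triangular_lpdf(x: float, a: float, c: float, b: float) -> float:
    """Log density of a triangular distribution with lower limit a, mode c
    and upper limit b (requires a <= c <= b and a < b). Returns -inf where
    the density is zero."""
    if not (a <= c <= b and a < b):
        raise ValueError("need a <= c <= b and a < b")
    if x < a or x > b:
        return -math.inf
    if x == c:
        return math.log(2.0 / (b - a))  # the peak of the triangle
    if x < c:  # rising edge
        num, den = 2.0 * (x - a), (b - a) * (c - a)
    else:      # falling edge
        num, den = 2.0 * (b - x), (b - a) * (b - c)
    return math.log(num / den) if num > 0 else -math.inf


print(triangular_lpdf(0.25, 0.0, 0.5, 1.0))
```

The explicit `-inf` branches avoid calling `log(0)` at the endpoints, where the density is zero unless the mode sits on that endpoint.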

If you define X to be a 70-element vector with a Rebar distribution (see the GitHub reference), where the element for the signal feature is 1 and all other elements (the noise features) are 0, then you can write the likelihood statement as:

```stan
for (d in 1:D) {  // loop within the vector for observation i; D = dimension of the vector
  if (y[i, d] != 0)  // exclude zero-valued data
    target += (1 - X[d]) * normal_lpdf(y[i, d] | mu[1], sigma[1])  // noise
              + X[d] * normal_lpdf(y[i, d] | mu[2], sigma[2]);     // signal
}
```

The `if` condition excludes zero-valued data.
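To check what the loop adds to `target`, here is a Python mirror of it for a single observation row. It assumes normal noise (index 0) and signal (index 1) components, and the function name `relaxed_target` is hypothetical. With an exact one-hot X the result is simply the noise log-densities of the other nonzero elements plus the signal log-density of the selected one.

```python
import math
from typing import Sequence


def normal_lpdf(y: float, mu: float, sigma: float) -> float:
    """Log density of Normal(mu, sigma) evaluated at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def relaxed_target(y_row: Sequence[float], x: Sequence[float],
                   mu: Sequence[float], sigma: Sequence[float]) -> float:
    """Sum over nonzero elements d of
    (1 - x[d]) * noise log-density + x[d] * signal log-density."""
    total = 0.0
    for yd, xd in zip(y_row, x):
        if yd == 0:  # zero-valued data are excluded
            continue
        total += ((1.0 - xd) * normal_lpdf(yd, mu[0], sigma[0])
                  + xd * normal_lpdf(yd, mu[1], sigma[1]))
    return total


y_row = [0.1, 0.0, 5.2, -0.4]
x = [0.0, 0.0, 1.0, 0.0]  # element 2 flagged as the signal
print(relaxed_target(y_row, x, (0.0, 5.0), (1.0, 1.0)))
```

When X[d] is fractional, the term is a weighted average of the two log densities rather than a mixture of the two densities, so the approximation is closest when X is nearly one-hot.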

You can adjust the code to use two different families for noise and signal, rather than the normal for both, but be careful to provide some structure so as to avoid label switching, e.g. by constraining one distribution's mean to be higher than the other's, or by using informative priors. When I ran my code it was still somewhat susceptible to label switching, so running a single chain rather than multiple chains is safer. I am not sure how to avoid this completely.
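The reason label switching happens can be seen in a generic two-component mixture: swapping the component labels (and the mixing weight) leaves the likelihood unchanged, so the posterior has a mirror-image mode for every mode. The Python sketch below, with hypothetical function names, demonstrates this for a normal mixture. An ordering constraint such as Stan's `ordered` vector type for the means removes one of the two mirror images.

```python
import math
from typing import Sequence, Tuple


def normal_lpdf(y: float, mu: float, sigma: float) -> float:
    """Log density of Normal(mu, sigma) evaluated at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def mixture_lpdf(y: float, theta: float, mu: Sequence[float],
                 sigma: Sequence[float]) -> float:
    """log(theta * N(y | mu[0], sigma[0]) + (1 - theta) * N(y | mu[1], sigma[1]))."""
    a = math.log(theta) + normal_lpdf(y, mu[0], sigma[0])
    b = math.log1p(-theta) + normal_lpdf(y, mu[1], sigma[1])
    m = max(a, b)  # log-sum-exp of the two weighted components
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def swap_labels(theta: float, mu: Sequence[float],
                sigma: Sequence[float]) -> Tuple[float, Tuple[float, float], Tuple[float, float]]:
    """Relabel the components: weight 1 - theta, means and scales reversed."""
    return 1.0 - theta, (mu[1], mu[0]), (sigma[1], sigma[0])


data = [-1.2, 0.3, 4.8, 5.5, 0.1]
params = (0.3, (0.0, 5.0), (1.0, 1.5))
ll = sum(mixture_lpdf(v, *params) for v in data)
ll_swapped = sum(mixture_lpdf(v, *swap_labels(*params)) for v in data)
print(ll, ll_swapped)  # equal up to floating-point rounding
```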

I also included optional code that uses the standard marginalization approach instead. It gave unsatisfactory results, so either that approach does not work very well here or there is an error in my marginalization code.

Hope this helps.

noise_and_signal.R
noise_and_signal_2.stan
