Finally A Way to Model Discrete Parameters in Stan

Well, this is very interesting. I replicated the toy example from your paper, “Sparsity information and regularization in the horseshoe and other shrinkage priors”, by Juho Piironen and Aki Vehtari. I believe this paper describes the state of the art in Stan for feature selection in sparse problems.

To summarize the toy problem challenge for readers:

  • We have 100 data points.
  • Each point consists of a vector of 400 measurements or features.
  • 20 of these features are signal, distributed Normal(A, 1), and the remaining 380 are noise, distributed Normal(0, 1), for some value of A.
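For readers who want to reproduce the setup outside R, here is a minimal Python/NumPy sketch of the data-generating process described above. The function name simulate_toy_data, the seed, and placing the signal features in the first 20 columns are illustrative assumptions; the actual data generation lives in sparse_01.R and may differ.

```python
import numpy as np


def simulate_toy_data(A, n=100, p=400, n_signal=20, seed=0):
    """Simulate the toy problem: n rows, each a vector of p features.

    The first n_signal features are signal, drawn from Normal(A, 1);
    the remaining p - n_signal are noise, drawn from Normal(0, 1).
    Putting the signal features first is an arbitrary choice for this sketch.
    """
    rng = np.random.default_rng(seed)
    beta_true = np.zeros(p)
    beta_true[:n_signal] = A  # true means: A for signal, 0 for noise
    # Each row is beta_true plus independent standard normal noise
    # (beta_true broadcasts across the n rows).
    y = beta_true + rng.standard_normal((n, p))
    return y, beta_true


y, beta_true = simulate_toy_data(A=6)
print(y.shape)
print(int((beta_true != 0).sum()))
```

The ideal estimate is then beta_true itself: 20 entries equal to A and 380 equal to zero.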

The goal is to estimate a 400-element vector, “beta”, consisting of the means of the signal and noise features. Ideally, 20 of the estimated beta values will be A and the other 380 will be zero.

The “Sparsity” paper uses a horseshoe prior to estimate the betas and computes the MSE of the posterior mean of beta against the true beta for different values of A. The MSEs from the “Sparsity” paper for different A are as follows (the “tau 0” line represents the horseshoe prior results):

I reproduced this chart using a Rebar model. The code is in the files “sparse_01.R” and “sparse_02.stan” in the GitHub repository: https://github.com/howardnewyork/rebar

The results using Rebar for the first six values of A:

[Figure: MSE using Rebar for the first six values of A]

This shows a reduction in MSE by two orders of magnitude! Given the size of the improvement, I am concerned I missed something in the setup of the toy problem, but if not, these are very promising results.
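As a rough yardstick when comparing MSE numbers, here is a Python/NumPy sketch that computes the MSE for a baseline with no shrinkage at all: each beta is estimated by its column mean. Two assumptions are worth flagging. MSE is taken here as the squared error averaged over all 400 components, which may not match the paper's exact definition. The code is also not taken from sparse_01.R. With 100 observations per feature, the column mean has error variance 1/100 per component, so this baseline's MSE comes out near 0.01 regardless of A.

```python
import numpy as np


def mse(beta_hat, beta_true):
    """Squared error of an estimated beta vector against the truth,
    averaged over all components (an assumed definition for this sketch)."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    beta_true = np.asarray(beta_true, dtype=float)
    return float(np.mean((beta_hat - beta_true) ** 2))


# Toy data as described above: 100 rows, 400 features, 20 signal features.
rng = np.random.default_rng(1)
n, p, n_signal, A = 100, 400, 20, 6.0
beta_true = np.zeros(p)
beta_true[:n_signal] = A
y = beta_true + rng.standard_normal((n, p))

# No-shrinkage baseline: the sample mean of each feature.
beta_hat = y.mean(axis=0)
baseline_mse = mse(beta_hat, beta_true)
print(baseline_mse)  # expected to be close to 1/n = 0.01
```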

Here, for example, are the estimated beta values using the horseshoe prior and using Rebar, for A=6:

Horseshoe Prior: Estimated Beta from “Sparsity” Paper: A=6

[Figure: estimated beta with the horseshoe prior, A=6 (red = true beta, solid black = estimated beta)]

Rebar Method: Estimated Beta using Rebar: A=6

[Figure: Rebar estimates of beta, A=6]

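The “Adjusted Beta” is beta .* one_hot, i.e. the element-wise product of beta and the one_hot vector. Compared with the horseshoe prior, Rebar does better at identifying the noise features and shrinking them to zero, while not shrinking the signal features toward zero.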
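The adjustment can be computed per posterior draw and then averaged. The Python/NumPy sketch below does this for three made-up draws of three features (one signal, two noise). The arrays beta_draws and one_hot_draws are hypothetical stand-ins for the fitted Stan output, not results from the model. Averaging the product over draws is one reasonable choice; multiplying the two posterior means is another, and I have not checked which one sparse_02.stan or the plotting code uses.

```python
import numpy as np

# Hypothetical posterior draws (rows = draws, columns = features).
# Feature 0 is signal; features 1 and 2 are noise.
beta_draws = np.array([
    [5.9, 0.30, -0.20],
    [6.1, 0.10, 0.25],
    [6.0, -0.15, 0.05],
])
# Hypothetical draws of the Rebar indicators, each close to 0 or 1.
one_hot_draws = np.array([
    [1.00, 0.00, 0.01],
    [0.99, 0.02, 0.00],
    [1.00, 0.00, 0.00],
])

# Adjusted beta per draw: element-wise product (Stan's `.*` operator).
adjusted_draws = beta_draws * one_hot_draws

# Posterior mean of the adjusted beta for each feature.
adjusted_mean = adjusted_draws.mean(axis=0)
print(np.round(adjusted_mean, 3))
```

The noise features end up essentially at zero while the signal feature stays near 6, which is the behaviour shown in the Rebar plot.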

Based on this toy example, Rebar looks like a very promising tool for feature selection in Stan.

(I have no particular scientific insight into how to choose the value of tau in Rebar. I followed the heuristic of setting it small enough to force the Rebar parameters close to 0 or 1. Too small a value slows down the MCMC and requires higher adapt_delta and max_treedepth settings, so you have to pick a value that works without making the model too slow.)
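To illustrate the tau trade-off, here is a Python/NumPy sketch. It rests on an assumption I have not checked against sparse_02.stan: that tau acts as the temperature of a binary Concrete (relaxed Bernoulli) distribution, the continuous relaxation used by the REBAR gradient estimator of Tucker et al. (2017). The function name relaxed_bernoulli is hypothetical. As tau shrinks, samples pile up near 0 or 1. The sigmoid also becomes steeper, which is consistent with the sampler needing smaller steps (higher adapt_delta) and deeper trees.

```python
import numpy as np


def relaxed_bernoulli(logit_p, tau, rng, size):
    """Binary Concrete (relaxed Bernoulli) samples with temperature tau.

    z = sigmoid((logit_p + log(u) - log(1 - u)) / tau), u ~ Uniform(0, 1).
    As tau -> 0 the samples concentrate near 0 or 1.
    """
    u = rng.uniform(size=size)
    logistic_noise = np.log(u) - np.log1p(-u)  # standard logistic draws
    x = (logit_p + logistic_noise) / tau
    # Numerically stable sigmoid: sigmoid(x) = 0.5 * (1 + tanh(x / 2)).
    return 0.5 * (1.0 + np.tanh(x / 2.0))


rng = np.random.default_rng(0)
for tau in (1.0, 0.5, 0.1):
    z = relaxed_bernoulli(logit_p=0.0, tau=tau, rng=rng, size=10_000)
    near_binary = np.mean((z < 0.05) | (z > 0.95))
    print(f"tau={tau}: fraction within 0.05 of 0 or 1 = {near_binary:.2f}")
```

With logit_p = 0 (p = 0.5), the fraction of near-binary samples rises sharply as tau goes from 1 to 0.1.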
