Finally A Way to Model Discrete Parameters in Stan

Great, thanks. This is very helpful. So if I understand it, the toy problem estimates the MSE based on a single 400-element data sample and then averages over 100 such MSEs, as opposed to my method of one MSE calculated from a single model with 100 vectors (i.e., it is the average over 100 models with one data vector each, rather than one model with 100 data vectors).
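
A minimal numeric sketch of that averaging convention (with a toy stand-in estimator, not the actual Rebar model):

```r
# One MSE per 400-element sample, then the average over 100 such runs.
# estimate_A is a placeholder estimator; the real model would be fit in Stan.
set.seed(1)
n_runs <- 100; n_obs <- 400
A_true <- 1                                          # stand-in signal value
estimate_A <- function() mean(rnorm(n_obs, A_true))  # toy estimator
mse_per_run <- replicate(n_runs, (estimate_A() - A_true)^2)
mean(mse_per_run)                                    # average of 100 per-run MSEs
```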

So, to close the loop: I tried this approach and calculated the MSE for a single data vector, and the MSE using Rebar appears similar to the horseshoe results. (I would need to leave the model running a long time to do all 100 runs; perhaps another day for this.) I used a single prior, beta ~ normal(0, 10), across all signal (“A”) values, to ensure that one model covers all possible signal values.

Here is the estimated MSE from a single run on a single 400-element data vector, for different signal values (“A”):

[Figure: mse_one_data_point — estimated MSE versus signal value A]

Some final comments:

With regard to diagnostics: you can generally find a balance between tau, tree_depth, and adapt_delta that produces a good result, using the workflow referenced by Aki above.

With regard to the ability to manage higher-dimensional problems: this case has 400 features, each with a 2-dimensional Rebar parameter. You can also find an example of a single Rebar parameter with 70 dimensions here: How to model a choice between 2 distributions
You do have to adjust adapt_delta and tree_depth depending on your choice of tau.
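
As a hedged illustration of those knobs, here is how the control settings would be passed via rstan (the specific values are illustrative rather than the settings from my actual runs, and stan_data is assumed to have been prepared beforehand):

```r
library(rstan)

fit <- stan(
  file = "feature_selection_01.stan",   # e.g. the model mentioned below
  data = stan_data,                     # assumed already assembled
  control = list(
    adapt_delta   = 0.99,               # push toward 1 as tau gets smaller
    max_treedepth = 12                  # raise if treedepth warnings appear
  )
)
```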

With regard to whether Rebar is exactly equivalent to a discrete distribution: the answer is no. It is a continuous distribution that is close to a discrete distribution (a “RElaxation,” as it was termed in the original “ConcREte” paper), which may be good enough in many circumstances.
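
To make the relaxation idea concrete, here is a minimal sketch of a binary Concrete / relaxed-Bernoulli draw (this is the construction from the Concrete paper, not necessarily the exact Rebar parameterization in my models): as tau shrinks, the draws pile up near 0 and 1.

```r
# Binary Concrete draw: sigmoid((logit(p) + L) / tau), L ~ Logistic(0, 1).
# As tau -> 0 this converges to a true Bernoulli(p) draw.
relaxed_bernoulli <- function(n, p, tau) {
  L <- rlogis(n)                              # standard logistic noise
  plogis((log(p / (1 - p)) + L) / tau)        # values in (0, 1)
}

set.seed(2)
round(relaxed_bernoulli(5, p = 0.3, tau = 1),    3)  # diffuse values in (0, 1)
round(relaxed_bernoulli(5, p = 0.3, tau = 0.05), 3)  # near-discrete 0/1 draws
```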

For example, I uploaded an automatic feature selection model in a regression framework to https://github.com/howardnewyork/rebar (see feature_selection_01.R and feature_selection_01.stan), where Rebar nicely identifies the signal features. The example shows a challenging task: identifying the important features in a high-dimensional problem with limited data. At some stage, it would be worth comparing this approach to a Bayesian implementation of the lasso, forward stepwise regression, or other feature selection algorithms.
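
For intuition, here is a hedged sketch of the gating structure involved (my reading of the setup, not the repo code itself): each coefficient is multiplied by a near-binary inclusion indicator, so irrelevant features drop out of the regression mean.

```r
# Illustrative generative structure only; in the Stan model the indicators z
# would come from the relaxed Bernoulli above rather than being fixed.
set.seed(3)
n <- 50; p <- 400; k <- 5                 # limited data, many features
X <- matrix(rnorm(n * p), n, p)
beta <- c(rnorm(k, 0, 3), rep(0, p - k))  # only the first k features matter
z <- c(rep(1, k), rep(0, p - k))          # idealized tau -> 0 indicators
y <- X %*% (z * beta) + rnorm(n)          # signal enters only through gated terms
```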
