Bayesian Compressed Regression

Would it be possible to implement Bayesian Compressed Regression in Stan?

The paper is here
and there is a presentation here

My sense is that it would be difficult since they are not actually trying to estimate the Phi matrix in Equation 3 of the paper. Rather, they are simulating random values of Phi and then normalizing it. My recollection is that I can’t use random variables this way in Stan.

It does seem like it’s something you could code. I read far enough to see that Phi is constructed from discrete elements (psi^1/2, 0, or -psi^1/2) which could be constructed via a mixture if you were willing to make psi a parameter. No idea how well that would work in practice.

They say in the paper that Psi is a parameter, but I don’t think Phi is. I
didn’t think you could do something like that where you randomly simulate a
variable in Stan and use that in a model. I figured you would need to
marginalize it out or something (which I wouldn’t know how to do).

It would be a transformed parameter and you would need to marginalize but it would look more or less like this:

paramameters {
  real psi;

transformed parameters {
  matrix[N,P] phi;
  for (i in 1:N) {
    for (p in 1:P) {
      phi[i,p] = psi^2 * -psi^(1/2)  + (1-psi^2) psi^(1/2) 
  phi = gram_scmidt(phi)  // that would need implementing, it's useful
                          // for other models too and I've seen an implementation
                          // that looked auto-diff friendly.

I’m just saying it’s a calculation you could set up for Stan. The posterior geometry could be atrocious.

My first thought was that the formula on page 60 of the presentation (also stated in the paragraph between Equations 3 and 4 of the paper) looks different, but this is the type of thing that I don’t always have a good intuitive grasp or and would need to play around with it to be sure.

And from what I know about Stan, it makes sense that the geometry could be atrocious given that formula.

Agreed, I wouldn’t dig into this unless I had the time to make it a project.

I wrote up an R script to show the difference between the two (at least I
think I modified yours to match it correctly, the presentation and the
paper have different formulas). Phi2 below seems (using your formula) much
more well-behaved. Phi is like a mixture between a discrete variable and
another mixture of two continuous distributions. I can imagine Stan would
not like it.

sample_phi_i <- function(psi, r)
if (r < (psi ^ 2))
return(-1 / sqrt(psi))
else if(((1 - psi) ^ 2) < r)
return(1 / sqrt(psi))

sample_phi2_i <- function(psi, r)
return(psi^2 * (-1/sqrt(psi)) + ((1 - psi) ^ 2) * (1 / sqrt(psi)))

sample_phi <- function(psi, f)
N <- length(psi)
phi <- psi
r <- runif(N)
for(i in 1:N)
phi[i] <- f(phi[i], r[i])

N <- 1000

psi <- 0.1 + runif(N) * 0.9

phi <- sample_phi(psi, sample_phi_i)

phi2 <- sample_phi(psi, sample_phi2_i)


What if you run sample_phi(psi, sample_phi_i) a thousand time and then average replicates for each entry prior to creating the histogram?

I put two versions below, they both produce about the same histograms,
normal-looking centered around 0.25. The phi and phi2 functions look a bit

newphi <- rep(0, N)
for (i in 1:N)
psi <- 0.1 + runif(N) * 0.9
newphi <- newphi + sample_phi(psi, sample_phi_i)
newphi <- newphi / N

Hand-wavy terms: that’s marginalization for you.

More seriously, the question is how do changes in psi affect the posterior distribution of the betas, but that would take more effort since you’d need the whole gram-schmidt thing coded

Before you do anything else with this I would suggest taking a look at the normal mixture section of the manual where the implementation of marginalization is discussed. Once that makes sense I think this will too.