Bayesian Compressed Regression

jmh530 · August 23, 2017, 7:37pm

Would it be possible to implement Bayesian Compressed Regression in Stan?

The paper is here
https://arxiv.org/abs/1303.0642
and there is a presentation here
https://homes.stat.unipd.it/sites/homes.stat.unipd.it.lauraventura/files/dunson.pdf

My sense is that it would be difficult since they are not actually trying to estimate the Phi matrix in Equation 3 of the paper. Rather, they are simulating random values of Phi and then normalizing it. My recollection is that I can’t use random variables this way in Stan.

sakrejda · August 24, 2017, 12:54am

It does seem like it’s something you could code. I read far enough to see that Phi is constructed from discrete elements (psi^1/2, 0, or -psi^1/2) which could be constructed via a mixture if you were willing to make psi a parameter. No idea how well that would work in practice.

jmh530 · August 24, 2017, 1:57am

They say in the paper that Psi is a parameter, but I don’t think Phi is. I
didn’t think you could do something like that where you randomly simulate a
variable in Stan and use that in a model. I figured you would need to
marginalize it out or something (which I wouldn’t know how to do).

sakrejda · August 24, 2017, 2:08am

It would be a transformed parameter and you would need to marginalize but it would look more or less like this:

paramameters {
  real psi;
}

transformed parameters {
  matrix[N,P] phi;
  for (i in 1:N) {
    for (p in 1:P) {
      phi[i,p] = psi^2 * -psi^(1/2)  + (1-psi^2) psi^(1/2) 
    }
  }
  phi = gram_scmidt(phi)  // that would need implementing, it's useful
                          // for other models too and I've seen an implementation
                          // that looked auto-diff friendly.
 
}

sakrejda · August 24, 2017, 2:09am

I’m just saying it’s a calculation you could set up for Stan. The posterior geometry could be atrocious.

jmh530 · August 24, 2017, 2:43am

My first thought was that the formula on page 60 of the presentation (also stated in the paragraph between Equations 3 and 4 of the paper) looks different, but this is the type of thing that I don’t always have a good intuitive grasp or and would need to play around with it to be sure.

And from what I know about Stan, it makes sense that the geometry could be atrocious given that formula.

sakrejda · August 24, 2017, 1:11pm

Agreed, I wouldn’t dig into this unless I had the time to make it a project.

jmh530 · August 24, 2017, 1:56pm

I wrote up an R script to show the difference between the two (at least I
think I modified yours to match it correctly, the presentation and the
paper have different formulas). Phi2 below seems (using your formula) much
more well-behaved. Phi is like a mixture between a discrete variable and
another mixture of two continuous distributions. I can imagine Stan would
not like it.

sample_phi_i <- function(psi, r)
{
if (r < (psi ^ 2))
return(-1 / sqrt(psi))
else if(((1 - psi) ^ 2) < r)
return(1 / sqrt(psi))
else
return(0)
}

sample_phi2_i <- function(psi, r)
{
return(psi^2 * (-1/sqrt(psi)) + ((1 - psi) ^ 2) * (1 / sqrt(psi)))
}

sample_phi <- function(psi, f)
{
N <- length(psi)
phi <- psi
r <- runif(N)
for(i in 1:N)
{
phi[i] <- f(phi[i], r[i])
}
return(phi)
}

N <- 1000

psi <- 0.1 + runif(N) * 0.9

phi <- sample_phi(psi, sample_phi_i)

phi2 <- sample_phi(psi, sample_phi2_i)

hist(phi)
hist(phi2)

sakrejda · August 24, 2017, 2:15pm

What if you run sample_phi(psi, sample_phi_i) a thousand time and then average replicates for each entry prior to creating the histogram?

jmh530 · August 24, 2017, 2:42pm

I put two versions below, they both produce about the same histograms,
normal-looking centered around 0.25. The phi and phi2 functions look a bit
different.

newphi <- rep(0, N)
for (i in 1:N)
{
psi <- 0.1 + runif(N) * 0.9
newphi <- newphi + sample_phi(psi, sample_phi_i)
}
newphi <- newphi / N

sakrejda · August 24, 2017, 2:50pm

Hand-wavy terms: that’s marginalization for you.

More seriously, the question is how do changes in psi affect the posterior distribution of the betas, but that would take more effort since you’d need the whole gram-schmidt thing coded

sakrejda · August 24, 2017, 2:51pm

Before you do anything else with this I would suggest taking a look at the normal mixture section of the manual where the implementation of marginalization is discussed. Once that makes sense I think this will too.

Topic		Replies	Views
Bayesian inference in Stan Modeling	40	5722	July 18, 2017
Mixed models with deterministic and random variables Algorithms optimization , variational-bayes	7	2223	August 1, 2017
Bayesian Neural Network with Categorical Likelihood: efficient implementation Modeling	5	682	March 16, 2021
Zero-inflated prior or mixture distreibution for prior in Stan General	5	669	April 22, 2022
Parameter recovery issues with normal mixture model Modeling hierarchical-model , mixture	3	625	June 16, 2022

Bayesian Compressed Regression

Related Topics