Beta regression with LASSO prior (Park & Casella) - Initial values rejected

Dear all,

I am currently trying to sample from the posterior of a regression model with a beta likelihood,
f(y_i \mid a, b) = \frac{y_i^{a-1}(1-y_i)^{b-1}}{B(a, b)},
parameterized via mean and precision,
a = \mu \cdot \phi
b = (1-\mu) \cdot \phi,
where \mu is linked to the linear predictor by the inverse-logit function so that it is constrained to (0, 1).
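
As a quick sanity check of this parameterization (plain R, with hypothetical values):

mu_chk  <- 0.3; phi_chk <- 5        # hypothetical mean and precision
a_chk <- mu_chk * phi_chk
b_chk <- (1 - mu_chk) * phi_chk
a_chk / (a_chk + b_chk)             # mean of Beta(a, b) recovers mu = 0.3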

I treat \phi as a scalar parameter for now, since the model already fails in this simpler case.

I also want to include a regularization prior on beta. Below is my attempt at the Bayesian LASSO of Park & Casella. I have only recently switched to Bayesian methods, so I am not very experienced with full posterior densities and the like.
Since the likelihood is beta, there is no \sigma^2 in my full conditional for beta, so I excluded it…
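
For reference, the hierarchy I am trying to implement (my reading of Park & Casella with \sigma^2 dropped; note that they place the Gamma hyperprior on \lambda^2, while my code below puts it on \lambda):

\beta_j \mid \tau_j^2 \sim N(0, \tau_j^2)
\tau_j^2 \sim \text{Exponential}(\lambda^2 / 2)
\lambda \sim \text{Gamma}(r, \delta)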

The issue:

The initial values get rejected once the matrix X is large.
If I run the code using just 40 variables, I usually get posterior samples that make sense.
Once the matrix gets larger (say 70 variables), the initial values are constantly rejected.

I have been trying for a very long time now. I believe the constraints on the data/parameters are all correct.
I tested 500 different seeds.
I printed the mu and phi samples and the values look proper. Yet it still fails.
I hope you have an idea and can help me out…
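
For what it's worth, here is a small R sketch (outside Stan, with simulated stand-ins for my data) of a mechanism that would be consistent with what I see, assuming Stan's default random inits drawn uniformly from (-2, 2):

set.seed(1)
M <- 70; N <- 500                       # hypothetical sizes
X_sim <- matrix(rnorm(N * M), N, M)     # stand-in for my X
beta0 <- runif(M, -2, 2)                # one default Stan init draw for beta
eta   <- drop(X_sim %*% beta0)          # linear predictors; spread grows with M and the scale of X
max(abs(eta))                           # once any |eta| exceeds ~37 ...
plogis(40)                              # ... inv_logit returns exactly 1 in double precision,
                                        # so B = (1 - mu) * phi = 0 and Beta(A, 0) is invalid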

test.csv (460.9 KB)

# Data
df_data <- read.csv( ... data attached in the post ...)
y <- df_data[, 1]               # response in (0, 1)
x <- as.matrix(df_data[, -1])   # predictors; Stan expects a matrix, not a data frame
dat <- list(N = length(y), M = ncol(x), y = y, X = x)
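
To rule out boundary issues in the data (the beta likelihood is undefined at y = 0 and y = 1), I also check:

stopifnot(all(y > 0 & y < 1))   # beta_lpdf is -Inf at the boundaries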


write("// Stan model for beta LASSO Regression
      data {
        int < lower = 1 > N;
        int < lower = 1 > M;
        vector < lower = 0, upper = 1 > [N] y;
        matrix[N, M] X;

        // --- If phi is not constant
        // int<lower = 1> J; 
        // matrix[N, J] Z;
      }
      
      parameters {
        vector[M] beta;
        real alpha; //intercept

        // LASSO parameters
        real < lower = 0 > sigma; // Error SD
        vector < lower = 0 > [M] tau; // variance parameter for beta
        real < lower = 0 > lambda; // 

        // --- If phi is not constant
        //vector[J] gamma;
        real < lower = 0 > phi;

      }
      
      transformed parameters{
        vector < lower = 0, upper = 1 > [N] mu;
        //vector < lower = 0 > [N] phi;          
        vector < lower = 0 > [N] A;             // shape1 for beta distribution
        vector < lower = 0 > [N] B;             // shape2 for beta distribution

        // hyperprior for lambda
        real r = 1;
        real delta = 4;

        for (i in 1:N) {
          mu[i]  = inv_logit(alpha + X[i,] * beta);   
          //phi[i] = exp(Z[i,] * gamma);

        }
        //print(mu, phi);
        A = mu * phi;                   //change to .* if phi is not constant
        B = (1.0 - mu) * phi;           //change to .* if phi is not constant
      }
      
      model {
        // priors
        phi ~ exponential(3);               // if phi is global
        lambda ~ gamma(r , delta);          // lasso
        tau ~ exponential(lambda^2 / 2);    // lasso
        beta ~ normal(0, tau);              // lasso
        alpha ~ normal(0, 2);               // intercept
        //sigma ~ normal(0, 2);             // doesnt really exist in model equation...
        
        // likelihood
        y ~ beta(A, B);
      }
      
      generated quantities{
        vector<lower=0,upper=1>[N] y_rep;        
        
        for (n in 1:N) {
          y_rep[n] = beta_rng(A[n], B[n]);
        }
}
// The posterior predictive distribution",

"betaBLASSO.stan") # 


fit <- stan(file = 'betaBLASSO.stan',
            data = dat, seed = 3,
            warmup = 500, iter = 1000, chains = 1,
            control = list(adapt_delta = 0.99, max_treedepth = 12))

See, for example, here https://statmodeling.stat.columbia.edu/2017/11/02/king-must-die/ and here https://betanalpha.github.io/assets/case_studies/bayes_sparse_regression.html for why the Bayesian lasso is not such a good idea, even though lasso and Bayes can each be good separately. I recommend the regularized (Finnish) horseshoe instead: https://projecteuclid.org/euclid.ejs/1513306866

For changing the initialization, see the init and init_r options of stan().
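
For example (a sketch), shrinking the range of the random inits from the default (-2, 2) keeps the linear predictor small at the start:

fit <- stan(file = 'betaBLASSO.stan', data = dat,
            init_r = 0.1,   # random inits drawn from (-0.1, 0.1) on the unconstrained scale
            warmup = 500, iter = 1000, chains = 1)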


Thanks for your response.
I will try the (regularized) horseshoe too.
Yet my problem is not about picking a "better" regularization method, but about the initialization of the chain(s)…

Best

See the second part of my answer:

I did change init_r to several different values, without success.

Can you try with init? You need to check that A and B get reasonable values. You can set each parameter with init, then calculate the corresponding A and B, and check that they are reasonable.
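
A minimal sketch, assuming the model above (the list names must match the parameters block exactly, and the starting phi value here is just a hypothetical choice):

init_fun <- function(chain_id) {
  list(alpha  = 0,                 # mu = inv_logit(0) = 0.5 for every row
       beta   = rep(0, dat$M),
       tau    = rep(1, dat$M),
       lambda = 1,
       phi    = 5)                 # hypothetical starting precision
}

# With these values: A = 0.5 * 5 = 2.5 and B = 0.5 * 5 = 2.5 for all N rows,
# both strictly positive, so beta_lpdf(y | A, B) is finite at initialization.
fit <- stan(file = 'betaBLASSO.stan',
            data = dat, seed = 3,
            init = init_fun,
            warmup = 500, iter = 1000, chains = 1,
            control = list(adapt_delta = 0.99, max_treedepth = 12))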