Using a Generalized Beta Distribution of the Second Kind as a Prior in Linear Regression

Emap · September 18, 2022, 12:13pm

So I’m considering a simple linear regression model with p = 1 predictors y = \beta x + \epsilon where \epsilon \sim N(0,\sigma^2). I want to use a generalised beta distribution of the second kind(GB2) as a prior for \beta and an inverse gamma distribution for \sigma^2, and then sample from the posterior distribution. Would this be doable using Stan? I noticed GB2 is not a built in.

Please keep in mind I am completely new to using Stan.

JohnnyZoom4H · September 18, 2022, 7:47pm

This certainly can be done.

A straightforward way to do this would be to define the log of the GB2 pdf in the functions block, then put a sampling statement in the model block for \beta, for example

\beta ~ GB2.lpdf(parm1, parm2, parm3, parm4);

To see how to construct the lpdf function, see the “User Defined Functions” section of the User Guide.

You could also use a “target” statement in the model block. But regardless do pay attention to whether you will be needing the normalizing additive constants from the sampling function. For newer Stan versions, there are different function extensions (like “lpdf”) for including it or not.

Just an aside: I have entertained using the GB2 distribution for scale parameters many times, but never main/mean parameters. I encourage you to have a good think-through about using this distribution for the mean parameters.

Emap · September 19, 2022, 7:14am

Thank you for the advice.

I’m having some trouble implementing the GB2.lpdf function. I’ve looked at the user defined function section on the Stan website, although the syntax is taking some getting used to. It seems like the beta function is a defined function in Stan according to this 30.1 Beta | Stan Functions Reference but typing B(p,q) in the code gives an error saying that the function B(p,q) is not recognised. This is the code I have thus far. Note that the first shape parameter “a” is always 2 in my case, which Is what I have used here.

"functions{
    real GB2.lpdf(real y, real b,real p,real q) {
    
      real ret;
      ret = 2*y^(2*p-1)/(1 + (y/b)^2)^(p+q);
      ret = ret/B(p,q);
      ret = log(ret);
      
      if (y < 0) {
      return 0;
      }
      else {
      return ret;
      }
      }

}

data {
  int<lower=1> n; // Number of data
  vector[n] x;  // nx1 vector of x_i
  vector[n] y;      // n-dimensional response vector
}

parameters {
  real beta; // regression coefficient
  real<lower=0> sigma; // dispersion parameter
}

transformed parameters {
  vector[n] theta ;
  theta = x * beta;
  real<lower=0> k;
  k = n/(x%*%x)
}

model {
  sigma ~ inv_gamma(0.001,0.001);
  beta ~ GB2.lpdf(k,1,0.5)
  
  y ~ normal(theta, sigma);
}"

simonbrauer · September 19, 2022, 5:42pm

You want beta() rather than B(). See 3.13 Combinatorial functions.

As a side note, you should also use _lpdf rather than .lpdf. See 14.1 Suffix marks type of function.

functions{
    real GB2_lpdf(real y, real b, real p, real q){...}
}

As stated by @JohnnyZoom4H , you can then use the function either as a sampling statement (using ~) or to increment target. I believe you would need to modify the function to take a vector for y to truly vectorize it and thereby take advantage of the sampling-style statement. But I could be mistaken.

model{
   for(i in 1:n){
        // Option 1: sampling statement
        y[i] ~ GB2(k, 1, 0.5);

         // Option 2: increment "target"
        target += GB2_lpdf(y[i] | k, 1, 0.5);
    }
}

And your model block as

Emap · September 19, 2022, 6:13pm

Thank you, that seemed to resolve the issue.

I’m a bit confused regarding the sampling. I want the prior to be for beta, so should I not type something like

beta~GB2_lpdf(k,1,0.5)

instead of having y on the LHS?

Also, is there a way to simulate a sign inside the Stan code? I want my prior to be the negative of a GB2 with probability 0.5, and the positive of a GB2 with probability 0.5. I tried using the binomial_rng() function and simulate the sign from that, but it seems to not work that well for this particular issue.

simonbrauer · September 19, 2022, 6:23pm

My mistake. I should have said that your options are

beta~GB2(k,1,0.5);

or

target += GB2_lpdf(beta | k,1,0.5);

Emap · September 20, 2022, 9:47am

Thank you, that makes things clear.

Do have any tips in regards to making the sign of the prior negative 50\% of the time and positive the other 50\% of the time? Perhaps it’s possible to simulate a Bernoulli random variable which decides the sign of the prior?

simonbrauer · September 20, 2022, 12:14pm

Do you mean that you’d like to reflect the distribution across 0? If so, the simplest solution would be to take the absolute value of y in the distribution function and divide the density in half.

Stan does not allow discrete parameters, so what you are suggesting here does not sound possible. Instead, you have to marginalize over the discrete parameters. You can find plenty of information on this in the Stan documentation or the Discourse.

Topic		Replies	Views
Generalized Beta of the second kind (GB2) Modeling specification	5	762	November 7, 2023
Posterior from STAN model as prior for another model Modeling	6	200	November 10, 2024
Help with implementation of Normal Normal-Inverse Gamma model Modeling specification , cmdstanr	1	486	February 16, 2023
User-defined generalised beta distribution as prior makes no difference from flat prior Modeling	4	1475	December 13, 2017
Defining a hierarchical GEV with a modified Beta prior Modeling stan , hierarchical-model	2	505	January 8, 2023

Using a Generalized Beta Distribution of the Second Kind as a Prior in Linear Regression

Related topics