# Linear model with Zellner g-prior

LinearModel.stan (817 Bytes) LinearModel_stan.R (1.3 KB)

Hi,
I’m trying to implement a linear model with a Zellner g-prior:

Y | X,\beta, \sigma^2 \sim N(X \beta, \sigma^2 I_n)

\beta | \sigma^2, X \sim N_p(b_0, \sigma^2 B_0)
\sigma^2 \sim \text{Inv-Gamma}(\frac{\nu_0}{2}, \frac{\nu_0\sigma_0^2}{2})

with \nu_0=0 and B_0=c (X^T X)^{-1} (c constant)

To run my model I used a simulated dataset where
\beta=[1,2,3,4]
and
\sigma^2=5

The \beta estimated by the model seems to converge to the true values (only the intercept is slightly off), but the variance does not converge to the true value.

This is my model

data {
  int<lower=0> N;  // number of observations
  vector[N] y;     // response
  int<lower=0> p;  // number of regressors
  matrix[N, p] X;  // matrix of covariates

  // Zellner prior parameters
  real c;
  vector[p] b0;

  // inverse-gamma prior parameters
  real sigma0;
  real nu0;
}

parameters {
  vector[p] beta;        // regression parameters (column vector)
  real<lower=0> sigma2;  // variance
}

transformed parameters {
  vector[N] mu;  // mean
  matrix[p, p] B0;

  for (i in 1:N) {
    mu[i] = X[i] * beta;
  }

  B0 = c * inverse(X' * X);
}

model {
  // likelihood:
  y ~ normal(mu, sqrt(sigma2));

  // priors:
  beta ~ multi_normal(b0, B0);
  sigma2 ~ inv_gamma(nu0 / 2, nu0 * sigma0 / 2);  // Zellner prior parameters
}


This is what I get

Inference for Stan model: LinearModel.
4 chains, each with iter=1e+05; warmup=50000; thin=10;
post-warmup draws per chain=5000, total post-warmup draws=20000.
          mean se_mean    sd  2.5%    25%    50%    75%  97.5% n_eff Rhat
beta[1]   1.42    0.02  3.34 -5.07  -0.83   1.43   3.69   7.97 19742    1
beta[2]   2.02    0.00  0.28  1.48   1.83   2.02   2.20   2.56 19823    1
beta[3]   2.92    0.00  0.17  2.59   2.81   2.92   3.03   3.26 19299    1
beta[4]   3.95    0.00  0.15  3.66   3.85   3.95   4.05   4.24 19493    1
sigma2  112.68    0.12 16.30 85.28 101.11 111.35 122.68 148.58 19549    1

Samples were drawn using NUTS(diag_e) at Thu Dec 19 18:01:11 2019.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).


I also tried using reference priors, and I get an estimated value of \sigma^2 equal to 6.34, which is closer to what I expect.

Can someone help me understand why I get that result with the Zellner prior? Is there something wrong with my model?

I looked at this a little and I have a question about the calculation of B0. I ran your code just as you posted it and noticed that the n_eff of the B0 elements was very low. Here is the result I got when adding B0[1,1] to the printed output.

          mean se_mean    sd  2.5%    25%    50%    75%  97.5% n_eff Rhat
beta[1]   1.48    0.07  3.28 -4.82  -0.69   1.49   3.70   7.73  2048    1
beta[2]   2.02    0.01  0.27  1.48   1.84   2.02   2.21   2.57  2016    1
beta[3]   2.92    0.00  0.17  2.59   2.80   2.91   3.03   3.25  1896    1
beta[4]   3.95    0.00  0.15  3.66   3.85   3.95   4.05   4.23  2205    1
sigma2  113.03    0.38 16.76 85.29 100.85 111.63 123.18 149.99  1943    1
B0[1,1]  10.01    0.00  0.00 10.01  10.01  10.01  10.01  10.01     2    1


Reading a bit to remind myself about the Zellner g-prior I noticed that sigma2 was not used in calculating B0. I changed that one line of code from

B0 = c * inverse(X' * X);


to

B0 = c * sigma2 * inverse(X' * X);


and got the following:

          mean se_mean    sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
beta[1]   1.41    0.03  1.31  -1.10   0.54   1.38   2.25   4.10  1853    1
beta[2]   2.04    0.00  0.11   1.82   1.96   2.04   2.12   2.26  1964    1
beta[3]   2.95    0.00  0.07   2.81   2.90   2.95   2.99   3.08  1796    1
beta[4]   3.98    0.00  0.06   3.87   3.95   3.99   4.02   4.10  1777    1
sigma2   17.64    0.04  1.95  14.44  16.29  17.48  18.78  22.01  2108    1
B0[1,1] 176.66    0.43 19.52 144.58 163.16 175.07 188.06 220.43  2108    1


The value of sigma2 is closer to what you expect and n_eff of B0[1,1] is now similar to other parameters. I hope that is helpful. I am very much learning as I go with Stan.
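To spell out why that one factor matters: the prior at the top of the thread combines \beta \mid \sigma^2 \sim N_p(b_0, \sigma^2 B_0) with B_0 = c (X^T X)^{-1}, so the conditional prior covariance scales with \sigma^2:

```latex
\beta \mid \sigma^2, X \;\sim\; N_p\!\left(b_0,\; c\,\sigma^2\,(X^\top X)^{-1}\right)
```

The \sigma^2 factor therefore has to enter exactly once, either inside B0 or in the multi_normal call.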


And if I make c a parameter, sigma2 drops down to 6. I have merely glanced at this output, so do not take this as a definite usable result.


data {
  int<lower=0> N;  // number of observations
  vector[N] y;     // response
  int<lower=0> p;  // number of regressors
  matrix[N, p] X;  // matrix of covariates

  // Zellner prior parameters
  // real c;
  vector[p] b0;

  // inverse-gamma prior parameters
  real sigma0;
  real nu0;
}

parameters {
  vector[p] beta;        // regression parameters (column vector)
  real<lower=0> sigma2;  // variance
  real<lower=0> c;
}

transformed parameters {
  vector[N] mu;  // mean
  matrix[p, p] B0;

  for (i in 1:N) {
    mu[i] = X[i] * beta;
  }

  B0 = c * sigma2 * inverse(X' * X);
}

model {
  // likelihood:
  y ~ normal(mu, sqrt(sigma2));

  // priors:
  beta ~ multi_normal(b0, sigma2 * B0);
  sigma2 ~ inv_gamma(nu0 / 2, nu0 * sigma0 / 2);
}

data_lm2 <-list(N = N,
y=Y,
p = p,
X=as.matrix(X),
#c=c,
b0=b0,
sigma0=sigma0,
nu0=nu0
)
LM2 <- stan(file = "LinearModel.stan",
data = data_lm2,
chains = 4,
iter = 10000,
warmup = 5000,
thin= 10,
seed = 42,
init = inits,
algorithm = 'NUTS')

print(LM2, pars = c('beta','sigma2', "B0[1,1]", "c"))

             mean  se_mean        sd    2.5%      25%      50%      75%     97.5% n_eff Rhat
beta[1]      1.44     0.02      0.78   -0.12     0.93     1.45     1.96      2.96  1976    1
beta[2]      2.04     0.00      0.06    1.92     2.00     2.04     2.08      2.17  1765    1
beta[3]      2.95     0.00      0.04    2.87     2.92     2.95     2.98      3.03  1969    1
beta[4]      3.99     0.00      0.03    3.92     3.96     3.99     4.01      4.06  1930    1
sigma2       6.03     0.02      0.86    4.62     5.43     5.94     6.56      7.93  2118    1
B0[1,1]  65877.33 10355.04 414202.28 2482.17  6382.21 13157.12 32176.70 382911.13  1600    1
c       111888.58 17181.25 713772.97 3800.57 10598.67 22067.51 54970.65 649619.13  1726    1


Haven’t had a close look, but in any case you should precompute inverse(X' * X), or c * inverse(X' * X) when c is data, in the transformed data block. Preferably using a Cholesky decomposition for numerical stability.


I see I made a mistake editing the original code: I added sigma2 to the calculation of B0 in the transformed parameters block, but I did not delete it from the prior on beta in the model block, because I had not noticed it was there. Running this model

data {
  int<lower=0> N;  // number of observations
  vector[N] y;     // response
  int<lower=0> p;  // number of regressors
  matrix[N, p] X;  // matrix of covariates

  // Zellner prior parameters
  // real c;
  vector[p] b0;

  // inverse-gamma prior parameters
  real sigma0;
  real nu0;
}

parameters {
  vector[p] beta;        // regression parameters (column vector)
  real<lower=0> sigma2;  // variance
  real<lower=0> c;
}

transformed parameters {
  vector[N] mu;  // mean
  matrix[p, p] B0;

  for (i in 1:N) {
    mu[i] = X[i] * beta;
  }

  B0 = c * sigma2 * inverse(X' * X);
}

model {
  // likelihood:
  y ~ normal(mu, sqrt(sigma2));

  // priors:
  beta ~ multi_normal(b0, B0);
  sigma2 ~ inv_gamma(nu0 / 2, nu0 * sigma0 / 2);
}


using this R code (the original R code from @Eugenia except that c is a parameter and B0[1,1] and c are added to the print statement)

library(rstan)
library(coda)
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)

# Data Set

## Set the seed for the Random Number Generator
set.seed(431234)

## Create some simulated data
N=100 # number of observations
K <- 4  # number of regression parameters
# Matrix X of covariates
X<-matrix(nrow=N, ncol=K)
X[,1]<-rep(1,N)
for(i in 2:K){
X[,i]<-rnorm(n = N, mean = i^2 ,sd = i*2)
}

# Constant values

b0<-rep(0,K)
# Initial values
beta<-seq(1,K,1) #[1 2 3 4]
sigmasquare<-5
tau<-sigmasquare^(-1)

# Data
Y=rep(0,N)
for (i in 1:N){
Y[i]=rnorm(n=1, mean=X[i,]%*%beta,sd=sigmasquare^0.5)
}

################ MODEL

# Data
N <- length(Y)
p <- dim(X)[2]

c<-100
b0<-rep(0,K)
sigma0 <- 1 # inverse-gamma prior parameter
nu0<-0.0001

data_lm <-list(N = N,
y=Y,
p = p,
X=as.matrix(X),
#c=c,
b0=b0,
sigma0=sigma0,
nu0=nu0
)

#initialization of the parameters

inits <- function()
{
list(
beta = rep(0.1, p),
sigma2=1)
}

# run stan model

LM <- stan(file = "LinearModel.stan",
data = data_lm,
chains = 4,
iter = 10000,
warmup = 5000,
thin= 10,
seed = 42,
init = inits,
algorithm = 'NUTS')

print(LM, pars = c('beta','sigma2', "B0[1,1]", "c"))


I get this result

               mean   se_mean         sd     2.5%      25%       50%       75%      97.5% n_eff Rhat
beta[1]      1.42      0.02       0.79    -0.15     0.88      1.40      1.95       3.01  2141    1
beta[2]      2.04      0.00       0.07     1.91     2.00      2.04      2.09       2.17  2020    1
beta[3]      2.95      0.00       0.04     2.87     2.92      2.95      2.97       3.03  2033    1
beta[4]      3.99      0.00       0.04     3.92     3.97      3.99      4.01       4.06  1910    1
sigma2       6.16      0.02       0.88     4.66     5.55      6.09      6.71       8.05  1898    1
B0[1,1] 484971.38  86134.79 3863633.77 14065.39 39747.22  76231.69 189598.25 2618988.70  2012    1
c       823373.73 169803.29 7611772.91 21654.25 66066.00 125247.29 311249.31 4492127.85  2009    1


Ok, I see what you suggest. The parameter c tends to be really high, so that the posterior mean of \beta is minimally influenced by the prior mean b0 and is mainly driven by the data.

After seeing your result, I also tried changing b0: instead of a vector of all zeros, I used values closer to the true ones. In this way, with c still fixed as data at 100, sigma2 becomes closer to what I expected.
So the problem seems to be related to how much the prior mean influences the posterior estimate, which is indeed controlled by the parameter c.
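This matches the standard conjugate-normal algebra: with prior mean b_0 and prior covariance c \sigma^2 (X^\top X)^{-1}, the conditional posterior mean of \beta is a weighted average of the OLS estimate \hat{\beta} and b_0:

```latex
E[\beta \mid y, \sigma^2] \;=\; \frac{c}{c+1}\,\hat{\beta} \;+\; \frac{1}{c+1}\,b_0
```

so as c grows, the weight on the prior mean b_0 vanishes.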

Thanks a lot for your help!

I’m not sure I understood what you suggested.
Should I compute inverse(X' * X) in R and pass it as data to the Stan program? I tried that, but nothing changed; I’m probably not getting your suggestion.

You can compute it in R or in the transformed data block in Stan. It won’t change the results, but it’s better to compute it only once instead of at every iteration. For large X matrices this will be a lot faster.
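A minimal sketch of what I mean, assuming c is passed as data (illustrative, not a drop-in replacement; the variable names XtX_inv and L are mine):

```stan
data {
  int<lower=0> N;
  vector[N] y;
  int<lower=0> p;
  matrix[N, p] X;
  real<lower=0> c;   // g-prior constant, passed as data
  vector[p] b0;
  real sigma0;
  real nu0;
}
transformed data {
  // computed once, before sampling starts -- not at every leapfrog step
  matrix[p, p] XtX_inv = inverse_spd(X' * X);
  matrix[p, p] L = cholesky_decompose(XtX_inv);  // Cholesky factor of (X'X)^-1
}
parameters {
  vector[p] beta;
  real<lower=0> sigma2;  // variance
}
model {
  // covariance c * sigma2 * (X'X)^-1  <=>  Cholesky factor sqrt(c * sigma2) * L
  beta ~ multi_normal_cholesky(b0, sqrt(c * sigma2) * L);
  sigma2 ~ inv_gamma(nu0 / 2, nu0 * sigma0 / 2);
  y ~ normal(X * beta, sqrt(sigma2));
}
```

multi_normal_cholesky with factor sqrt(c * sigma2) * L is the same distribution as multi_normal with covariance c * sigma2 * inverse(X' * X), but the inversion and decomposition are done only once because they sit in transformed data.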

Now I’ve really got it! Thanks a lot
