# Student-t vs. normal distributed errors in Bayes linear regression model

All,

I am attempting to reproduce examples in Edward Greenberg’s book, “Bayesian Econometrics”, with Stan. In Chapter 8, Greenberg estimates a normal linear regression model of a child’s birth weight on a host of covariates: gestation length, parity, mother’s age, mother’s height, mother’s pregnancy weight, and whether or not the mother smokes. The coefficient on smoking is the one I would like to reproduce as in Greenberg.

I have attached the data; the R code to clean it and put it into list format is below.

Clean and prep data, as in Greenberg:

``````
library(dplyr)

babies_2 = read.csv("babiesII.csv") %>%
  filter( gestation != 999 & smoke != 9 ) %>%
  mutate( bwt_c       = scale(bwt) ,
          age_c       = scale(age) ,
          height_c    = scale(height) ,
          weight_c    = scale(weight) ,
          gestation_c = scale(gestation) )
``````

Data to send to Stan in list format:

``````
data_babies = with(babies_2 ,
  list( N         = nrow(babies_2) ,
        bwt       = as.vector(bwt_c) ,
        gestation = as.vector(gestation_c) ,
        parity    = parity ,
        age       = as.vector(age_c) ,
        height    = as.vector(height_c) ,
        weight    = as.vector(weight_c) ,
        smoke     = smoke ) )
``````

I have centered and standardized the continuous variables (via `scale()`) to speed up estimation.

The code below effectively reproduces Greenberg’s results for a linear regression model with normally distributed errors:

``````
data{
  int<lower=1> N;
  real bwt[N];
  real gestation[N];
  int<lower=0,upper=1> parity[N];
  real age[N];
  real height[N];
  real weight[N];
  int<lower=0,upper=1> smoke[N];
}

parameters{
  real bgest;
  real bpar;
  real bage;
  real bw;
  real bh;
  real bs;
  real b0;
  real<lower=0> sigma;
}

model{
  vector[N] mu;

  b0 ~ normal( 0 , 1.1 );
  bs ~ normal( -0.5 , 0.55 );
  bh ~ normal( 0 , 0.55 );
  bw ~ normal( 0 , 0.55 );
  bage ~ normal( 0 , 0.55 );
  bpar ~ normal( 0 , 0.55 );
  bgest ~ normal( 0 , 0.55 );
  sigma ~ inv_gamma( 6 , 100 );

  for ( i in 1:N ) {
    mu[i] = b0 + bgest * gestation[i] + bpar * parity[i] + bage * age[i]
            + bw * weight[i] + bh * height[i] + bs * smoke[i];
  }

  bwt ~ normal( mu , sigma );
}

``````
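As an aside, the loop can be replaced with vectorized arithmetic if the covariates are declared as `vector`s; this is the same model, just written in more idiomatic Stan (a sketch, not run against the data):

```stan
data{
  int<lower=1> N;
  vector[N] bwt;
  vector[N] gestation;
  vector[N] parity;   // 0/1 indicator, declared as a vector so it enters mu directly
  vector[N] age;
  vector[N] height;
  vector[N] weight;
  vector[N] smoke;    // 0/1 indicator
}

parameters{
  real bgest;
  real bpar;
  real bage;
  real bw;
  real bh;
  real bs;
  real b0;
  real<lower=0> sigma;
}

model{
  // one vectorized expression replaces the for loop
  vector[N] mu = b0 + bgest * gestation + bpar * parity + bage * age
                 + bw * weight + bh * height + bs * smoke;

  b0 ~ normal( 0 , 1.1 );
  bs ~ normal( -0.5 , 0.55 );
  bh ~ normal( 0 , 0.55 );
  bw ~ normal( 0 , 0.55 );
  bage ~ normal( 0 , 0.55 );
  bpar ~ normal( 0 , 0.55 );
  bgest ~ normal( 0 , 0.55 );
  sigma ~ inv_gamma( 6 , 100 );

  bwt ~ normal( mu , sigma );
}
```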

The result for the “bs” parameter on mother’s smoking is roughly the same as Greenberg’s, although Greenberg gets a mean of -8.9 and I get roughly -8.4.

However, when I change the likelihood to a Student-t distribution with 5 degrees of freedom, as Greenberg suggests for re-estimating the model, my result for “bs” looks much like the normal model (I get -8.3), whereas Greenberg reports a mean of -4.656 for his “bs” parameter. Please also note that Greenberg uses the MCMCpack package to estimate his models.

I re-ran the Stan code above unchanged except for the likelihood, which I replaced with a Student-t with nu = 5:

``````
model{
  vector[N] mu;

  b0 ~ normal( 0 , 1.1 );
  bs ~ normal( -0.5 , 0.55 );
  bh ~ normal( 0 , 0.55 );
  bw ~ normal( 0 , 0.55 );
  bage ~ normal( 0 , 0.55 );
  bpar ~ normal( 0 , 0.55 );
  bgest ~ normal( 0 , 0.55 );
  sigma ~ inv_gamma( 6 , 100 );

  for ( i in 1:N ) {
    mu[i] = b0 + bgest * gestation[i] + bpar * parity[i] + bage * age[i]
            + bw * weight[i] + bh * height[i] + bs * smoke[i];
  }

  bwt ~ student_t( 5 , mu , sigma );
}

``````
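For what it’s worth, the usual Gibbs treatment of Student-t errors (as in Greenberg’s derivation) works through the scale-mixture-of-normals representation, with a latent Gamma(nu/2, nu/2) precision weight for each observation. The model below is marginally equivalent to the `student_t(5, mu, sigma)` version and can be used to check whether the parameterization matters; the `lambda` weights are the standard augmentation, not code from the book (data block unchanged):

```stan
parameters{
  real bgest;
  real bpar;
  real bage;
  real bw;
  real bh;
  real bs;
  real b0;
  real<lower=0> sigma;
  vector<lower=0>[N] lambda;   // latent per-observation precision weights
}

model{
  vector[N] mu;

  b0 ~ normal( 0 , 1.1 );
  bs ~ normal( -0.5 , 0.55 );
  bh ~ normal( 0 , 0.55 );
  bw ~ normal( 0 , 0.55 );
  bage ~ normal( 0 , 0.55 );
  bpar ~ normal( 0 , 0.55 );
  bgest ~ normal( 0 , 0.55 );
  sigma ~ inv_gamma( 6 , 100 );
  lambda ~ gamma( 2.5 , 2.5 );   // Gamma(nu/2, nu/2) with nu = 5

  for ( i in 1:N ) {
    mu[i] = b0 + bgest * gestation[i] + bpar * parity[i] + bage * age[i]
            + bw * weight[i] + bh * height[i] + bs * smoke[i];
    // mixing a normal over gamma precision weights gives a marginal t_5
    bwt[i] ~ normal( mu[i] , sigma / sqrt(lambda[i]) );
  }
}
```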

Am I coding this model correctly?

babiesII.csv (27.6 KB)

Best,
Eddie

I think so. That inverse gamma prior can do strange things.

I tried re-estimating the above model with the following prior instead:

``````
sigma ~ cauchy( 0 , 2 );   // half-Cauchy, given the <lower=0> constraint on sigma

``````

The results were nearly identical to those from the inverse-gamma prior that Greenberg employed. I am still not sure why my results differ so much from Greenberg’s.
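One thing I still want to rule out is where the inverse-gamma prior sits: conjugate setups like Greenberg’s typically place it on the variance sigma², while my code puts `inv_gamma(6, 100)` directly on `sigma`. If Greenberg’s prior is on the variance, the Stan analogue would look like the fragment below (a sketch only; the elided pieces are the coefficient priors and `mu` loop from the models above, and I have not checked this against the book’s numbers):

```stan
parameters{
  // ... regression coefficients as before ...
  real<lower=0> sigma_sq;            // variance, not the standard deviation
}

transformed parameters{
  real<lower=0> sigma = sqrt(sigma_sq);
}

model{
  // ... same coefficient priors and mu as before ...
  sigma_sq ~ inv_gamma( 6 , 100 );   // inverse-gamma on the variance
  bwt ~ student_t( 5 , mu , sigma );
}
```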