Student-t vs. normally distributed errors in a Bayesian linear regression model

All,

I am attempting to reproduce examples in Edward Greenberg’s book, “Bayesian Econometrics”, with Stan. In Chapter 8 of the book, Greenberg estimates a normal linear regression model of a child’s birth weight on a host of covariates: gestation length, parity, mother’s age, mother’s height, mother’s pregnancy weight, and whether or not the mother smokes. The coefficient on smoking is the estimate I would like to reproduce from Greenberg.

I have attached the data; the R code to clean it and put it into list format is below.

Clean and prep data, as in Greenberg:

library(dplyr)

babies_2 = read.csv("babiesII.csv") %>%
    filter( gestation != 999 & smoke != 9 ) %>%
    mutate( bwt_c = scale(bwt) ,
            age_c = scale(age) ,
            height_c = scale(height) ,
            weight_c = scale(weight) ,
            gestation_c = scale(gestation) )

Data to send to Stan in list format:

data_babies = with(babies_2 ,
list( N = nrow(babies_2) ,
bwt = as.vector(bwt_c) ,
gestation = as.vector(gestation_c) ,
parity = parity ,
age = as.vector(age_c) ,
height = as.vector(height_c) ,
weight = as.vector(weight_c) ,
smoke = smoke ) )

I have standardized (centered and scaled) the continuous variables to speed up estimation.
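As an aside, the same standardization could be done inside Stan itself rather than in R, e.g. in a transformed data block. This is just a sketch, assuming bwt were declared as vector[N] bwt in the data block:

```stan
transformed data{
    // standardize the outcome inside Stan instead of via scale() in R
    vector[N] bwt_std = (bwt - mean(bwt)) / sd(bwt);
}
```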

The code below effectively reproduces Greenberg’s results for a linear regression model with normally distributed errors:


data{
    int<lower=1> N;
    vector[N] bwt;
    vector[N] gestation;
    array[N] int<lower=0,upper=1> parity;
    vector[N] age;
    vector[N] height;
    vector[N] weight;
    array[N] int<lower=0,upper=1> smoke;
}

parameters{
    real bgest;
    real bpar;
    real bage;
    real bw;
    real bh;
    real bs;
    real b0;
    real<lower=0> sigma;
}

model{
    vector[N] mu;

    b0 ~ normal( 0 , 1.1 );
    bs ~ normal( -0.5 , 0.55 );
    bh ~ normal( 0 , 0.55 );
    bw ~ normal( 0 , 0.55 );
    bage ~ normal( 0 , 0.55 );
    bpar ~ normal( 0 , 0.55 );
    bgest ~ normal( 0 , 0.55 );
    sigma ~ inv_gamma( 6 , 100 );

    for ( i in 1:N ) {
        mu[i] = b0 + bgest * gestation[i] + bpar * parity[i] + bage * age[i]
                + bw * weight[i] + bh * height[i] + bs * smoke[i];
    }
    
    bwt ~ normal( mu , sigma );
}

The result for the “bs” parameter on mother’s smoking is roughly the same as Greenberg’s, although Greenberg gets a mean of -8.9 and I get roughly -8.4.

However, when I change the likelihood to a Student-t distribution with 5 degrees of freedom, as Greenberg suggests for re-estimating the model, my result for “bs” is similar to the normal model (roughly -8.3), whereas Greenberg reports a posterior mean of -4.656 for the coefficient on the mother’s smoking variable. Please also note that Greenberg uses the “MCMCpack” package to estimate his models.

I re-estimated with Stan code identical to that above, except that I changed the likelihood to a Student-t distribution with nu = 5, as in the following code:


model{
    vector[N] mu;

    b0 ~ normal( 0 , 1.1 );
    bs ~ normal( -0.5 , 0.55 );
    bh ~ normal( 0 , 0.55 );
    bw ~ normal( 0 , 0.55 );
    bage ~ normal( 0 , 0.55 );
    bpar ~ normal( 0 , 0.55 );
    bgest ~ normal( 0 , 0.55 );
    sigma ~ inv_gamma( 6 , 100 );

    for ( i in 1:N ) {
        mu[i] = b0 + bgest * gestation[i] + bpar * parity[i] + bage * age[i]
                + bw * weight[i] + bh * height[i] + bs * smoke[i];
    }
    
    bwt ~ student_t( 5 , mu , sigma );
}

Am I coding this model correctly?
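One robustness check worth trying (not from Greenberg, just a common Stan idiom) is to let the degrees of freedom be a parameter rather than fixing nu = 5, so the data can pull the tails toward or away from normality. The gamma(2, 0.1) prior here is a conventional weakly informative choice, not something from the book:

```stan
parameters{
    // ... same coefficients and sigma as above, plus:
    real<lower=1> nu;    // degrees of freedom of the Student-t
}

model{
    // ... same priors and mu as above ...
    nu ~ gamma( 2 , 0.1 );    // weakly informative; most mass on small-to-moderate nu
    bwt ~ student_t( nu , mu , sigma );
}
```

If the posterior for nu ends up large, the two likelihoods should give very similar coefficient estimates.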

babiesII.csv (27.6 KB)

Best,
Eddie

I think so. That inverse gamma prior can do strange things.

I tried re-estimating the above model with the following prior on sigma:


sigma ~ cauchy( 0 , 2 );

The results were nearly identical to those I got with the inverse gamma prior that Greenberg employed. I am still not sure why my results differ so much from Greenberg’s.
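If it would help to see which error distribution the data actually favor, one option (a sketch, not from Greenberg) is to save pointwise log-likelihoods in a generated quantities block and compare the normal and Student-t fits with the loo package. This assumes mu is moved from the model block into transformed parameters so it is visible here:

```stan
generated quantities{
    vector[N] log_lik;
    for ( i in 1:N ) {
        // pointwise log-likelihood for the Student-t model;
        // use normal_lpdf( bwt[i] | mu[i] , sigma ) in the normal-error model
        log_lik[i] = student_t_lpdf( bwt[i] | 5 , mu[i] , sigma );
    }
}
```

Comparing the resulting elpd values would show whether the heavy-tailed likelihood is actually doing anything for these data.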