# Student-t vs. normal distributed errors in Bayes linear regression model

All,

I am attempting to reproduce examples in Edward Greenberg’s book, “Bayesian Econometrics”, with Stan. In Chapter 8, Greenberg estimates a normal linear regression model of a child’s birth weight on a host of covariates: gestation length, parity, mother’s age, mother’s height, mother’s pregnancy weight, and whether or not the mother smokes. The coefficient on smoking is the one I would like to reproduce as in Greenberg.

I have attached the data; the R code to clean it and put it into list format is below.

Clean and prep data, as in Greenberg:

``````
library(dplyr)

babies_2 = read.csv("babiesII.csv") %>%
  filter( gestation != 999 & smoke != 9 ) %>%
  mutate( bwt_c       = scale(bwt) ,
          age_c       = scale(age) ,
          height_c    = scale(height) ,
          weight_c    = scale(weight) ,
          gestation_c = scale(gestation) )
``````

Data to send to Stan in list format:

``````
data_babies = with(babies_2 ,
  list( N         = nrow(babies_2) ,
        bwt       = as.vector(bwt_c) ,
        gestation = as.vector(gestation_c) ,
        parity    = parity ,
        age       = as.vector(age_c) ,
        height    = as.vector(height_c) ,
        weight    = as.vector(weight_c) ,
        smoke     = smoke ) )
``````

I have centered and standardized the continuous variables (via `scale()`) to speed up estimation.

The code below effectively reproduces Greenberg’s results for a linear regression model with normally distributed errors:

``````
data{
  int<lower=1> N;
  real bwt[N];
  real gestation[N];
  int<lower=0,upper=1> parity[N];
  real age[N];
  real height[N];
  real weight[N];
  int<lower=0,upper=1> smoke[N];
}

parameters{
  real bgest;
  real bpar;
  real bage;
  real bw;
  real bh;
  real bs;
  real b0;
  real<lower=0> sigma;
}

model{
  vector[N] mu;

  b0 ~ normal( 0 , 1.1 );
  bs ~ normal( -0.5 , 0.55 );
  bh ~ normal( 0 , 0.55 );
  bw ~ normal( 0 , 0.55 );
  bage ~ normal( 0 , 0.55 );
  bpar ~ normal( 0 , 0.55 );
  bgest ~ normal( 0 , 0.55 );
  sigma ~ inv_gamma( 6 , 100 );

  for ( i in 1:N ) {
    mu[i] = b0 + bgest * gestation[i] + bpar * parity[i] + bage * age[i]
            + bw * weight[i] + bh * height[i] + bs * smoke[i];
  }

  bwt ~ normal( mu , sigma );
}

``````
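As an aside, the loop can be replaced with vectorized arithmetic if the covariates are declared as `vector`s; this is the same model, just written in more idiomatic Stan (a sketch, not run against the data):

```stan
data{
  int<lower=1> N;
  vector[N] bwt;
  vector[N] gestation;
  vector[N] parity;   // 0/1 indicator, declared as a vector so it enters mu directly
  vector[N] age;
  vector[N] height;
  vector[N] weight;
  vector[N] smoke;    // 0/1 indicator
}

parameters{
  real bgest;
  real bpar;
  real bage;
  real bw;
  real bh;
  real bs;
  real b0;
  real<lower=0> sigma;
}

model{
  // one vectorized expression replaces the for loop
  vector[N] mu = b0 + bgest * gestation + bpar * parity + bage * age
                 + bw * weight + bh * height + bs * smoke;

  b0 ~ normal( 0 , 1.1 );
  bs ~ normal( -0.5 , 0.55 );
  bh ~ normal( 0 , 0.55 );
  bw ~ normal( 0 , 0.55 );
  bage ~ normal( 0 , 0.55 );
  bpar ~ normal( 0 , 0.55 );
  bgest ~ normal( 0 , 0.55 );
  sigma ~ inv_gamma( 6 , 100 );

  bwt ~ normal( mu , sigma );
}
```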

The result for the “bs” parameter on mother’s smoking is roughly the same as Greenberg’s, although Greenberg gets a mean of -8.9 and I get roughly -8.4.

However, when I change the likelihood to a Student-t distribution with 5 degrees of freedom, as Greenberg suggests for re-estimating the model, my result for “bs” looks much like the normal model (I get -8.3), whereas Greenberg reports a mean of -4.656 for his “bs” parameter. Please also note that Greenberg uses the MCMCpack package to estimate his models.

I re-ran the Stan code above unchanged except for the likelihood, which I replaced with a Student-t with nu = 5:

``````
model{
  vector[N] mu;

  b0 ~ normal( 0 , 1.1 );
  bs ~ normal( -0.5 , 0.55 );
  bh ~ normal( 0 , 0.55 );
  bw ~ normal( 0 , 0.55 );
  bage ~ normal( 0 , 0.55 );
  bpar ~ normal( 0 , 0.55 );
  bgest ~ normal( 0 , 0.55 );
  sigma ~ inv_gamma( 6 , 100 );

  for ( i in 1:N ) {
    mu[i] = b0 + bgest * gestation[i] + bpar * parity[i] + bage * age[i]
            + bw * weight[i] + bh * height[i] + bs * smoke[i];
  }

  bwt ~ student_t( 5 , mu , sigma );
}

``````
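For what it’s worth, the usual Gibbs treatment of Student-t errors (as in Greenberg’s derivation) works through the scale-mixture-of-normals representation, with a latent Gamma(nu/2, nu/2) precision weight for each observation. The model below is marginally equivalent to the `student_t(5, mu, sigma)` version and can be used to check whether the parameterization matters; the `lambda` weights are the standard augmentation, not code from the book (data block unchanged):

```stan
parameters{
  real bgest;
  real bpar;
  real bage;
  real bw;
  real bh;
  real bs;
  real b0;
  real<lower=0> sigma;
  vector<lower=0>[N] lambda;   // latent per-observation precision weights
}

model{
  vector[N] mu;

  b0 ~ normal( 0 , 1.1 );
  bs ~ normal( -0.5 , 0.55 );
  bh ~ normal( 0 , 0.55 );
  bw ~ normal( 0 , 0.55 );
  bage ~ normal( 0 , 0.55 );
  bpar ~ normal( 0 , 0.55 );
  bgest ~ normal( 0 , 0.55 );
  sigma ~ inv_gamma( 6 , 100 );
  lambda ~ gamma( 2.5 , 2.5 );   // Gamma(nu/2, nu/2) with nu = 5

  for ( i in 1:N ) {
    mu[i] = b0 + bgest * gestation[i] + bpar * parity[i] + bage * age[i]
            + bw * weight[i] + bh * height[i] + bs * smoke[i];
    // mixing a normal over gamma precision weights gives a marginal t_5
    bwt[i] ~ normal( mu[i] , sigma / sqrt(lambda[i]) );
  }
}
```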

Am I coding this model correctly?

babiesII.csv (27.6 KB)

Best,
Eddie

I think so. That inverse gamma prior can do strange things.

I tried re-estimating the above model with the following prior instead:

``````
sigma ~ cauchy( 0 , 2 );   // half-Cauchy, given the <lower=0> constraint on sigma

``````

The results were nearly identical to those from the inverse-gamma prior that Greenberg employed. I am still not sure why my results differ so much from Greenberg’s.
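One thing I still want to rule out is where the inverse-gamma prior sits: conjugate setups like Greenberg’s typically place it on the variance sigma², while my code puts `inv_gamma(6, 100)` directly on `sigma`. If Greenberg’s prior is on the variance, the Stan analogue would look like the fragment below (a sketch only; the elided pieces are the coefficient priors and `mu` loop from the models above, and I have not checked this against the book’s numbers):

```stan
parameters{
  // ... regression coefficients as before ...
  real<lower=0> sigma_sq;            // variance, not the standard deviation
}

transformed parameters{
  real<lower=0> sigma = sqrt(sigma_sq);
}

model{
  // ... same coefficient priors and mu as before ...
  sigma_sq ~ inv_gamma( 6 , 100 );   // inverse-gamma on the variance
  bwt ~ student_t( 5 , mu , sigma );
}
```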