On the advice of some people from this forum, I watched Richard McElreath’s ‘Statistical Rethinking’ lectures, and have since been going through version 2 of his Rethinking book (https://xcelab.net/rm/sr2/). It’s a great book and I’ve been learning a ton so far, but have hit the first part where for me, the code just doesn’t seem to be working like it does in the book’s examples.
McElreath uses a function called ulam in his “Rethinking” package, which is an interface for doing HMC in stan. So far all the code in the book has worked perfectly, but now on the first example of using ulam / stan (rather than quadratic approximation) I receive multiple warnings, the sampling takes ages, and the output indicates the sampling has not worked properly. This is clearly supposed to be quite a simple model as it is the first one introduced for Stan. Does anyone know of effective solutions to the issues below? I am also wondering if perhaps there are some settings in stan that I should change, which might have been missed by McElreath, in order to make something like this model work? Any help would be much appreciated.
The code is as follows (from http://xcelab.net/rmpubs/sr2/code.txt):
## R code 9.9 - loads the relevant data from the rethinking package and standardises variables of interest
library(rethinking)
data(rugged)
d <- rugged
d$log_gdp <- log(d$rgdppc_2000)
dd <- d[ complete.cases(d$rgdppc_2000) , ]
dd$log_gdp_std <- dd$log_gdp / mean(dd$log_gdp)
dd$rugged_std <- dd$rugged / max(dd$rugged)
dd$cid <- ifelse( dd$cont_africa==1 , 1 , 2
# code 9.10 runs the same model with quadratic approximation
## R code 9.11 - creates a more slimmed down list of variables of interest
dat_slim <- list(
log_gpd_std = dd$log_gdp_std,
rugged_std = dd$rugged_std,
cid = as.integer( dd$cid )
)
str(dat_slim)
## R code 9.12
m9.1 <- ulam(
alist(
log_gdp_std ~ dnorm( mu , sigma ) ,
mu <- a[cid] + b[cid]*( rugged_std - 0.215 ) ,
a[cid] ~ dnorm( 1 , 0.1 ) ,
b[cid] ~ dnorm( 0 , 0.3 ) ,
sigma ~ dexp( 1 )
) ,
data=dat_slim , chains=1 )
R code 9.13
precis( m9.1 , depth=2 ) # summarises the model
When I run the ulam function, I receive the following warnings:
Warning messages:
1: There were 184 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 15. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
2: There were 1 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-low
3: Examine the pairs() plot to diagnose sampling problems
I’ve tried both specifying a larger number in tree depth and also changing adapt_delta when an error for that came up. Neither (nor both in combination) resulted in the model executing effectively. I end up with effective samples of as low as 5 or so (example code below):
m9.1a <- ulam(
alist(
log_gdp_std ~ dnorm( mu , sigma ) ,
mu <- a[cid] + b[cid]*( rugged_std - 0.215 ) ,
a[cid] ~ dnorm( 1 , 0.1 ) ,
b[cid] ~ dnorm( 0 , 0.3 ) ,
sigma ~ dexp( 1 )
) ,
data=dat_slim , chains=1, control = list(max_treedepth = 15, adapt_delta = 0.95) )
I’ve also provided a pairs plot if this helps
Any help would be greatly appreciated!
MacOS sierra 10.12.6
Rstudio Version 1.2.1335
R 3.5.3