Fewer parameter in Map2Stan to than Stan

Hi, I’ve been learning Stan through the map2stan package and text. I’m encountering a problem in “transferring” my code from map2stan to Stan. First, the R code to create the map2stan object:


library(rethinking) #Web page, YouTube, Book
library(rstan)
library(shinystan)
library(rsconnect)
urlfile<-'https://raw.githubusercontent.com/blakeobeans/Bayesian-Baseball/master/giants.csv'
data<-read.csv(urlfile)
#Transform Data
data$score <- data$R-data$RA #Positive score indicates Giants win. Beats logistic. No ties.
data$opp_team <- coerce_index(data$Opp) #ID for team (function from rethinking)
data$pit_id <- coerce_index(data$pit) #Home pitcher
names(data) <- c("tm", "opp", "R", "RA", "pit","pitera", "score",  "opp_team", "pit_id")
data <- as.data.frame(data)
#Summary Stats
length(unique(data$opp_team)) # number of opposing teams
length(unique(data$pit)) # number of pitchers, interesting margins, moving guys
table(data$pit)
data$pitera_norm <-  (data$pitera - mean(data$pitera))/sd(data$pitera) #normalize ERA
standata <- data[,c("score", "opp_team", "pitera_norm")]
model2 <- map2stan(
alist(
score ~ dnorm( mu , sigma ) ,
mu <- a[opp_team] + b * pitera_norm,
sigma ~ dcauchy(0, 2.5),
a[opp_team] ~ dnorm( ai , as ),
ai ~ dnorm(0, 1),
as ~ dcauchy(0,2),
b ~ dnorm( 0, 1 )
),
data=standata, iter=6000, warmup=3000, chains=4)

This is where things get a little weird. As a workaround, I’m using the map2stan function stancode() to move the map2stan model to a “linkagecode” character vector, then running the stan_model() function on the linkagecode object.

linkagecode <- stancode(model2)
str(linkagecode)
model_obj <- stan_model(model_code = linkagecode, model_name = "BayesianBaseball")

At this point, I was encountering some errors when attemping to run the fit() function on the object. Stan needed a reference to the N variable in the data. So I rebuilt the dataset to include this variable as well as N_opp_team.

score <- standata$score
opp_team <- standata$opp_team
pitera_norm <- standata$pitera_norm
dat <- list()
dat$score <- score
dat$opp_team <- opp_team
dat$pitera_norm <- pitera_norm
dat$N <- 162
dat$N_opp_team <- 20
fit <- sampling(model_obj, data = dat, iter=6000, warmup=3000, chains = 4)

At this point, the model runs and the parameters can be estimated. However, along with the a[i] intercepts, I’m getting mu[i]- one for each observation. I was not getting mu[i] when I ran map2stan.

So I’m wondering why I’m getting mu[i] when I’m not getting that in map2stan. Also, my estimates for a[i] are slightly different with the Stan object then the map2stan object.

Thanks!

Hello!

I do not think there is a problem. In fact, I am pretty sure map2stan function makes a lot more than just producing a stan code : it preprocesses data to fulfill the stan internal requirement (such as having the number of groups or the number of data lines), and provides optimal sampling specifications.

EDIT : you can understand the need for N and N_opp_team by looking at how the loops work in the produced stan code.

For example, mu[i]s are intermediate predictors which does not need to be stored, because it is assumed you are first interested in estimated parameters (a, b, ai, as), and that fitted values and predictions will be computed a posteriori. You can avoid stan storing the mu vector by specifying
sampling(..., pars = "mu", include = F)

Concerning the differences in parameters recovery, they might be due to the random aspect of sampling. You can try setting the same seed in the map2stan and sampling functions.

Have a nice day!
Lucas