Recreating results in Stan (FIXED)


#1

Dear all,

I have been trying to re-create the EXACT results from earlier runs in Stan. Due to MC error the results can obviously be slightly different between runs, but I always thought that if you would use the same initial values (and obviously the same model, data, etc.) the results would be similar between re-runs. However, I haven’t been able to do so. I must be missing something… right?

Below I got a simple schools 8 example in Rstan, where I first run the model 10 times with “random” initial values. Subsequently I run it again 10 times, but with fixed initial values. Note that the results keep changing. Also note that I deliberately decreased the sample size significantly for this example, as to exacerbate the MC error (which I would obviously not advice in general).

thanks you so much!
Alex

EDIT: I forgot to set the seed. This code does give the same results over and over again;

library(“rstan”)
#////////////////////////////////////////////////////////////

(1) Stan Model

#////////////////////////////////////////////////////////////
model_string <- “
data {
int<lower=0> J;
real y[J];
real<lower=0> sigma[J];
}
parameters {
real mu;
real<lower=0> tau;
real eta[J];
}
transformed parameters {
real theta[J];
for (j in 1:J)
theta[j] = mu + tau * eta[j];
}
model {
target += normal_lpdf(eta | 0, 1);
target += normal_lpdf(y | theta, sigma);
}”

#////////////////////////////////////////////////////////////

(2) Data

#////////////////////////////////////////////////////////////
schools_dat <- list(J = 8,
y = c(28, 8, -3, 7, -1, 1, 18, 12),
sigma = c(15, 10, 16, 11, 9, 11, 10, 18))

#////////////////////////////////////////////////////////////

(3) Running model with “random” inits

#////////////////////////////////////////////////////////////
mResults <- matrix(NA,18,10)
for(i in 1:10){
fit <- stan(model_code= model_string,
init = “random”,
warmup = 150,
iter = 200,
data = schools_dat,
chains = 1,
refresh = 0)
results <- as.data.frame(summary(fit)$summary)
results <- results[1:(nrow(results)-1),]
mResults[,i] <- results[,1]
}
matplot(mResults,type=‘l’)

#////////////////////////////////////////////////////////////

(4) Running model with “fixed” inits

#////////////////////////////////////////////////////////////
initialValue <- get_inits(fit)
SEED <- 12345
set.seed(SEED)

mResults <- matrix(NA,18,10)
for(i in 1:10){
fit <- stan(model_code= model_string,
init = initialValue,
warmup = 150,
iter = 200,
data = schools_dat,
chains = 1,
refresh = 0,
seed = SEED)
results <- as.data.frame(summary(fit)$summary)
results <- results[1:(nrow(results)-1),]
mResults[,i] <- results[,1]
print(get_inits(fit))
}
matplot(mResults,type=‘l’)
#////////////////////////////////////////////////////////////


#2

See if setting the seed works:

SEED <- 12345
set.seed(SEED)

and then use that in your Stan call: stan(... seed = SEED).


#3

Yes!

That worked. I am not 100% why this is needed, but it works.
Thank you.


#4

The algorithm is stochastic.


#5

There’s a chapter in the Stan manual on reproducibility that may help. The pseudorandom number generators used to sample from the posterior vary unless you fix their seed, which determines the sequence of draws they produce.