We are trying to achieve reproducible results between two programmers in our analysis. The only difference, as far as we can tell, between the two analyses is the reference level of the factor covariate. The two sets of code use the same model, dataset (except for the factor reference level), and the same random seed. While the results are similar, I expect them to be exactly the same.
The data cannot be shared, but here is a reproducible example using mtcars
:
library(tidyverse)
library(rstanarm)
#> Loading required package: Rcpp
#> This is rstanarm version 2.32.1
#> - See https://mc-stan.org/rstanarm/articles/priors for changes to default priors!
#> - Default priors may change, so it's safest to specify priors, even if equivalent to the defaults.
#> - For execution on a local, multicore CPU with excess RAM we recommend calling
#> options(mc.cores = parallel::detectCores())
options(mc.cores = parallel::detectCores())
df_3_ref <- mtcars %>%
mutate(gear = factor(gear) %>% fct_relevel("3", "4", "5"))
df_5_ref <- mtcars %>%
mutate(gear = factor(gear) %>% fct_relevel("5", "4", "3"))
f_3_ref <- stan_glm(
mpg ~ -1 + gear,
family = gaussian(),
prior = normal(0, 5, autoscale = FALSE),
seed = 123,
data = df_3_ref
)
f_5_ref <- stan_glm(
mpg ~ -1 + gear,
family = gaussian(),
prior = normal(0, 5, autoscale = FALSE),
seed = 123,
data = df_5_ref
)
summary(f_3_ref)
#>
#> Model Info:
#> function: stan_glm
#> family: gaussian [identity]
#> formula: mpg ~ -1 + gear
#> algorithm: sampling
#> sample: 4000 (posterior sample size)
#> priors: see help('prior_summary')
#> observations: 32
#> predictors: 3
#>
#> Estimates:
#> mean sd 10% 50% 90%
#> gear3 14.9 1.4 13.1 14.9 16.6
#> gear4 22.3 1.6 20.2 22.4 24.3
#> gear5 17.3 2.4 14.2 17.5 20.3
#> sigma 5.4 0.9 4.4 5.3 6.6
#>
#> Fit Diagnostics:
#> mean sd 10% 50% 90%
#> mean_PPD 18.0 1.5 16.2 18.1 19.8
#>
#> The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
#>
#> MCMC diagnostics
#> mcse Rhat n_eff
#> gear3 0.0 1.0 2774
#> gear4 0.0 1.0 2241
#> gear5 0.1 1.0 2291
#> sigma 0.0 1.0 1813
#> mean_PPD 0.0 1.0 2656
#> log-posterior 0.0 1.0 1330
#>
#> For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
summary(f_5_ref)
#>
#> Model Info:
#> function: stan_glm
#> family: gaussian [identity]
#> formula: mpg ~ -1 + gear
#> algorithm: sampling
#> sample: 4000 (posterior sample size)
#> priors: see help('prior_summary')
#> observations: 32
#> predictors: 3
#>
#> Estimates:
#> mean sd 10% 50% 90%
#> gear5 17.3 2.5 14.1 17.4 20.4
#> gear4 22.3 1.6 20.2 22.4 24.3
#> gear3 14.9 1.4 13.1 15.0 16.7
#> sigma 5.4 0.9 4.4 5.3 6.5
#>
#> Fit Diagnostics:
#> mean sd 10% 50% 90%
#> mean_PPD 18.1 1.4 16.2 18.2 19.8
#>
#> The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
#>
#> MCMC diagnostics
#> mcse Rhat n_eff
#> gear5 0.0 1.0 3086
#> gear4 0.0 1.0 2467
#> gear3 0.0 1.0 3683
#> sigma 0.0 1.0 1965
#> mean_PPD 0.0 1.0 2893
#> log-posterior 0.0 1.0 1566
#>
#> For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
Created on 2024-04-25 with reprex v2.1.0```
As you can see, the posterior results are similar, but not exactly the same.
Why?
- Operating System: macOS Ventura 13.2
- rstanarm Version: 2.32.1