First a big thank you to the Stan team for all your hard work and great software!
Then I have a question.
I have a simple stan model that behaves roughly as I expect (and I realise that this model may be on the borderline of what is acceptable):
data {
}
parameters {
real base_day_rate;
real<lower=0> lambda_day[7];
}
transformed parameters {
}
model {
int D = 7;
base_day_rate ~ cauchy(4e+8,4e+7);
for (di in 1:D) {
lambda_day[di] ~ exponential(1 / base_day_rate);
}
}
generated quantities {
int D = 7;
real y_day_gen[7];
for (di in 1:D) {
y_day_gen[di] = poisson_rng(lambda_day[di]);
}
}
With this model I get the following generated ‘y_day_gen’:
> head(y_gen)
$y_day_gen
iterations [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 109789809 247799228 656133486 229725158 27181412 484288843 251370113
[2,] 594676361 1024747036 559579897 237139906 155872752 120202778 230216097
[3,] NaN NaN NaN NaN NaN NaN NaN
[4,] 4602782 140032948 523535670 98916818 161583217 735387783 290618599
[5,] NaN NaN NaN NaN NaN NaN NaN
[6,] NaN NaN NaN NaN NaN NaN NaN
[7,] 185788055 725481891 700271516 710902257 42526002 53719741 676507851
[8,] 12686669 157736390 310066717 173271824 53064507 353644407 96101002
[9,] NaN NaN NaN NaN NaN NaN NaN
[10,] 191226507 454295779 192153680 342521187 353371838 490412318 220506782
[11,] NaN NaN NaN NaN NaN NaN NaN
[12,] 464579472 615308311 222681792 494639546 205812940 175115223 19027325
[13,] 239364116 44092585 136881745 828308213 80511843 188539284 360616658
[14,] 132108739 99328889 6267587 129124767 680008568 129562061 400632360
[15,] NaN NaN NaN NaN NaN NaN NaN
What confuses me is that just adding an extra unused parameter (real<lower=0> lambda_hour[24];) quite drastically changes the model behaviour:
data {
}
parameters {
real base_day_rate;
real<lower=0> lambda_day[7];
real<lower=0> lambda_hour[24];
}
transformed parameters {
}
model {
int D = 7;
base_day_rate ~ cauchy(4e+8,4e+7);
for (di in 1:D) {
lambda_day[di] ~ exponential(1 / base_day_rate);
}
}
generated quantities {
int D = 7;
real y_day_gen[7];
for (di in 1:D) {
y_day_gen[di] = poisson_rng(lambda_day[di]);
}
}
With this model I get:
$y_day_gen
iterations [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 10 8 2 0 5 2 30
[2,] 7 13 4 2 5 1 26
[3,] 5 19 3 17 2 0 13
[4,] 8 5 6 1 6 0 16
[5,] 11 10 6 21 3 1 10
[6,] 7 14 0 3 3 0 25
[7,] 5 12 4 1 5 0 12
[8,] 9 22 6 3 4 0 16
[9,] 7 16 4 2 1 1 13
[10,] 9 2 1 4 7 1 7
[11,] 6 9 2 32 3 3 7
[12,] 5 29 2 17 3 3 13
[13,] 5 10 7 3 5 3 18
[14,] 8 20 2 11 1 0 8
Here is the corresponding R code:
library(tidyverse)
library(rstan)
library(lubridate)
stanfile <- "rop_model_gen.stan"
pars <- c("y_day_gen", "base_day_rate", "lambda_day")
rop_m<-stan(file=stanfile,
pars = pars,
iter=16000,
chains=1,
control = list(max_treedepth = 15,adapt_delta = 0.99))
y_gen <- rstan::extract(rop_m,"y_day_gen")
y_day_gen <- as.data.frame(y_gen$y_day_gen)
colnames(y_day_gen) <- paste0("day", 1:7)
summary(y_day_gen)
y_day_gen <- gather(y_day_gen)
ggplot(y_day_gen,aes(x=value,color=key,fill=key)) + geom_histogram()
print(traceplot(rop_m,"base_day_rate"))
print(traceplot(rop_m,"lambda_day"))
head(y_gen)
My setup:
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] bindrcpp_0.2.2 lubridate_1.7.4 rstan_2.18.2 StanHeaders_2.18.1 forcats_0.3.0
[6] stringr_1.3.1 dplyr_0.7.7 purrr_0.2.5 readr_1.1.1 tidyr_0.8.1
[11] tibble_1.4.2 ggplot2_3.0.0 tidyverse_1.2.1
loaded via a namespace (and not attached):
[1] tidyselect_0.2.5 haven_1.1.2 lattice_0.20-35 colorspace_1.3-2 stats4_3.5.1
[6] loo_2.0.0 yaml_2.2.0 base64enc_0.1-3 utf8_1.1.4 rlang_0.3.1
[11] pkgbuild_1.0.2 pillar_1.3.0 glue_1.3.0 withr_2.1.2 modelr_0.1.2
[16] readxl_1.1.0 matrixStats_0.54.0 bindr_0.1.1 plyr_1.8.4 munsell_0.5.0
[21] gtable_0.2.0 cellranger_1.1.0 rvest_0.3.2 codetools_0.2-15 labeling_0.3
[26] inline_0.3.15 callr_3.0.0 ps_1.2.0 parallel_3.5.1 fansi_0.4.0
[31] broom_0.5.0 Rcpp_0.12.19 KernSmooth_2.23-15 scales_1.0.0 backports_1.1.2
[36] jsonlite_1.5 gridExtra_2.3 digest_0.6.18 hms_0.4.2 stringi_1.2.4
[41] processx_3.2.0 grid_3.5.1 cli_1.0.1 tools_3.5.1 magrittr_1.5
[46] lazyeval_0.2.1 crayon_1.3.4 pkgconfig_2.0.2 xml2_1.2.0 prettyunits_1.0.2
[51] assertthat_0.2.0 httr_1.3.1 rstudioapi_0.8 R6_2.3.0 nlme_3.1-137
[56] compiler_3.5.1
Is this a (known?) bug or is this change in behaviour understandable somehow?
Now, I accept that the model might be bad, but I guess adding an unused parameter shouldn’t change the model behaviour?
Best Regards
-Leif