Adding unused variable changes model behaviour (Answered: no bug)

First a big thank you to the Stan team for all your hard work and great software!

Then I have a question.

I have a simple stan model that behaves roughly as I expect (and I realise that this model may be on the borderline of what is acceptable):

data {

}

parameters {
  real base_day_rate;
  real<lower=0> lambda_day[7];
}

transformed parameters {
}

model {
  int D = 7;
  base_day_rate ~ cauchy(4e+8,4e+7);
  for (di in 1:D) {
      lambda_day[di] ~ exponential(1 / base_day_rate);
  }
}
 
generated quantities {
  int D = 7;
  real y_day_gen[7];
  for (di in 1:D) {
      y_day_gen[di] = poisson_rng(lambda_day[di]);
  }
  
}

With this model I get the following generated ‘y_day_gen’:

> head(y_gen)
$y_day_gen
          
iterations       [,1]       [,2]       [,3]       [,4]       [,5]       [,6]       [,7]
      [1,]  109789809  247799228  656133486  229725158   27181412  484288843  251370113
      [2,]  594676361 1024747036  559579897  237139906  155872752  120202778  230216097
      [3,]        NaN        NaN        NaN        NaN        NaN        NaN        NaN
      [4,]    4602782  140032948  523535670   98916818  161583217  735387783  290618599
      [5,]        NaN        NaN        NaN        NaN        NaN        NaN        NaN
      [6,]        NaN        NaN        NaN        NaN        NaN        NaN        NaN
      [7,]  185788055  725481891  700271516  710902257   42526002   53719741  676507851
      [8,]   12686669  157736390  310066717  173271824   53064507  353644407   96101002
      [9,]        NaN        NaN        NaN        NaN        NaN        NaN        NaN
     [10,]  191226507  454295779  192153680  342521187  353371838  490412318  220506782
     [11,]        NaN        NaN        NaN        NaN        NaN        NaN        NaN
     [12,]  464579472  615308311  222681792  494639546  205812940  175115223   19027325
     [13,]  239364116   44092585  136881745  828308213   80511843  188539284  360616658
     [14,]  132108739   99328889    6267587  129124767  680008568  129562061  400632360
     [15,]        NaN        NaN        NaN        NaN        NaN        NaN        NaN

What confuses me is that just adding an extra unused parameter (real<lower=0> lambda_hour[24];) quite drastically changes the model behaviour:

data {

}

parameters {
  real base_day_rate;
  real<lower=0> lambda_day[7];
  real<lower=0> lambda_hour[24];
}

transformed parameters {
}

model {
  int D = 7;
  base_day_rate ~ cauchy(4e+8,4e+7);
  for (di in 1:D) {
      lambda_day[di] ~ exponential(1 / base_day_rate);
  }
}
 
generated quantities {
  int D = 7;
  real y_day_gen[7];
  for (di in 1:D) {
      y_day_gen[di] = poisson_rng(lambda_day[di]);
  }
  
}

With this model I get:

$y_day_gen
          
iterations [,1] [,2] [,3] [,4] [,5] [,6] [,7]
      [1,]   10    8    2    0    5    2   30
      [2,]    7   13    4    2    5    1   26
      [3,]    5   19    3   17    2    0   13
      [4,]    8    5    6    1    6    0   16
      [5,]   11   10    6   21    3    1   10
      [6,]    7   14    0    3    3    0   25
      [7,]    5   12    4    1    5    0   12
      [8,]    9   22    6    3    4    0   16
      [9,]    7   16    4    2    1    1   13
     [10,]    9    2    1    4    7    1    7
     [11,]    6    9    2   32    3    3    7
     [12,]    5   29    2   17    3    3   13
     [13,]    5   10    7    3    5    3   18
     [14,]    8   20    2   11    1    0    8

Here is the corresponding R code:

library(tidyverse)
library(rstan)
library(lubridate)

stanfile <- "rop_model_gen.stan"
pars <- c("y_day_gen", "base_day_rate", "lambda_day")

rop_m<-stan(file=stanfile,
             pars = pars, 
             iter=16000,
             chains=1,
             control = list(max_treedepth = 15,adapt_delta = 0.99))

y_gen <- rstan::extract(rop_m,"y_day_gen")
y_day_gen <- as.data.frame(y_gen$y_day_gen)
colnames(y_day_gen) <- paste0("day", 1:7)
summary(y_day_gen)
y_day_gen <- gather(y_day_gen)
ggplot(y_day_gen,aes(x=value,color=key,fill=key)) + geom_histogram()
print(traceplot(rop_m,"base_day_rate"))
print(traceplot(rop_m,"lambda_day"))
head(y_gen)

My setup:
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] bindrcpp_0.2.2     lubridate_1.7.4    rstan_2.18.2       StanHeaders_2.18.1 forcats_0.3.0     
 [6] stringr_1.3.1      dplyr_0.7.7        purrr_0.2.5        readr_1.1.1        tidyr_0.8.1       
[11] tibble_1.4.2       ggplot2_3.0.0      tidyverse_1.2.1   

loaded via a namespace (and not attached):
 [1] tidyselect_0.2.5   haven_1.1.2        lattice_0.20-35    colorspace_1.3-2   stats4_3.5.1      
 [6] loo_2.0.0          yaml_2.2.0         base64enc_0.1-3    utf8_1.1.4         rlang_0.3.1       
[11] pkgbuild_1.0.2     pillar_1.3.0       glue_1.3.0         withr_2.1.2        modelr_0.1.2      
[16] readxl_1.1.0       matrixStats_0.54.0 bindr_0.1.1        plyr_1.8.4         munsell_0.5.0     
[21] gtable_0.2.0       cellranger_1.1.0   rvest_0.3.2        codetools_0.2-15   labeling_0.3      
[26] inline_0.3.15      callr_3.0.0        ps_1.2.0           parallel_3.5.1     fansi_0.4.0       
[31] broom_0.5.0        Rcpp_0.12.19       KernSmooth_2.23-15 scales_1.0.0       backports_1.1.2   
[36] jsonlite_1.5       gridExtra_2.3      digest_0.6.18      hms_0.4.2          stringi_1.2.4     
[41] processx_3.2.0     grid_3.5.1         cli_1.0.1          tools_3.5.1        magrittr_1.5      
[46] lazyeval_0.2.1     crayon_1.3.4       pkgconfig_2.0.2    xml2_1.2.0         prettyunits_1.0.2 
[51] assertthat_0.2.0   httr_1.3.1         rstudioapi_0.8     R6_2.3.0           nlme_3.1-137      
[56] compiler_3.5.1    

Is this a (known?) bug or is this change in behaviour understandable somehow?
Now, I accept that the model might be bad, but I guess adding an unused parameter shouldn’t change the model behaviour?

Best Regards
-Leif

The model with the unused variable defines an improper posterior distribution, so all bets are off.

Aha. So just by adding a variable, even though it is not used, changes the model?
I guess strictly, I should call it a parameter. And I realised I said “unused ‘model’” where I should have said “unused ‘parameter’” (I’ll update this in the original question)

It changes the posterior distribution by adding a dimension that makes the whole thing unnormalizeable (without a proper prior on the new parameter). Stan make joint proposals for all of the unknown parameters, so there is no telling what will happen. My guess is that there were other warning messages and red flags in the output.

Ok, then I understand. And you are of course completely correct, there were warnings about divergent transitions in the output.

I’m halfway through the Stan manual. Perhaps this is already explained there somewhere, if so, I apologise for taking up your time!

Many thanks for your help! :)

Best Regards
-Leif