Help with Poisson model

I don’t see anything particularly strange when doing this. As I said before, max(y_hat) is not that high

y_hat <- as.data.frame(fit, pars=“y_hat”)
max(y_hat)
[1] 7.514862

Can you check warm-up iterations too?

I’m not sure how to check the warm-up iterations. I tried this:

fit <- sampling(model_poisson, data = practices_dat,
                chains = 1, show_messages = T,
                cores=1,
                control = list(max_treedepth = 18),
                iter = iter,
                warmup = warmup,
                save_warmup=TRUE)
y_hat <- as.data.frame(extract(fit, pars="y_hat", inc_warmup = TRUE))
max(y_hat)
[1] 7.505011

Hmm, that should include them. Can you post an R script to run this example? Seems like something fishy is happening…

Sadly, this is one of those datasets that I cannot share :_( . I can definitely share the code that I’m running, in fact, that already in this thread. If it is helpful, I might be able to share a file with summary(fit) but I need to ask for permission first.

In case it helps I tried using y_hat to draw from a Poisson with R instead of stan:

library(rstan)
library(bayesplot)
fit <- sampling(model_poisson, data = practices_dat,
                chains = 1, show_messages = T,
                cores=1,
                control = list(max_treedepth = 18),
                iter = iter,
                warmup = warmup,
                save_warmup=TRUE)
y_hat <- as.data.frame(fit, pars="y_hat") 
poisson_rng <- function(lambda){
  lambda <- y_hat[,lambda]
  y <- sapply(exp(lambda), rpois,n=1)
  return(y)
}

y <- sapply(colnames(y_hat), poisson_rng)
y <- as.data.frame(y)

ppc_dens_overlay(y = bsdata$y,
                 yrep = as.matrix(y[sample(x = 1:nrow(y), size = 50),]))

It just seems impossible that y_hat is always less than 7.5 when you don’t have the poisson_log_rng in there, but then when you add the rng, the number gets bigger :/. Seems like a bug in Stan, and since I was the last person to touch the _rngs I’m a little unnerved :D.

1 Like

Let me know if there is anything that I could do to help.

At this point we’d be looking to build a minimum reproducible example, with like generated data and stuff. I’ll try a couple things in the afternoon if I get a chance, but if you wanted to look at making one yourself that’d be super helpful.

I’m not sure if this is relevant, but just in case. I started playing with the loo package to try to compare different models. I added the following code to my generated quantities block:

generated quantities {
  //vector[I * T] y_sim;
  vector[I * T] log_lik;
  vector[I * T] y_hat = log(n) + x_beta + gamma[index_time] + a[practice] + b[practice] .* time;
 // for (it in 1:I * T) {
  //  y_sim[it] = poisson_log_rng(y_hat[it]);
 // }
 for(it in 1:I*T){
    log_lik[it] = poisson_lpmf(y[it] | log(n[it]) + x_beta[it] + gamma[index_time[it]] + a[practice[it]] + b[practice[it]] .* time[it]);
 }
}

I’m not sure if this is the correct way of implementing this or not, but once I added that code this error came back:

Stan model 'stan-1b301a6a38d0' does not contain samples.

That should be on log level: poisson_log_lmpf.

2 Likes

the _log in the suffix describes the scale of the parameter, whereas the l in _lpdf indicates the return will be on the log scale.

1 Like

Thanks @andre.pfeuffer, changing to poisson_log_lmpf made that problem go away. Which I guess indicates that the problem is in poisson_log_rng

I work with @ignacio and have access to the data and can verify that this is happening. Let me know what you think! Thanks for taking a look. :)

1 Like

I had a look at this and didn’t get any unexpected behavior. Had a quick look at the C++ of the first vs. the second model and didn’t find anything weird either.

Any chance you could try this in the latest Cmdstan and see if the error exists there? If you’ve never used it before, clone it from git:

git clone --recursive https://github.com/stan-dev/cmdstan.git

cd into the directory and then make your model:

cd cmdstan
cp ../path/to/model.stan ./model.stan
make model

Notice on the make model you leave off the .stan. Run with something like:

./model sample data file=mydata.Rdata

You can generate mydata.Rdata for cmdstan from R with something like

stan_rdump(data, file = "mydata.Rdata")

Also, in Cmdstan at least, generated quantities doesn’t throw errors in warmup (https://github.com/stan-dev/stan/issues/2355#issuecomment-315166499 – I double checked this is true).

I’m struggling with stan_rdump

> stan_rdump(practices_dat, file = "mydata.Rdata")
Error in FUN(X[[i]], ...) : invalid first argument

where practices_dat is the list that I pass to the sampling function in R. That is:

fit2 <- sampling(poisson_bug, data = practices_dat,
                 chains = 4, show_messages = T,
                 cores= 4,
                 control = list(max_treedepth = 18),
                 iter = 3000,
                 warmup = 1500)

Hmm, is practices_dat a list? Check the stan_rdump doc: ?stan_rdump. I think this should work :/.

edit:

Something like:

practices_data = list(X = c(1, 2, 3), N = 3)

Yes, practices_dat is a list. I looked at ?stan_rdump and it looks like i should not use a list. It also looks like i should use .R instead of .Rdata (I tried `.Rdata and it did not work)

library(rstan)

# This does not work ------------------------------------------------------
practices_data = list(X = c(1, 2, 3), N = 3)
stan_rdump(practices_data, file = "mydata.Rdata")


# This works --------------------------------------------------------------
X = c(1, 2, 3); N = 3
stan_rdump(c('X', 'N'), file = "mydata.R")

So, I endup up running the following to create my data file for cmdstan

stan_rdump(c('I', 'T', 'y', 'n', 'K', 'x', 'practice', 'time', 'index_time'), file = "mydata.R")

Now i’m having troubles compiling the model.

$ make model
make/models:13: target `model' doesn't match the target pattern

--- Linking C++ model ---
g++ -Wall -I . -isystem stan/lib/stan_math/lib/eigen_3.3.3 -isystem stan/lib/stan_math/lib/boost_1.66.0 -isystem stan/lib/stan_math/lib/cvodes-3.1.0/include -std=c++1y -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -Wno-unused-function -Wno-uninitialized -I src -isystem stan/src -isystem stan/lib/stan_math/ -DFUSION_MAX_VECTOR_SIZE=12 -Wno-unused-local-typedefs -DEIGEN_NO_DEBUG -DNO_FPRINTF_OUTPUT -pipe    -O3 -o model src/cmdstan/main.cpp -include  stan/lib/stan_math/lib/cvodes-3.1.0/lib/libsundials_nvecserial.a stan/lib/stan_math/lib/cvodes-3.1.0/lib/libsundials_cvodes.a
g++.exe: error: stan/lib/stan_math/lib/cvodes-3.1.0/lib/libsundials_cvodes.a: No such file or directory
make: *** [model] Error 1

Oh whoops, my apologies. That’s on me haha. That’s what I get for not double checking.

$ make model
make/models:13: target `model' doesn't match the target pattern

Is model.stan in the cmdstan folder? The error makes it sound like make isn’t finding the file. Like:

bbales2@frog:~/cmdstan-deletethis$ ls
bin       Jenkinsfile  make      model.stan ...

If it’s in another directory, you can do:

make ../path/to/the/model

But then the binary goes there I think.

It is:

MATHEMATICA+IMartinez@M116 MINGW64 /c/Users/IMartinez/Documents/BayesianSynthesis/deviants/bug/cmdstan (develop)
$ ls
examples/  Jenkinsfile  LICENSE  make/  makefile  model.stan  mydata.R  README.md  runCmdStanTests.py*  src/  stan/  test-all.sh*

in case is relevant, I’m running all this stuff on a Windows computer (I can do it on windows server 2008 or windows 10…)

Oh no, haha. I use Linux here, turns out the cmd is different for WIndows:

Try:

make model.exe

(from https://github.com/stan-dev/cmdstan/wiki/Getting-Started-with-CmdStan)