Help with Poisson model

Let me know if there is anything that I could do to help.

At this point we’d be looking to build a minimum reproducible example, with like generated data and stuff. I’ll try a couple things in the afternoon if I get a chance, but if you wanted to look at making one yourself that’d be super helpful.

I’m not sure if this is relevant, but just in case. I started playing with the loo package to try to compare different models. I added the following code to my generated quantities block:

generated quantities {
  //vector[I * T] y_sim;
  vector[I * T] log_lik;
  vector[I * T] y_hat = log(n) + x_beta + gamma[index_time] + a[practice] + b[practice] .* time;
 // for (it in 1:I * T) {
  //  y_sim[it] = poisson_log_rng(y_hat[it]);
 // }
 for(it in 1:I*T){
    log_lik[it] = poisson_lpmf(y[it] | log(n[it]) + x_beta[it] + gamma[index_time[it]] + a[practice[it]] + b[practice[it]] .* time[it]);
 }
}

I’m not sure if this is the correct way of implementing this or not, but once I added that code this error came back:

Stan model 'stan-1b301a6a38d0' does not contain samples.

That should be on log level: poisson_log_lmpf.

2 Likes

the _log in the suffix describes the scale of the parameter, whereas the l in _lpdf indicates the return will be on the log scale.

1 Like

Thanks @andre.pfeuffer, changing to poisson_log_lmpf made that problem go away. Which I guess indicates that the problem is in poisson_log_rng

I work with @ignacio and have access to the data and can verify that this is happening. Let me know what you think! Thanks for taking a look. :)

1 Like

I had a look at this and didn’t get any unexpected behavior. Had a quick look at the C++ of the first vs. the second model and didn’t find anything weird either.

Any chance you could try this in the latest Cmdstan and see if the error exists there? If you’ve never used it before, clone it from git:

git clone --recursive https://github.com/stan-dev/cmdstan.git

cd into the directory and then make your model:

cd cmdstan
cp ../path/to/model.stan ./model.stan
make model

Notice on the make model you leave off the .stan. Run with something like:

./model sample data file=mydata.Rdata

You can generate mydata.Rdata for cmdstan from R with something like

stan_rdump(data, file = "mydata.Rdata")

Also, in Cmdstan at least, generated quantities doesn’t throw errors in warmup (https://github.com/stan-dev/stan/issues/2355#issuecomment-315166499 – I double checked this is true).

I’m struggling with stan_rdump

> stan_rdump(practices_dat, file = "mydata.Rdata")
Error in FUN(X[[i]], ...) : invalid first argument

where practices_dat is the list that I pass to the sampling function in R. That is:

fit2 <- sampling(poisson_bug, data = practices_dat,
                 chains = 4, show_messages = T,
                 cores= 4,
                 control = list(max_treedepth = 18),
                 iter = 3000,
                 warmup = 1500)

Hmm, is practices_dat a list? Check the stan_rdump doc: ?stan_rdump. I think this should work :/.

edit:

Something like:

practices_data = list(X = c(1, 2, 3), N = 3)

Yes, practices_dat is a list. I looked at ?stan_rdump and it looks like i should not use a list. It also looks like i should use .R instead of .Rdata (I tried `.Rdata and it did not work)

library(rstan)

# This does not work ------------------------------------------------------
practices_data = list(X = c(1, 2, 3), N = 3)
stan_rdump(practices_data, file = "mydata.Rdata")


# This works --------------------------------------------------------------
X = c(1, 2, 3); N = 3
stan_rdump(c('X', 'N'), file = "mydata.R")

So, I endup up running the following to create my data file for cmdstan

stan_rdump(c('I', 'T', 'y', 'n', 'K', 'x', 'practice', 'time', 'index_time'), file = "mydata.R")

Now i’m having troubles compiling the model.

$ make model
make/models:13: target `model' doesn't match the target pattern

--- Linking C++ model ---
g++ -Wall -I . -isystem stan/lib/stan_math/lib/eigen_3.3.3 -isystem stan/lib/stan_math/lib/boost_1.66.0 -isystem stan/lib/stan_math/lib/cvodes-3.1.0/include -std=c++1y -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -Wno-unused-function -Wno-uninitialized -I src -isystem stan/src -isystem stan/lib/stan_math/ -DFUSION_MAX_VECTOR_SIZE=12 -Wno-unused-local-typedefs -DEIGEN_NO_DEBUG -DNO_FPRINTF_OUTPUT -pipe    -O3 -o model src/cmdstan/main.cpp -include  stan/lib/stan_math/lib/cvodes-3.1.0/lib/libsundials_nvecserial.a stan/lib/stan_math/lib/cvodes-3.1.0/lib/libsundials_cvodes.a
g++.exe: error: stan/lib/stan_math/lib/cvodes-3.1.0/lib/libsundials_cvodes.a: No such file or directory
make: *** [model] Error 1

Oh whoops, my apologies. That’s on me haha. That’s what I get for not double checking.

$ make model
make/models:13: target `model' doesn't match the target pattern

Is model.stan in the cmdstan folder? The error makes it sound like make isn’t finding the file. Like:

bbales2@frog:~/cmdstan-deletethis$ ls
bin       Jenkinsfile  make      model.stan ...

If it’s in another directory, you can do:

make ../path/to/the/model

But then the binary goes there I think.

It is:

MATHEMATICA+IMartinez@M116 MINGW64 /c/Users/IMartinez/Documents/BayesianSynthesis/deviants/bug/cmdstan (develop)
$ ls
examples/  Jenkinsfile  LICENSE  make/  makefile  model.stan  mydata.R  README.md  runCmdStanTests.py*  src/  stan/  test-all.sh*

in case is relevant, I’m running all this stuff on a Windows computer (I can do it on windows server 2008 or windows 10…)

Oh no, haha. I use Linux here, turns out the cmd is different for WIndows:

Try:

make model.exe

(from https://github.com/stan-dev/cmdstan/wiki/Getting-Started-with-CmdStan)

Very jealous…

MATHEMATICA+IMartinez@M116 MINGW64 /c/Users/IMartinez/Documents/BayesianSynthesis/deviants/bug/cmdstan (develop)
$ make model.exe
g++ -Wall -I . -isystem stan/lib/stan_math/lib/eigen_3.3.3 -isystem stan/lib/stan_math/lib/boost_1.66.0 -isystem stan/lib/stan_math/lib/cvodes-3.1.0/include -std=c++1y -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -Wno-unused-function -Wno-uninitialized -I src -isystem stan/src -isystem stan/lib/stan_math/ -DFUSION_MAX_VECTOR_SIZE=12 -Wno-unused-local-typedefs -DEIGEN_NO_DEBUG -DNO_FPRINTF_OUTPUT -pipe  -c -O3 -o bin/cmdstan/stanc.o src/cmdstan/stanc.cpp
cc1plus.exe: error: unrecognized command line option '-std=c++1y'
cc1plus.exe: warning: unrecognized command line option "-Wno-unused-local-typedefs" [enabled by default]
make: *** [bin/cmdstan/stanc.o] Error 1

Yegads, well, if you want to figure out how to fix that C++ there’s a very small chance that we’d see an interesting behavior :P. Long shot, but do you have the latest Rstan?

I’m honestly stumped on this. Without the specific data/model causing the problem, it’s hard to do anything other than make undirected guesses.

If it’s absolutely commenting and uncommenting this that causes the problem:

for (it in 1:I * T) {
  y_sim[it] = poisson_log_rng(y_hat[it]);
}

then I’m stumped! It doesn’t look like you’re using any fixed seeds or anything?

(The c++ error sounds like your g++ is too old, but I dunno how to upgrade that in Windows)

That is what causes the problem. I know is weird, that is why I asked @shira to check.

My hope is that next week I will be able to share some code and data to reproduce this problem. Thanks a lot for all your help @bbbales2!

In another thread I wrote Recommendations for what to do when k exceeds 0.5 in the loo package?
that I think this is a problem with initialization. I copy that comment here and add a bit more in the end.

see ?stan and option seed. Try runinng stan(...., seed=1337) with different seed values.

see ?stan and option init. The default is to randomly generate initial values between -2 and 2 on the unconstrained support. If you transform -2 and 2 from unconstrained to constrained space, are these initial values sensible?

If you are unlucky initial values are all 2, and if n=1 and x=1, then
log(n) + x*2 + 2 + exp(2)*2 + epx(2)*2*4
is about 79 which when exponentiated to Poisson parameter is about 2e34, which is a lot. You probably need to initialize log(a_std) and log(b_std) with much smaller range than [-2, 2] or scale them in Stan code by dividing them with a big number. For example, try dividing ‘a’ and ‘b’ by 100. Then maximum Poisson parameter value (with n=1 and x=1) is about 311, which should work fine.

Since initial values are randomly chosen, one chain can work, but running several can fail. Adding _rng to generated quantities changes the behavior as _rng is called between sampling iterations and it changes the state of the random number generator.

I recommend to test with different seeds, and with more specific inits or scaled a and b. For checking the effect of reasnoable inits/scaling ,you could add a print statement in Stan code which would print the value of
log(n) + x_beta + gamma[index_time] + a[practice] + b[practice] .* time

I’ve seen this same problem before (I just didn’t realize that this one is likely to be the same), and the common thing is a distribution with positive constrained parameter which is a function of positive constrained parameters so that effectively there is something like exp(a + b*exp(c)) and if b and c are both initialized to be close to 2, then that expression has very high value.

2 Likes

Here are the data and code that can reproduce this problem.

Doing this does indeed solve the problem:

transformed parameters {
  vector[I] a; 
  vector[I] b; 
  a = (sigma_a * a_std)/100 ;            // Matt trick
  b = (sigma_b * b_std)/100 ;            // Matt trick
}

Thanks everyone!

3 Likes