Help with Poisson model

Very jealous…

MATHEMATICA+IMartinez@M116 MINGW64 /c/Users/IMartinez/Documents/BayesianSynthesis/deviants/bug/cmdstan (develop)
$ make model.exe
g++ -Wall -I . -isystem stan/lib/stan_math/lib/eigen_3.3.3 -isystem stan/lib/stan_math/lib/boost_1.66.0 -isystem stan/lib/stan_math/lib/cvodes-3.1.0/include -std=c++1y -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -Wno-unused-function -Wno-uninitialized -I src -isystem stan/src -isystem stan/lib/stan_math/ -DFUSION_MAX_VECTOR_SIZE=12 -Wno-unused-local-typedefs -DEIGEN_NO_DEBUG -DNO_FPRINTF_OUTPUT -pipe  -c -O3 -o bin/cmdstan/stanc.o src/cmdstan/stanc.cpp
cc1plus.exe: error: unrecognized command line option '-std=c++1y'
cc1plus.exe: warning: unrecognized command line option "-Wno-unused-local-typedefs" [enabled by default]
make: *** [bin/cmdstan/stanc.o] Error 1

Yegads, well, if you want to figure out how to fix that C++ there’s a very small chance that we’d see an interesting behavior :P. Long shot, but do you have the latest Rstan?

I’m honestly stumped on this. Without the specific data/model causing the problem, it’s hard to do anything other than make undirected guesses.

If it’s absolutely commenting and uncommenting this that causes the problem:

for (it in 1:I * T) {
  y_sim[it] = poisson_log_rng(y_hat[it]);
}

then I’m stumped! It doesn’t look like you’re using any fixed seeds or anything?

(The C++ error sounds like your g++ is too old, but I dunno how to upgrade that on Windows)

That is what causes the problem. I know it is weird; that is why I asked @shira to check.

My hope is that next week I will be able to share some code and data to reproduce this problem. Thanks a lot for all your help @bbbales2!

In another thread (Recommendations for what to do when k exceeds 0.5 in the loo package?, post #11) I wrote that I think this is a problem with initialization. I am copying that comment here and adding a bit more at the end.

See ?stan and the seed option. Try running stan(...., seed=1337) with different seed values.

See ?stan and the init option. The default is to randomly generate initial values between -2 and 2 on the unconstrained support. If you transform -2 and 2 from the unconstrained to the constrained space, are these initial values sensible?
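To make that question concrete, here is a minimal sketch of what the default range looks like for a positive-constrained parameter. It assumes the parameter is declared with a lower bound of zero, so the constrained value is exp() of the unconstrained one; the helper name is made up for illustration.

```python
import math

# Default initialization range on the unconstrained scale.
UNCONSTRAINED_RANGE = (-2.0, 2.0)


def constrained_positive(u: float) -> float:
    """Map an unconstrained value to a positive-constrained one (exp transform)."""
    return math.exp(u)


low, high = (constrained_positive(u) for u in UNCONSTRAINED_RANGE)
print(f"positive-constrained init range: [{low:.3f}, {high:.3f}]")
```

Whether the resulting range is sensible depends on where the parameter ends up in the linear predictor.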

If you are unlucky, the initial values are all 2, and if n=1 and x=1, then
log(n) + x*2 + 2 + exp(2)*2 + exp(2)*2*4
is about 78, which, when exponentiated to give the Poisson parameter, is about 7e33, which is a lot. You probably need to initialize log(a_std) and log(b_std) with a much smaller range than [-2, 2], or scale them in the Stan code by dividing them by a big number. For example, try dividing ‘a’ and ‘b’ by 100. Then the maximum Poisson parameter value (with n=1 and x=1) is about 114, which should work fine.
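The arithmetic can be checked with a short script. This is only a sketch: the argument names follow the linear predictor discussed in this thread, the value time=4 is read off the "*4" in the expression above, and the scale argument is a hypothetical stand-in for dividing ‘a’ and ‘b’ by a constant.

```python
import math


def log_rate(n, x, beta, gamma, sigma_a, a_std, sigma_b, b_std, time, scale=1.0):
    """Log of the Poisson rate: log(n) + x*beta + gamma + a + b*time."""
    a = sigma_a * a_std / scale  # non-centered a, optionally scaled down
    b = sigma_b * b_std / scale  # non-centered b, optionally scaled down
    return math.log(n) + x * beta + gamma + a + b * time


# Unlucky case: every unconstrained parameter is initialized at 2,
# so the positive-constrained sigmas start at exp(2).
worst = dict(
    n=1, x=1, beta=2.0, gamma=2.0,
    sigma_a=math.exp(2), a_std=2.0,
    sigma_b=math.exp(2), b_std=2.0,
    time=4,  # assumption: taken from the "*4" in the expression
)

unscaled = log_rate(**worst)
scaled = log_rate(**worst, scale=100.0)

print(f"unscaled: log rate = {unscaled:.2f}, rate = {math.exp(unscaled):.3g}")
print(f"scaled:   log rate = {scaled:.2f}, rate = {math.exp(scaled):.3g}")
```

Trying other values of scale shows how quickly the initial rate comes down to something a Poisson likelihood can handle.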

Since initial values are randomly chosen, one chain can work, but running several can fail. Adding _rng to generated quantities changes the behavior as _rng is called between sampling iterations and it changes the state of the random number generator.
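The effect of extra random draws on a shared generator can be illustrated with a toy example. This is an analogy using Python's own random module, not Stan's generator or sampler, and the function name and loop structure are made up for illustration.

```python
import random


def run(extra_draw_between_iterations: bool, seed: int = 1337, n_iter: int = 5):
    """Collect one 'sampler' draw per iteration from a seeded generator."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_iter):
        draws.append(rng.random())  # stands in for the sampler's own random step
        if extra_draw_between_iterations:
            rng.random()  # stands in for an _rng call in generated quantities
    return draws


without = run(False)
with_rng = run(True)

# Same seed, so the first draw matches; after that the streams diverge
# because the extra call has advanced the generator's state.
print(without[0] == with_rng[0])
print(without[1:] == with_rng[1:])
```

So with the same seed, adding or removing an _rng statement can be enough to move a chain from a lucky path to an unlucky one.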

I recommend testing with different seeds, and with more specific inits or scaled a and b. To check the effect of reasonable inits/scaling, you could add a print statement in the Stan code which would print the value of
log(n) + x_beta + gamma[index_time] + a[practice] + b[practice] .* time

I’ve seen this same problem before (I just didn’t realize that this one is likely to be the same). The common thing is a distribution with a positive-constrained parameter which is itself a function of positive-constrained parameters, so that effectively there is something like exp(a + b*exp(c)). If b and c are both initialized close to 2, that expression has a very high value.
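A quick sweep shows how steeply that pattern grows as the initial values approach the upper end of the default range. This is a toy sketch, assuming a, b and c all start at the same unconstrained value u; the function name is hypothetical.

```python
import math


def nested_rate(a: float, b: float, c: float) -> float:
    """The generic pattern exp(a + b*exp(c))."""
    return math.exp(a + b * math.exp(c))


# Hypothetical sweep: all three parameters initialized at the same value u.
init_values = [0.0, 0.5, 1.0, 1.5, 2.0]
rates = [nested_rate(u, u, u) for u in init_values]

for u, rate in zip(init_values, rates):
    print(f"u = {u:.1f} -> rate = {rate:.3g}")
```

With more terms, or a covariate such as time multiplying b, the value at u = 2 grows much faster still.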


Here are the data and code that can reproduce this problem.

Doing this does indeed solve the problem:

transformed parameters {
  vector[I] a;
  vector[I] b;
  a = (sigma_a * a_std) / 100;  // Matt trick (non-centered), scaled down by 100
  b = (sigma_b * b_std) / 100;  // Matt trick (non-centered), scaled down by 100
}

Thanks everyone!
