Problems remain after non-centered parametrization of correlated parameters

How many parameters do you have total?

Thanks for asking. Is it the exact number of the original parameters that you want to know?

Perhaps I should outline the structure of the model components. The parameters shown in the estimation table in my post are all the original parameters:

(i) for the component of the state-space model of a latent AR(1) process,

  • mu_alpha, beta are the intercept and slope parameters of the AR(1) process
  • mu_u1 is the process’s initial value (when the time index n = 1)
  • mu_season and sd_season are the mean and sd of the seasonality component of the process (for quarterly data)
  • sd_y is the sd of the error term of the AR(1) process

(ii) lying on top of the state-space model are several sigmoid-function components (either inv_logit() or a soft-clipping function) with inputs Z*p, X*g, and G*w (where p, g, w are unknown parameters and Z, X, G are known covariate matrices)

(iii) these sigmoid functions connect the latent process (y) to the observed process (r):

r = y + Real + D,


  • Real is a function of an unknown parameter theta and a sigmoid function of Z*p;
  • D is a function of an unknown simplex vector d and the multiplication of a sigmoid function of X*g with a sigmoid function of G*w.
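For concreteness, here is a minimal numerical sketch of the two sigmoid building blocks mentioned above. inv_logit() matches Stan's function of the same name; the soft-clipping form shown is only one common choice and is an assumption, since the post does not give the exact function used:

```python
import math

def inv_logit(x):
    """Standard logistic sigmoid, as in Stan's inv_logit()."""
    return 1.0 / (1.0 + math.exp(-x))

def soft_clip(x, lo=0.0, hi=1.0, sharpness=10.0):
    """One common soft-clipping form (an assumption, not necessarily the
    author's): smoothly squashes x into the open interval (lo, hi)."""
    return lo + (hi - lo) * inv_logit(sharpness * (x - (lo + hi) / 2.0))
```

Either function maps a linear predictor such as Z*p into a bounded range, which is what connects the latent process to the observed scale.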

With the non-centered reparametrization of p, (g, w), and (mu_alpha, beta), these are expressed in terms of raw parameters p_mu, p_sd, p_L, p_err, gw_mu, gw_sd, …, ab_mu, ab_sd, …, etc., which were not shown in the estimation table of my post (but appear in the pairs plot).
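For readers following the naming, here is a generic sketch of what such a non-centered block can look like in Stan (the dimensions, priors, and exact structure are assumptions for illustration, not the author's actual model):

```stan
parameters {
  vector[3] p_mu;               // location
  vector<lower=0>[3] p_sd;      // scale
  cholesky_factor_corr[3] p_L;  // Cholesky factor of the correlation matrix
  vector[3] p_err;              // standardized innovations (what HMC samples)
}
transformed parameters {
  // p is deterministic given the raw parameters; correlations enter via p_L
  vector[3] p = p_mu + p_sd .* (p_L * p_err);
}
model {
  // implies p ~ multi_normal_cholesky(p_mu, diag_pre_multiply(p_sd, p_L))
  p_err ~ normal(0, 1);
}
```

The sampler then works on the uncorrelated unit-scale p_err, which is typically a much easier geometry than sampling p directly.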

I have been struggling to find better ways to get reliable estimates of p, g, and w (it is less challenging for the other parameters).

  • If the intercept p[1] of Z*p is hard-coded into the model (note: Z[n,1] is a vector of 1's), it is easier to get reliable estimates of p[2:3], which are not so strongly correlated;
  • However, if all p[1:3] are free parameters to be estimated, there is very strong correlation between p[1] and p[3] as shown in the pairs plot in my last reply.

adapt_delta = 0.95 is sufficient to avoid any divergences and get reasonable estimates of the parameters with the following setup, but the downside is that it took over 7 hours (on a fast machine on a cluster) to finish fitting the model.

Number of simulated observation points N = 12,000 (120 firms x 100 quarters)
4 chains, each with iter=3000; warmup=1000; thin=1;
post-warmup draws per chain=2000, total post-warmup draws=8000.

True parameter values:
sd_y = 0.08, mu_u1 = 0.1; mu_alpha = 0.04; beta = 0.6;  
theta = 0.15;  
sd_season = 0.1, mu_season = c(-0.12, -0.06, 0.15); 
p = c(0.7, 0.62, 0.35) 
g = c(0.85, 0.35)  
w = c(0.72, 0.15, 0.62)  
d = c(0.05, 0.7, 0.25) 

               mean se_mean    sd    10%    50%    90% n_eff  Rhat
sd_y          0.080   0.000 0.001  0.079  0.080  0.080 15212 1.000
mu_u1         0.095   0.000 0.008  0.084  0.095  0.105 12897 1.000
mu_alpha      0.041   0.000 0.002  0.039  0.041  0.043  6549 1.000
beta          0.587   0.000 0.007  0.578  0.587  0.597  4486 1.000
theta         0.152   0.000 0.008  0.142  0.152  0.163  6515 1.000
sd_season     0.102   0.000 0.004  0.097  0.102  0.107 12263 1.000
mu_season[1] -0.122   0.000 0.010 -0.135 -0.122 -0.110 14662 1.000
mu_season[2] -0.054   0.000 0.009 -0.066 -0.054 -0.042 16905 1.000
mu_season[3]  0.155   0.000 0.009  0.143  0.155  0.167 14127 1.000
p[1]          0.712   0.001 0.043  0.670  0.704  0.764  2432 1.002
p[2]          0.614   0.000 0.026  0.581  0.613  0.647 11231 1.000
p[3]          0.371   0.001 0.061  0.311  0.357  0.446  2405 1.002
g[1]          0.872   0.000 0.041  0.818  0.871  0.925  7061 1.000
g[2]          0.350   0.000 0.018  0.327  0.350  0.373  6253 1.000
w[1]          0.670   0.001 0.056  0.598  0.671  0.741  5806 1.000
w[2]          0.158   0.000 0.011  0.144  0.158  0.172  8606 1.000
w[3]          0.621   0.000 0.018  0.598  0.620  0.644  9303 1.000
d[1]          0.050   0.000 0.009  0.039  0.050  0.062  9629 1.000
d[2]          0.687   0.000 0.008  0.677  0.687  0.698 16379 1.000
d[3]          0.262   0.000 0.010  0.250  0.262  0.275  9252 1.000

So it looks like this model has 20 some parameters?

Maybe try adding control = list(metric = "dense_e") and see if that helps?

Also, are your lowest n_effs around 2000? If so, you might consider just running your chains for a shorter amount of time.

Can you run get_num_leapfrog_per_iteration(fit) on your fit?

Thanks for suggesting the metric = "dense_e" option. I tried it, but it doesn’t seem to work in this case: the sampler was still at a very early stage of warmup after 15 hours of fitting.

If I understand correctly, the considerations related to the metric = "dense_e" option are discussed on this page. I include the link here in case others reading this post want to consider the option.

I don’t think the fix for the dense metric has made it into the rstan on CRAN yet…


Thanks for the suggestions. I tried 1500 iterations (with 1000 warmup iterations, 4 chains, simulated obs. N = 120 firms x 100 quarters). The lowest n_eff dropped to only around 340 (from 2000+ previously, when running 3000 iterations with 1000 warmup iterations on 4 chains).

Despite the large reduction in the number of iterations and the neff, the fitting time only dropped to 5.5 hours (from over 7 hours). Running get_num_leapfrog_per_iteration(fit) gives

   [1] 255 511 255 255 511 255 255 511 511 255 511 511 511 511 255 255 511 511
  [19] 511 255 511 255 255 511 255 255 255 511 255 255 255 511 255 255 511 255
  [37] 255 255 511 511 511 511 511 511 255 511 511 511 511 255 255 511 511 511
  [55] 511 255 255 255 511 511 511 511 255 511 511 255 511 511 511 511 511 511
  ... (others omitted)

Q: For a complex model like this, is a fitting time of 5.5 hours considered uncommonly long? (I am aware of the benefit of vectorization and have implemented it wherever possible.)

I also tried 2000 iterations (with 1000 warmups for 4 chains for simulated obs. N = 60 firms x 100 quarters). The lowest neff for the hard-to-estimate p[1] and p[3] are around 1800. get_num_leapfrog_per_iteration(fit) returns 255 255 255 ... consistently.

The fitting time for this case is a more realistic 2 hours. Previously, I focused on 120 firms because with only 60 firms, usually one or another of the parameters tends not to be estimated very accurately. For example, the true value of d[1] should be 0.05, but its estimate is 0.072 and the 10–90% interval does not contain the true value (see the table below).

Q: In the Bayesian tradition, is such a case, with one or two not-so-accurate estimates, still considered a good enough validation of the model to justify using it on real data (eg, for industries with at least 60 firms)?

               mean se_mean    sd    10%    50%    90% n_eff  Rhat
sd_y          0.079   0.000 0.001  0.078  0.079  0.080  5479 1.000
mu_u1         0.099   0.000 0.011  0.085  0.099  0.113  4435 1.001
mu_alpha      0.037   0.000 0.002  0.034  0.037  0.040  3733 1.000
beta          0.599   0.000 0.010  0.586  0.599  0.613  2076 1.002
theta         0.149   0.000 0.011  0.136  0.148  0.162  2912 1.001
sd_season     0.099   0.000 0.005  0.092  0.099  0.106  3922 1.000
mu_season[1] -0.116   0.000 0.013 -0.133 -0.116 -0.099  5608 1.000
mu_season[2] -0.066   0.000 0.013 -0.083 -0.065 -0.049  6569 1.000
mu_season[3]  0.144   0.000 0.013  0.128  0.144  0.161  5244 1.000
p[1]          0.756   0.001 0.063  0.690  0.743  0.845  1807 1.004
p[2]          0.643   0.001 0.040  0.593  0.642  0.694  4613 0.999
p[3]          0.423   0.002 0.091  0.328  0.403  0.552  1756 1.003
g[1]          0.841   0.001 0.055  0.770  0.840  0.910  2883 1.001
g[2]          0.349   0.000 0.025  0.317  0.349  0.381  3179 1.000
w[1]          0.710   0.002 0.078  0.611  0.710  0.809  2514 1.000
w[2]          0.158   0.000 0.015  0.139  0.158  0.178  3523 1.000
w[3]          0.618   0.000 0.025  0.585  0.618  0.651  3438 1.002
d[1]          0.072   0.000 0.014  0.055  0.073  0.090  4029 0.999
d[2]          0.711   0.000 0.012  0.695  0.711  0.726  8837 0.999
d[3]          0.216   0.000 0.014  0.198  0.217  0.235  3623 1.000

is a fitting time of 5.5 hours considered uncommonly long?

It’s more about the number of gradient evaluations than total walltime. It looks like your chains are taking 255+ leapfrog steps before hitting a U-turn. That is a lot.

If you have a relatively modest number of parameters (is it actually just 20?) then dense adaptation might help, but like @Charles_Driver said there was a bug (and a bugfix) that may not have actually propagated to the released rstan yet. If you have the patience, you might try running this model with cmdstan.

If you have hundreds of parameters though, this probably wouldn’t work so easily. So the number is important, smaller is easier.

It sounds like the model is sampling really efficiently; it’s just taking a lot of leapfrog steps to do it. I’d give cmdstan’s dense adaptation a try.

If you want to dump data constructed for rstan into a format that cmdstan can read (assuming you have that data in a list called ‘data’), use:

stan_rdump(names(data), file = "mydata.dat", list2env(data))

Originally 20, but after the non-centered reparametrization of the correlated parameters, the total number of raw parameters to sample becomes 78.

Thanks for the guidance. Tried this but got the error message:

objects not found: r, Z, X, G
Error in as.logical(test) : cannot coerce type 'environment' to vector of type 'logical'

Confused because those objects (r, Z, X, G) are clearly in my environment. (r is a list of vectors and Z, X, G are lists of matrices, both as declared in the .stan file.)
Any idea on the error message?

Oh whoops, my bad, that last argument should be named:

stan_rdump(names(data), file = "mydata.dat", envir = list2env(data))

And this assumes that data is a list like:

data = list(N = ...,
            y = ...,

(The thing you’d pass into a Stan sampling statement).

78 parameters should be fine for this.

Thanks for the clarification. I was able to stan_rdump() the data successfully. While installing cmdstan, I completed the make build step without issue. However, when compiling the .stan file (which I have been using with rstan without a problem), I received the errors below. On another (slower) cluster, I could complete this step and proceed to sampling without a problem.

Any idea on how to fix the compilation errors? (Are they due to the g++ version, which is g++ (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36)?)

[sbbg070@wrds-cloud-login1-h ~/cmdstan-2.19.1]$ make ../SSM0.5.2dev

--- Translating Stan model to C++ code ---
bin/stanc  --o=../SSM0.5.2dev.hpp ../SSM0.5.2dev.stan
Model name=SSM0_5_2dev_model
Input file=../SSM0.5.2dev.stan
Output file=../SSM0.5.2dev.hpp
Info: Comments beginning with # are deprecated.  Please use // in place of # for line comments.
Left-hand side of sampling statement (~) may contain a non-linear transform of a parameter or local variable.
If it does, you need to include a target += statement with the log absolute determinant of the Jacobian of the transform.
Left-hand-side of sampling statement:
    err_y[j] ~ normal(...)

g++ -std=c++1y -pthread -Wno-sign-compare     -O3 -I src -I stan/src -I stan/lib/stan_math/ -I stan/lib/stan_math/lib/eigen_3.3.3 -I stan/lib/stan_math/lib/boost_1.69.0 -I stan/lib/stan_math/lib/sundials_4.1.0/include    -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION     -c -MT ../SSM0.5.2dev.o -MT ../SSM0.5.2dev -include ../SSM0.5.2dev.hpp -include src/cmdstan/main.cpp -MM -E -MG -MP -MF ../SSM0.5.2dev.d ../SSM0.5.2dev.hpp

--- Linking C++ model ---
g++ -std=c++1y -pthread -Wno-sign-compare     -O3 -I src -I stan/src -I stan/lib/stan_math/ -I stan/lib/stan_math/lib/eigen_3.3.3 -I stan/lib/stan_math/lib/boost_1.69.0 -I stan/lib/stan_math/lib/sundials_4.1.0/include    -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION             -include ../SSM0.5.2dev.hpp src/cmdstan/main.cpp        stan/lib/stan_math/lib/sundials_4.1.0/lib/libsundials_nvecserial.a stan/lib/stan_math/lib/sundials_4.1.0/lib/libsundials_cvodes.a stan/lib/stan_math/lib/sundials_4.1.0/lib/libsundials_idas.a  -o ../SSM0.5.2dev
In file included from stan/lib/stan_math/stan/math/rev/mat/fun/ordered_constrain.hpp:6:0,
                 from stan/lib/stan_math/stan/math/rev/mat.hpp:41,
                 from stan/lib/stan_math/stan/math.hpp:4,
                 from stan/src/stan/model/model_header.hpp:4,
                 from ./../SSM0.5.2dev.hpp:3,
                 from <command-line>:0:
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:31:32: error: ‘std::index_sequence’ has not been declared
                           std::index_sequence<I...> i) {
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:31:46: error: expected ‘,’ or ‘...’ before ‘<’ token
                           std::index_sequence<I...> i) {
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp: In function ‘constexpr auto stan::math::internal::apply(const F&, const Tuple&)’:
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:49:27: error: ‘make_index_sequence’ is not a member of ‘std’
   return apply_impl(f, t, std::make_index_sequence<std::tuple_size<Tuple>{}>{});
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:49:27: note: suggested alternative:
In file included from stan/lib/stan_math/lib/boost_1.69.0/boost/fusion/container/vector/vector.hpp:28:0,
                 from stan/lib/stan_math/lib/boost_1.69.0/boost/fusion/container/vector/vector10.hpp:25,
                 from stan/lib/stan_math/lib/boost_1.69.0/boost/fusion/view/transform_view/transform_view.hpp:22,
                 from stan/lib/stan_math/lib/boost_1.69.0/boost/fusion/algorithm/transformation/transform.hpp:11,
                 from stan/lib/stan_math/lib/boost_1.69.0/boost/fusion/view/zip_view/detail/begin_impl.hpp:14,
                 from stan/lib/stan_math/lib/boost_1.69.0/boost/fusion/view/zip_view/zip_view.hpp:16,
                 from stan/lib/stan_math/lib/boost_1.69.0/boost/fusion/view/zip_view.hpp:12,
                 from stan/lib/stan_math/lib/boost_1.69.0/boost/fusion/include/zip_view.hpp:11,
                 from stan/lib/stan_math/lib/boost_1.69.0/boost/numeric/odeint/util/resize.hpp:26,
                 from stan/lib/stan_math/lib/boost_1.69.0/boost/numeric/odeint/util/state_wrapper.hpp:26,
                 from stan/lib/stan_math/lib/boost_1.69.0/boost/numeric/odeint/util/ublas_wrapper.hpp:33,
                 from stan/lib/stan_math/lib/boost_1.69.0/boost/numeric/odeint.hpp:25,
                 from stan/lib/stan_math/stan/math/prim/arr/functor/integrate_ode_rk45.hpp:17,
                 from stan/lib/stan_math/stan/math/prim/arr.hpp:46,
                 from stan/lib/stan_math/stan/math/prim/mat.hpp:344,
                 from stan/lib/stan_math/stan/math/rev/mat.hpp:12,
                 from stan/lib/stan_math/stan/math.hpp:4,
                 from stan/src/stan/model/model_header.hpp:4,
                 from ./../SSM0.5.2dev.hpp:3,
                 from <command-line>:0:
stan/lib/stan_math/lib/boost_1.69.0/boost/fusion/support/detail/index_sequence.hpp:59:12: note:   ‘boost::fusion::detail::make_index_sequence’
     struct make_index_sequence
In file included from stan/lib/stan_math/stan/math/rev/mat/fun/ordered_constrain.hpp:6:0,
                 from stan/lib/stan_math/stan/math/rev/mat.hpp:41,
                 from stan/lib/stan_math/stan/math.hpp:4,
                 from stan/src/stan/model/model_header.hpp:4,
                 from ./../SSM0.5.2dev.hpp:3,
                 from <command-line>:0:
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:49:77: error: expected primary-expression before ‘{’ token
   return apply_impl(f, t, std::make_index_sequence<std::tuple_size<Tuple>{}>{});
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:49:77: error: expected ‘)’ before ‘{’ token
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp: At global scope:
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:151:9: error: expected type-specifier
       = std::result_of_t<F(decltype(is_var_), decltype(value_of(Targs()))...)>;
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:156:42: error: ‘FReturnType’ was not declared in this scope
   std::array<int, internal::compute_dims<FReturnType>::value> M_;
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:156:53: error: template argument 1 is invalid
   std::array<int, internal::compute_dims<FReturnType>::value> M_;
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:156:61: error: template argument 2 is invalid
   std::array<int, internal::compute_dims<FReturnType>::value> M_;
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp: In member function ‘std::vector<stan::math::var> stan::math::adj_jac_vari<F, Targs>::build_return_varis_and_vars(const std::vector<double>&)’:
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:349:9: error: invalid types ‘int[int]’ for array subscript
     M_[0] = val_y.size();
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:350:32: error: invalid types ‘int[int]’ for array subscript
     std::vector<var> var_y(M_[0]);
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp: In member function ‘Eigen::Matrix<stan::math::var, R, C> stan::math::adj_jac_vari<F, Targs>::build_return_varis_and_vars(const Eigen::Matrix<double, R, C>&)’:
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:375:9: error: invalid types ‘int[int]’ for array subscript
     M_[0] = val_y.rows();
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:376:9: error: invalid types ‘int[int]’ for array subscript
     M_[1] = val_y.cols();
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:377:40: error: invalid types ‘int[int]’ for array subscript
     Eigen::Matrix<var, R, C> var_y(M_[0], M_[1]);
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:377:47: error: invalid types ‘int[int]’ for array subscript
     Eigen::Matrix<var, R, C> var_y(M_[0], M_[1]);
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp: In member function ‘void stan::math::adj_jac_vari<F, Targs>::chain( ’:
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:530:5: error: ‘FReturnType’ was not declared in this scope
     FReturnType y_adj;
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:530:17: error: expected ‘;’ before ‘y_adj’
     FReturnType y_adj;
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:532:38: error: ‘y_adj’ was not declared in this scope
     internal::build_y_adj(y_vi_, M_, y_adj);
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:536:26: error: expansion pattern ‘auto&&’ contains no argument packs
         [this](auto&&... args) { this->accumulate_adjoints(args...); },
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp: In lambda function:
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:536:60: error: ‘args’ was not declared in this scope
         [this](auto&&... args) { this->accumulate_adjoints(args...); },
make: *** [../SSM0.5.2dev] Error 1

Also, I followed p.28 of cmdstan-guide-2.19.1.pdf hoping to sample 4 chains in parallel, but the chains ran sequentially even though multiple cores were specified.

Any suggestion or pointer to the correct commands for a cluster environment?

#PBS -l nodes=1:ppn=8

for i in {1..4}; do
  ../SSM0.5.2dev sample algorithm=hmc metric=dense_e adapt delta=0.95 \
    id=$i data file=../SSM_data.dat output file=samples$i.csv
done

Did you get things to compile? I’m guessing this is the compiler. I think Stan supports gcc 4.9.3 and up (whatever Rtools is at). std::index_sequence is a C++14 feature, which gcc 4.8.5 doesn’t provide.

To do four chains in parallel, you gotta run four separate things. If I remember correctly, qsub scripts just launch the same script on however many different nodes you request, so maybe just do something like:


../SSM0.5.2dev sample algorithm=hmc metric=dense_e adapt delta=0.95 \
id=$PBS_ARRAYID data file=../SSM_data.dat output file=samples$PBS_ARRAYID.csv

And then however many nodes you request via qsub (or cores or however it’s denoted), you’ll get that many outputs. Use the examples/bernoulli job to test this.
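To make the fan-out concrete, here is a dry run of that idea (the binary and data paths are taken from the thread; nothing is actually executed, and PBS_ARRAYID, normally set per task by the scheduler, is simulated with a loop):

```shell
# Simulate four array tasks: print the cmdstan invocation each would execute.
n=0
for PBS_ARRAYID in 1 2 3 4; do
  cmd="../SSM0.5.2dev sample algorithm=hmc metric=dense_e adapt delta=0.95 id=$PBS_ARRAYID data file=../SSM_data.dat output file=samples$PBS_ARRAYID.csv"
  echo "$cmd"
  n=$((n + 1))
done
```

Each task writes its own samples$PBS_ARRAYID.csv, so the four chains never collide on an output file.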


Hi, Ben:

I finally completed a run of 4 chains using metric=dense_e (to compare with one using metric=diag_e). It does reduce the run time significantly, with essentially the same estimates given the large amount of simulated data used (12,000 data points).
Here is the summary output: dense_e_cmdStan0.5.2dev_2.6hrs.txt (5.2 KB) diag_e_cmdStan0.5.2dev_9.6hrs.txt (5.2 KB)

I would very much appreciate your feedback on whether there’s anything I should pay attention to, especially regarding the diagnostic indicators. Thanks.


Well the things to watch out for are the usual things: are there divergences in the dense_e version and are the Rhats 1?

Your N_effs still seem really high. You might be able to get away with cutting your chain lengths in half or something. Maybe try 500 warmup/500 samples and see what happens? I wouldn’t go much less than 500 though. Each chain needs to be long enough to do warmup and get enough N_eff individually to make the Rhats and such reliable.


I would appreciate your clarification of the following confusing points.

In the documentation (eg, p.29 of cmdstan-guide-2.19.1.pdf), the reported divergent__ row has N_Eff equal to the 1000 iterations saved (but in my case below, I get -nan) and Rhat = nan (in my case: 9.99e-01). Are these N_Eff and Rhat values, in either case, meaningful at all?

The Mean in my case is non-zero but very small (5.00e-03). Is this merely due to reporting an integer count in floating point? (In other words, should that reported value be taken as 0?)

Inference for Stan model: SSM0_5_2dev_model
4 chains: each with iter=(400,400,400,400); warmup=(0,0,0,0); thin=(1,1,1,1); 1600 iterations saved.
Warmup took (7133, 7125, 7429, 6727) seconds, 7.89 hours total
Sampling took (3443, 3584, 3725, 3661) seconds, 4.00 hours total
                         Mean      MCSE    StdDev         5%        50%        95%     N_Eff   N_Eff/s     R_hat
lp__                 2.07e+04  5.76e-01  1.39e+01   2.07e+04   2.07e+04   2.07e+04  5.82e+02  4.04e-02  1.00e+00
accept_stat__        9.62e-01  2.75e-03  8.83e-02   8.48e-01   9.88e-01   9.99e-01  1.03e+03  7.16e-02  1.01e+00
stepsize__           1.79e-02  6.50e-04  9.20e-04   1.69e-02   1.79e-02   1.94e-02  2.00e+00  1.39e-04  5.60e+13
treedepth__          9.00e+00  5.04e-03  1.98e-01   9.00e+00   9.00e+00   9.00e+00  1.55e+03  1.08e-01  1.00e+00
n_leapfrog__         5.46e+02  3.59e+00  1.34e+02   5.11e+02   5.11e+02   1.02e+03  1.39e+03  9.63e-02  1.02e+00
divergent__          5.00e-03      -nan  7.06e-02   0.00e+00   0.00e+00   0.00e+00      -nan      -nan  9.99e-01
energy__            -2.05e+04  8.28e-01  1.91e+01  -2.06e+04  -2.05e+04  -2.05e+04  5.30e+02  3.68e-02  1.01e+00
sd_y                 8.01e-02  1.35e-05  5.74e-04   7.91e-02   8.01e-02   8.10e-02  1.80e+03  1.25e-01  9.98e-01
mu_u1                1.02e-01  1.99e-04  8.99e-03   8.65e-02   1.02e-01   1.16e-01  2.03e+03  1.41e-01  1.00e+00
mu_alpha             4.17e-02  4.61e-05  1.86e-03   3.87e-02   4.18e-02   4.47e-02  1.63e+03  1.13e-01  9.99e-01
beta                 5.93e-01  1.87e-04  7.52e-03   5.80e-01   5.93e-01   6.05e-01  1.62e+03  1.12e-01  1.00e+00
theta                1.53e-01  7.81e-05  3.24e-03   1.48e-01   1.53e-01   1.59e-01  1.71e+03  1.19e-01  1.00e+00
sd_season            1.02e-01  1.03e-04  4.25e-03   9.46e-02   1.01e-01   1.09e-01  1.70e+03  1.18e-01  1.00e+00
mu_season[1]        -1.26e-01  2.23e-04  1.05e-02  -1.44e-01  -1.26e-01  -1.09e-01  2.20e+03  1.53e-01  1.00e+00
mu_season[2]        -7.06e-02  2.23e-04  1.02e-02  -8.78e-02  -7.04e-02  -5.45e-02  2.07e+03  1.44e-01  1.00e+00
mu_season[3]         1.46e-01  2.42e-04  1.03e-02   1.28e-01   1.46e-01   1.63e-01  1.82e+03  1.27e-01  1.00e+00
p[1]                 6.45e-01  1.07e-03  3.83e-02   5.98e-01   6.38e-01   7.19e-01  1.27e+03  8.81e-02  1.00e+00
p[2]                 6.27e-01  1.20e-04  5.35e-03   6.18e-01   6.27e-01   6.36e-01  1.98e+03  1.37e-01  9.99e-01
p[3]                 5.97e-01  1.54e-03  5.48e-02   5.30e-01   5.87e-01   7.04e-01  1.27e+03  8.79e-02  1.00e+00
g[1]                 8.36e-01  1.09e-03  4.36e-02   7.65e-01   8.36e-01   9.10e-01  1.62e+03  1.12e-01  1.00e+00
g[2]                 3.37e-01  5.20e-04  2.02e-02   3.05e-01   3.37e-01   3.71e-01  1.51e+03  1.05e-01  1.00e+00
w[1]                 7.33e-01  1.53e-03  6.31e-02   6.24e-01   7.34e-01   8.35e-01  1.70e+03  1.18e-01  9.99e-01
w[2]                 1.41e-01  3.11e-04  1.26e-02   1.20e-01   1.41e-01   1.62e-01  1.66e+03  1.15e-01  9.99e-01
w[3]                 5.94e-01  4.87e-04  1.95e-02   5.61e-01   5.94e-01   6.25e-01  1.60e+03  1.11e-01  9.99e-01
d[1]                 6.68e-02  2.74e-04  1.07e-02   4.91e-02   6.69e-02   8.42e-02  1.54e+03  1.07e-01  1.00e+00
d[2]                 6.98e-01  2.17e-04  9.15e-03   6.83e-01   6.98e-01   7.13e-01  1.78e+03  1.23e-01  9.99e-01
d[3]                 2.35e-01  2.62e-04  1.10e-02   2.17e-01   2.35e-01   2.53e-01  1.78e+03  1.23e-01  1.00e+00

Don’t worry about Rhat for the divergent__ parameter, that’s for the parameters mostly.

5e-3 is the fraction of post-warmup trajectories that were divergent in all chains, so there were 5e-3 * 1600 = 8 divergences across all four chains.
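As a quick check of that arithmetic:

```python
frac_divergent = 5.00e-3   # the reported divergent__ mean
saved_iterations = 1600    # 4 chains x 400 post-warmup draws
n_divergent = frac_divergent * saved_iterations
print(int(round(n_divergent)))  # prints 8
```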

This model looks like it took way longer than the previous one to run? Nearly 12 hours for 1600 draws vs. 7 hours for 4000 draws? And the treedepth is way larger and the stepsize way smaller. Was this with more data or something?

Was the adapt_delta higher in this run? I guess if it has to be higher to keep divergences down then there hasn’t been any real advantage to switching to the dense adaptation.

This is the clearest document on what divergences can mean: . In that case the divergences are happening in 2% of trajectories and they do not go away with an adapt_delta closer to 1.

The warning divergences give is that you might not be sampling the distribution you think you are. I guess since you have results from the diagonal adaptation which isn’t producing divergences, you could make predictions with each of them with example data and see if there are differences.

If you can, check your results against the model run with diagonal adaptation. Rhat occasionally ends up less than 1.0; I don’t think what you’re seeing in your results here is a thing to worry about.

I have been assuming that the 1.01 threshold for Rhat you mentioned before is equivalent to a 0.99 threshold. So only the deviation from 1 matters, right?

Thanks for the clarification and additional information. I need to look into the issues involved more carefully and systematically. Let me simply reply to your question about the slow run time: I suspect that changing p[3]’s true parameter value resulted in divergences, which typically lead to a much slower run time.

The p[3] of my model has been hard to estimate reliably – usually the true value (0.35) is overestimated at around 0.43 (but other than that, the run time seemed all right and I didn’t have divergences). I wanted to understand to what extent the hard-to-estimate p[3] is driven by my prior (with its mean at 0.5). So in the recent runs, I set the true value of p[3] to 0.68 (above 0.5) to see whether the estimate would fall somewhere in between, which I believe would suggest that p[3] cannot be well identified by the data and thus the estimate is always pulled toward the prior’s mean (0.5). p[1:3] are the parameters of a sigmoid function modeled with inv_logit(). I guess the nonlinearity might make it hard to estimate accurately.

I think Rhat is supposed to be greater than or equal to 1.0. It ending up less than one just happens randomly and isn’t a big deal. So deviations below one and above one are interpreted differently.
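To see why sampled Rhat can dip slightly below 1, here is a language-agnostic sketch (in Python) of the basic split-R-hat computation. This is the classic Gelman-Rubin form, simplified relative to what recent Stan versions compute (which add rank-normalization and folding):

```python
import numpy as np

def split_rhat(chains):
    """Basic split-R-hat: split each chain in half, then compare between-
    and within-chain variance. `chains` has shape (n_chains, n_draws)."""
    half = chains.shape[1] // 2
    splits = np.vstack([chains[:, :half], chains[:, half:2 * half]])
    n = splits.shape[1]
    w = splits.var(axis=1, ddof=1).mean()        # within-chain variance
    b = n * splits.mean(axis=1).var(ddof=1)      # between-chain variance
    var_plus = (n - 1) / n * w + b / n           # pooled variance estimate
    return np.sqrt(var_plus / w)

# Well-mixed chains give Rhat near 1; chains stuck at different
# locations give Rhat well above 1.
rng = np.random.default_rng(0)
good = rng.normal(size=(4, 1000))
bad = good + np.arange(4)[:, None]   # shift each chain to a different mean
```

With well-mixed draws the ratio var_plus / w can land marginally below 1 by sampling noise, which is why values like 9.98e-01 in the tables above are nothing to worry about.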

Only suggestion I have is maybe generate a dataset that doesn’t take 12 hours to run haha. It’s gonna be hard to make much headway with calculations that slow.


Thanks for your clarification and suggestion. Makes a lot of sense. Owing to some other considerations, for now I still use 10,000 simulated data points.

I am returning to more conservative choices of warmup (1000) and delta (0.99) to first ensure there are no divergences. I just finished a comparison of the dense vs. diag runs on the slower cluster I have access to. I believe the results confirm that it is fine (ie, similar results) and beneficial (ie, shorter run time) to use the dense metric.

I would appreciate your clarification of one thing, though. In the diag results below, the rows from stepsize__ to n_leapfrog__ have MCSE and N_Eff reported as -nan, which differs from the dense results. Is this difference something I, as a model builder, should worry about?

diag results:

4 chains: each with iter=(350,350,350,350); warmup=(0,0,0,0); thin=(1,1,1,1); 1400 iterations saved.
Warmup took (30658, 31333, 31104, 31588) seconds, 34.6 hours total
Sampling took (11519, 11515, 11542, 11483) seconds, 12.8 hours total
                         Mean      MCSE    StdDev         5%        50%        95%     N_Eff   N_Eff/s     R_hat
lp__                 2.07e+04  6.17e-01  1.41e+01   2.07e+04   2.07e+04   2.07e+04  5.22e+02  1.13e-02  1.01e+00
accept_stat__        9.92e-01  1.43e-03  1.31e-02   9.66e-01   9.98e-01   1.00e+00  8.43e+01  1.83e-03  1.03e+00
stepsize__           2.05e-03      -nan  5.46e-04   1.41e-03   2.11e-03   2.90e-03      -nan      -nan  4.32e+14
treedepth__          1.00e+01      -nan  3.02e-14   1.00e+01   1.00e+01   1.00e+01      -nan      -nan  9.97e-01
n_leapfrog__         1.02e+03      -nan  8.98e-12   1.02e+03   1.02e+03   1.02e+03      -nan      -nan  9.97e-01
divergent__          0.00e+00      -nan  0.00e+00   0.00e+00   0.00e+00   0.00e+00      -nan      -nan      -nan
energy__            -2.05e+04  8.54e-01  1.95e+01  -2.06e+04  -2.05e+04  -2.05e+04  5.20e+02  1.13e-02  1.01e+00
sd_y                 8.01e-02  2.65e-05  5.77e-04   7.92e-02   8.01e-02   8.10e-02  4.74e+02  1.03e-02  1.00e+00
mu_u1                1.01e-01  5.45e-04  8.94e-03   8.58e-02   1.01e-01   1.15e-01  2.69e+02  5.85e-03  1.01e+00
mu_alpha             4.16e-02  1.18e-04  1.85e-03   3.87e-02   4.16e-02   4.47e-02  2.47e+02  5.36e-03  1.01e+00
beta                 5.93e-01  7.16e-04  7.70e-03   5.80e-01   5.93e-01   6.06e-01  1.16e+02  2.51e-03  1.03e+00
theta                1.53e-01  3.12e-04  3.36e-03   1.48e-01   1.53e-01   1.58e-01  1.16e+02  2.53e-03  1.03e+00
sd_season            1.02e-01  2.36e-04  4.21e-03   9.44e-02   1.02e-01   1.09e-01  3.19e+02  6.92e-03  1.02e+00
mu_season[1]        -1.27e-01  3.80e-04  9.74e-03  -1.43e-01  -1.27e-01  -1.10e-01  6.56e+02  1.42e-02  1.00e+00
mu_season[2]        -6.99e-02  5.43e-04  1.01e-02  -8.60e-02  -6.97e-02  -5.31e-02  3.47e+02  7.54e-03  1.01e+00
mu_season[3]         1.45e-01  5.61e-04  1.00e-02   1.29e-01   1.45e-01   1.62e-01  3.20e+02  6.95e-03  1.00e+00
p[1]                 6.43e-01  3.36e-03  3.69e-02   5.98e-01   6.36e-01   7.09e-01  1.21e+02  2.62e-03  1.03e+00
p[2]                 6.27e-01  2.00e-04  5.64e-03   6.18e-01   6.27e-01   6.36e-01  7.94e+02  1.72e-02  1.00e+00
p[3]                 5.95e-01  4.78e-03  5.26e-02   5.31e-01   5.85e-01   6.89e-01  1.21e+02  2.64e-03  1.03e+00
g[1]                 8.39e-01  2.58e-03  4.22e-02   7.72e-01   8.39e-01   9.09e-01  2.68e+02  5.82e-03  1.02e+00
g[2]                 3.37e-01  1.26e-03  1.95e-02   3.04e-01   3.37e-01   3.68e-01  2.39e+02  5.18e-03  1.02e+00
w[1]                 7.30e-01  4.82e-03  6.03e-02   6.25e-01   7.31e-01   8.23e-01  1.57e+02  3.40e-03  1.04e+00
w[2]                 1.41e-01  7.64e-04  1.24e-02   1.22e-01   1.41e-01   1.62e-01  2.62e+02  5.69e-03  1.02e+00
w[3]                 5.92e-01  9.68e-04  1.88e-02   5.61e-01   5.92e-01   6.22e-01  3.77e+02  8.18e-03  1.00e+00
d[1]                 6.71e-02  6.41e-04  1.04e-02   4.97e-02   6.72e-02   8.40e-02  2.65e+02  5.74e-03  1.00e+00
d[2]                 6.98e-01  4.35e-04  9.29e-03   6.82e-01   6.98e-01   7.14e-01  4.57e+02  9.91e-03  1.00e+00
d[3]                 2.35e-01  7.15e-04  1.10e-02   2.16e-01   2.36e-01   2.53e-01  2.38e+02  5.18e-03  1.01e+00

dense results:

4 chains: each with iter=(350,350,350,350); warmup=(0,0,0,0); thin=(1,1,1,1); 1400 iterations saved.
Warmup took (21000, 20885, 20857, 22902) seconds, 23.8 hours total
Sampling took (7961, 4248, 7934, 8000) seconds, 7.82 hours total
                         Mean      MCSE    StdDev         5%        50%        95%     N_Eff   N_Eff/s     R_hat
lp__                 2.07e+04  6.02e-01  1.34e+01   2.07e+04   2.07e+04   2.07e+04  4.94e+02  1.75e-02  1.01e+00
accept_stat__        9.90e-01  5.44e-04  2.08e-02   9.57e-01   9.96e-01   1.00e+00  1.47e+03  5.21e-02  1.01e+00
stepsize__           1.05e-02  1.52e-03  2.15e-03   8.80e-03   9.84e-03   1.42e-02  2.00e+00  7.12e-05  5.08e+14
treedepth__          9.73e+00  2.98e-01  4.46e-01   9.00e+00   1.00e+01   1.00e+01  2.23e+00  7.92e-05  3.00e+00
n_leapfrog__         8.98e+02  1.46e+02  2.20e+02   5.11e+02   1.02e+03   1.02e+03  2.27e+00  8.07e-05  2.79e+00
divergent__          0.00e+00      -nan  0.00e+00   0.00e+00   0.00e+00   0.00e+00      -nan      -nan      -nan
energy__            -2.05e+04  8.71e-01  1.91e+01  -2.06e+04  -2.05e+04  -2.05e+04  4.81e+02  1.71e-02  1.01e+00
sd_y                 8.01e-02  1.41e-05  5.67e-04   7.91e-02   8.01e-02   8.10e-02  1.61e+03  5.73e-02  9.99e-01
mu_u1                1.02e-01  2.08e-04  8.64e-03   8.75e-02   1.02e-01   1.16e-01  1.73e+03  6.13e-02  9.99e-01
mu_alpha             4.17e-02  4.73e-05  1.87e-03   3.86e-02   4.16e-02   4.48e-02  1.57e+03  5.57e-02  1.00e+00
beta                 5.92e-01  1.79e-04  7.24e-03   5.80e-01   5.92e-01   6.04e-01  1.63e+03  5.79e-02  9.98e-01
theta                1.53e-01  7.53e-05  3.14e-03   1.48e-01   1.53e-01   1.59e-01  1.73e+03  6.15e-02  9.99e-01
sd_season            1.01e-01  9.44e-05  4.18e-03   9.49e-02   1.01e-01   1.09e-01  1.96e+03  6.97e-02  9.99e-01
mu_season[1]        -1.26e-01  2.23e-04  1.01e-02  -1.43e-01  -1.26e-01  -1.09e-01  2.05e+03  7.30e-02  9.99e-01
mu_season[2]        -7.02e-02  2.50e-04  1.02e-02  -8.74e-02  -7.03e-02  -5.30e-02  1.66e+03  5.90e-02  9.99e-01
mu_season[3]         1.46e-01  2.48e-04  1.01e-02   1.29e-01   1.46e-01   1.63e-01  1.66e+03  5.92e-02  1.00e+00
p[1]                 6.45e-01  1.11e-03  3.86e-02   5.98e-01   6.38e-01   7.19e-01  1.21e+03  4.31e-02  9.99e-01
p[2]                 6.27e-01  1.23e-04  5.46e-03   6.18e-01   6.27e-01   6.36e-01  1.97e+03  7.01e-02  1.00e+00
p[3]                 5.97e-01  1.58e-03  5.52e-02   5.30e-01   5.88e-01   7.05e-01  1.22e+03  4.32e-02  9.98e-01
g[1]                 8.36e-01  1.12e-03  4.61e-02   7.60e-01   8.36e-01   9.10e-01  1.71e+03  6.08e-02  9.99e-01
g[2]                 3.38e-01  5.04e-04  1.99e-02   3.05e-01   3.37e-01   3.71e-01  1.56e+03  5.54e-02  1.00e+00
w[1]                 7.34e-01  1.63e-03  6.55e-02   6.27e-01   7.36e-01   8.37e-01  1.61e+03  5.72e-02  1.00e+00
w[2]                 1.41e-01  2.97e-04  1.31e-02   1.20e-01   1.41e-01   1.62e-01  1.94e+03  6.89e-02  1.00e+00
w[3]                 5.94e-01  4.72e-04  1.98e-02   5.63e-01   5.93e-01   6.28e-01  1.76e+03  6.26e-02  9.98e-01
d[1]                 6.73e-02  2.60e-04  1.03e-02   5.01e-02   6.75e-02   8.37e-02  1.57e+03  5.57e-02  1.00e+00
d[2]                 6.98e-01  2.36e-04  9.39e-03   6.83e-01   6.98e-01   7.14e-01  1.58e+03  5.61e-02  1.00e+00
d[3]                 2.35e-01  2.69e-04  1.09e-02   2.17e-01   2.35e-01   2.53e-01  1.64e+03  5.82e-02  1.00e+00