How many parameters do you have total?
Problems remain after non-centered parametrization of correlated parameters
Thanks for asking. Is it the exact number of the original parameters that you want to know?
Perhaps I should outline the structure of the model components. Those shown in the estimation table in my post are all the original parameters:
(i) For the state-space model of a latent AR(1) process:
`mu_alpha` and `beta` are the intercept and slope parameters of the AR(1) process;
`mu_u1` is the process’s initial value (when the time index n = 1);
`mu_season` and `sd_season` are the mean and sd of the seasonality component of the process (for quarterly data);
`sd_y` is the sd of the error term of the AR(1) process.
(ii) Lying on top of the state-space model are several sigmoid function components (either `inv_logit()` or a soft-clipping function) with inputs `Z*p`, `X*g`, and `G*w`, where `p`, `g`, `w` are unknown parameters and `Z`, `X`, `G` are known covariate matrices.
(iii) These sigmoid functions connect the latent process (`y`) to the observed process (`r`):
r = y + Real + D,
where `Real` is a function of an unknown parameter `theta` and a sigmoid function of `Z*p`, and `D` is a function of an unknown simplex vector `d` and the product of a sigmoid function of `X*g` with a sigmoid function of `G*w`.
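To make the three components concrete, here is a rough simulation sketch in the spirit of (i)-(iii). The exact functional forms of the seasonality term, `Real`, and `D` are not spelled out above, so the ones below are placeholders chosen only for illustration:

```python
import numpy as np

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
N = 100  # quarters for one firm

# (i) latent AR(1) process with quarterly seasonality (additive form is an assumption)
mu_alpha, beta = 0.04, 0.6                  # intercept and slope
mu_u1, sd_y = 0.1, 0.08                     # initial value and error sd
season = np.array([0.12, 0.06, 0.15, 0.0])  # quarterly effects; 4th pinned to 0 (assumption)
y = np.empty(N)
y[0] = mu_u1
for n in range(1, N):
    y[n] = mu_alpha + beta * y[n - 1] + season[n % 4] + rng.normal(0.0, sd_y)

# (ii) sigmoid components, with hypothetical covariate widths len(p)=3, len(g)=2, len(w)=3
Z = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])  # first column is all 1's
X = rng.normal(size=(N, 2))
G = rng.normal(size=(N, 3))
p = np.array([0.7, 0.62, 0.35])
g = np.array([0.85, 0.35])
w = np.array([0.72, 0.15, 0.62])

# (iii) observation equation r = y + Real + D (placeholder forms for Real and D)
theta = 0.15
d = np.array([0.05, 0.7, 0.25])                  # simplex: nonnegative, sums to 1
Real = theta * inv_logit(Z @ p)
D = d[0] * inv_logit(X @ g) * inv_logit(G @ w)   # product of two sigmoids, scaled
r = y + Real + D
```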
With the non-centered reparametrization of `p`, of (`g`, `w`), and of (`mu_alpha`, `beta`), these become transformed parameters built from `p_mu`, `p_sd`, `p_L`, `p_err`, `gw_mu`, `gw_sd`, …, `ab_mu`, `ab_sd`, …, etc., which were not shown in the estimation table of my post (but are shown in the pairs plot).
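For readers following along, a non-centered parametrization of a correlated coefficient vector like `p` typically samples standardized raw errors and reconstructs `p` as a transformed parameter. A minimal sketch (the names mirror `p_mu`, `p_sd`, `p_L`, `p_err` above; the correlation matrix is an assumed example):

```python
import numpy as np

# Assumed 3x3 correlation structure for illustration (positive definite)
corr = np.array([[1.0, 0.3, 0.8],
                 [0.3, 1.0, 0.2],
                 [0.8, 0.2, 1.0]])
p_L = np.linalg.cholesky(corr)         # Cholesky factor, as in p_L

p_mu = np.array([0.70, 0.62, 0.35])    # location, as in p_mu
p_sd = np.array([0.05, 0.05, 0.05])    # scale, as in p_sd
p_err = np.random.default_rng(2).normal(size=3)  # raw parameter, standard normal

# Transformed parameter: p with the desired correlation structure
p = p_mu + p_sd * (p_L @ p_err)
```

In Stan this reconstruction would live in the transformed parameters block, with `p_err` given a standard normal prior in the model block.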
I have been struggling to find better ways to get reliable estimates of `p`, `g`, and `w` (it is less challenging for the other parameters).
If the intercept `p[1]` of `Z*p` is hardcoded into the model (note: `Z[n,1]` is 1 for every n), it is easier to get reliable estimates of `p[2:3]`, which are not so strongly correlated. However, if all of `p[1:3]` are free parameters to be estimated, there is very strong correlation between `p[1]` and `p[3]`, as shown in the pairs plot in my last reply.
An `adapt_delta` of 0.95 is sufficient to avoid any divergences and to get reasonable parameter estimates with the following setup, but the downside is that it took over 7 hours (for a fast machine on a cluster) to complete fitting the model.
Number of simulated observation points: N = 12,000 (120 firms × 100 quarters)
4 chains, each with iter=3000; warmup=1000; thin=1;
post-warmup draws per chain = 2000; total post-warmup draws = 8000.
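Those draw counts follow directly from the settings; as a quick arithmetic check:

```python
chains, iters, warmup, thin = 4, 3000, 1000, 1
per_chain = (iters - warmup) // thin   # post-warmup draws per chain: 2000
total = chains * per_chain             # total post-warmup draws: 8000
```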
True parameter values:
====================================================
sd_y = 0.08, mu_u1 = 0.1; mu_alpha = 0.04; beta = 0.6;
theta = 0.15;
sd_season = 0.1, mu_season = c(-0.12, -0.06, 0.15);
p = c(0.7, 0.62, 0.35)
g = c(0.85, 0.35)
w = c(0.72, 0.15, 0.62)
d = c(0.05, 0.7, 0.25)
====================================================
mean se_mean sd 10% 50% 90% n_eff Rhat
sd_y 0.080 0.000 0.001 0.079 0.080 0.080 15212 1.000
mu_u1 0.095 0.000 0.008 0.084 0.095 0.105 12897 1.000
mu_alpha 0.041 0.000 0.002 0.039 0.041 0.043 6549 1.000
beta 0.587 0.000 0.007 0.578 0.587 0.597 4486 1.000
theta 0.152 0.000 0.008 0.142 0.152 0.163 6515 1.000
sd_season 0.102 0.000 0.004 0.097 0.102 0.107 12263 1.000
mu_season[1] -0.122 0.000 0.010 -0.135 -0.122 -0.110 14662 1.000
mu_season[2] -0.054 0.000 0.009 -0.066 -0.054 -0.042 16905 1.000
mu_season[3] 0.155 0.000 0.009 0.143 0.155 0.167 14127 1.000
p[1] 0.712 0.001 0.043 0.670 0.704 0.764 2432 1.002
p[2] 0.614 0.000 0.026 0.581 0.613 0.647 11231 1.000
p[3] 0.371 0.001 0.061 0.311 0.357 0.446 2405 1.002
g[1] 0.872 0.000 0.041 0.818 0.871 0.925 7061 1.000
g[2] 0.350 0.000 0.018 0.327 0.350 0.373 6253 1.000
w[1] 0.670 0.001 0.056 0.598 0.671 0.741 5806 1.000
w[2] 0.158 0.000 0.011 0.144 0.158 0.172 8606 1.000
w[3] 0.621 0.000 0.018 0.598 0.620 0.644 9303 1.000
d[1] 0.050 0.000 0.009 0.039 0.050 0.062 9629 1.000
d[2] 0.687 0.000 0.008 0.677 0.687 0.698 16379 1.000
d[3] 0.262 0.000 0.010 0.250 0.262 0.275 9252 1.000
So it looks like this model has 20-some parameters?
Maybe try adding `control = list(metric = "dense_e")` and see if that helps?
Also, are your lowest N_effs around 2000? If so, you might consider just running your chains for a shorter time.
Can you run `get_num_leapfrog_per_iteration(fit)` on your fit?
Thanks for suggesting the `metric = "dense_e"` option. I tried it, but it doesn’t seem to work in this case: still at the very early stage of warmup after 15 hours of fitting.
If I understand correctly, the considerations related to the `metric = "dense_e"` option are discussed on this page. I include the link here in case others reading this post want to consider the option.
I don’t think the fix for the dense metric has made it into the rstan on CRAN yet…
Thanks for the suggestions. I tried 1500 iterations (with 1000 warmup iterations, 4 chains, simulated obs. N = 120 firms × 100 quarters). The lowest `n_eff` dropped to only around 340 (from previously 2000+ when running 3000 iterations with 1000 warmup iterations for 4 chains).
Despite the large reduction in the number of iterations and in `n_eff`, the fitting time only dropped to 5.5 hours (from over 7 hours). Running `get_num_leapfrog_per_iteration(fit)` gives
[1] 255 511 255 255 511 255 255 511 511 255 511 511 511 511 255 255 511 511
[19] 511 255 511 255 255 511 255 255 255 511 255 255 255 511 255 255 511 255
[37] 255 255 511 511 511 511 511 511 255 511 511 511 511 255 255 511 511 511
[55] 511 255 255 255 511 511 511 511 255 511 511 255 511 511 511 511 511 511
... (others omitted)
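As an aside for readers: a saturated NUTS trajectory at tree depth k takes 2^k - 1 leapfrog steps, so counts of 255 and 511 correspond to tree depths 8 and 9. A quick check:

```python
leapfrog_counts = [255, 511]
# a full NUTS tree of depth d uses 2**d - 1 leapfrog steps; invert via bit length
depths = [(n + 1).bit_length() - 1 for n in leapfrog_counts]  # [8, 9]
```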
Q: For a complex model like this, is a fitting time of 5.5 hours considered uncommonly long? (I am aware of the benefit of vectorization and have implemented it wherever possible.)
I also tried 2000 iterations (with 1000 warmup iterations, 4 chains, simulated obs. N = 60 firms × 100 quarters). The lowest `n_eff`, for the hard-to-estimate `p[1]` and `p[3]`, is around 1800. `get_num_leapfrog_per_iteration(fit)` returns 255 255 255 ... consistently.
The fitting time for this case is a more realistic 2 hours. Previously, I focused on 120 firms because with only 60 firms, usually one of the estimates tends to be not so accurate. For example, the true value of `d[1]` should be 0.05, but its estimate is 0.072 and the 10%–90% interval does not contain the true value (see the table below).
Q: In the tradition of Bayesian analysis, is such a case, with an exception or two of not-so-accurate estimates, still considered good enough validation of the model to justify using it on real data (e.g., for industries with at least 60 firms)?
mean se_mean sd 10% 50% 90% n_eff Rhat
sd_y 0.079 0.000 0.001 0.078 0.079 0.080 5479 1.000
mu_u1 0.099 0.000 0.011 0.085 0.099 0.113 4435 1.001
mu_alpha 0.037 0.000 0.002 0.034 0.037 0.040 3733 1.000
beta 0.599 0.000 0.010 0.586 0.599 0.613 2076 1.002
theta 0.149 0.000 0.011 0.136 0.148 0.162 2912 1.001
sd_season 0.099 0.000 0.005 0.092 0.099 0.106 3922 1.000
mu_season[1] -0.116 0.000 0.013 -0.133 -0.116 -0.099 5608 1.000
mu_season[2] -0.066 0.000 0.013 -0.083 -0.065 -0.049 6569 1.000
mu_season[3] 0.144 0.000 0.013 0.128 0.144 0.161 5244 1.000
p[1] 0.756 0.001 0.063 0.690 0.743 0.845 1807 1.004
p[2] 0.643 0.001 0.040 0.593 0.642 0.694 4613 0.999
p[3] 0.423 0.002 0.091 0.328 0.403 0.552 1756 1.003
g[1] 0.841 0.001 0.055 0.770 0.840 0.910 2883 1.001
g[2] 0.349 0.000 0.025 0.317 0.349 0.381 3179 1.000
w[1] 0.710 0.002 0.078 0.611 0.710 0.809 2514 1.000
w[2] 0.158 0.000 0.015 0.139 0.158 0.178 3523 1.000
w[3] 0.618 0.000 0.025 0.585 0.618 0.651 3438 1.002
d[1] 0.072 0.000 0.014 0.055 0.073 0.090 4029 0.999
d[2] 0.711 0.000 0.012 0.695 0.711 0.726 8837 0.999
d[3] 0.216 0.000 0.014 0.198 0.217 0.235 3623 1.000
is a fitting time of 5.5 hours considered uncommonly long?
It’s more about the number of gradient evaluations than total wall time. It looks like your chains are taking 255+ leapfrog steps before hitting a U-turn. That is a lot.
If you have a relatively modest number of parameters (is it actually just 20?), then dense adaptation might help, but like @Charles_Driver said, there was a bug (and a bugfix) that may not have propagated to the released rstan yet. If you have the patience, you might try running this model with cmdstan.
If you had hundreds of parameters, though, this probably wouldn’t work so easily. So the number is important; smaller is easier.
It sounds like the model is sampling really efficiently; it’s just taking a lot of leapfrog steps to do it. I’d give cmdstan’s dense adaptation a try.
If you want to dump data constructed for rstan into a format that cmdstan can read (assuming you have that data in a list called ‘data’), use:
stan_rdump(names(data), file = "mydata.dat", list2env(data))
Originally 20, but after non-centered reparametrization of the correlated parameters, the total number of raw parameters to sample becomes 78.
Thanks for the guidance. I tried this but got the error messages:
objects not found: r, Z, X, G
Error in as.logical(test) : cannot coerce type 'environment' to vector of type 'logical'
I’m confused because those objects (`r`, `Z`, `X`, `G`) are clearly in my environment. (`r` is a list of vectors and `Z`, `X`, `G` are lists of matrices, both as declared in the `.stan` file.)
Any idea what the error messages mean?
Oh whoops, my bad, that last argument should be named:
stan_rdump(names(data), file = "mydata.dat", envir = list2env(data))
And this assumes that data is a list like:
data = list(N = ...,
y = ...,
...)
(The thing you’d pass into a Stan sampling statement).
78 parameters should be fine for this.
Thanks for the clarification. I successfully ran `stan_rdump()` on the data. In installing cmdstan, I completed the `make build` step successfully. However, when compiling the .stan file (which I have been using with rstan without a problem), I received the errors below. On another (slower) cluster, I could complete this step and proceed to sampling without a problem.
Any idea how to fix the compilation errors? (Are they due to the g++ version, which is g++ (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36)?)
[sbbg070@wrds-cloud-login1-h ~/cmdstan-2.19.1]$ make ../SSM0.5.2dev
--- Translating Stan model to C++ code ---
bin/stanc --o=../SSM0.5.2dev.hpp ../SSM0.5.2dev.stan
Model name=SSM0_5_2dev_model
Input file=../SSM0.5.2dev.stan
Output file=../SSM0.5.2dev.hpp
DIAGNOSTIC(S) FROM PARSER:
Info: Comments beginning with # are deprecated. Please use // in place of # for line comments.
Info:
Left-hand side of sampling statement (~) may contain a non-linear transform of a parameter or local variable.
If it does, you need to include a target += statement with the log absolute determinant of the Jacobian of the transform.
Left-hand-side of sampling statement:
err_y[j] ~ normal(...)
g++ -std=c++1y -pthread -Wno-sign-compare -O3 -I src -I stan/src -I stan/lib/stan_math/ -I stan/lib/stan_math/lib/eigen_3.3.3 -I stan/lib/stan_math/lib/boost_1.69.0 -I stan/lib/stan_math/lib/sundials_4.1.0/include -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -c -MT ../SSM0.5.2dev.o -MT ../SSM0.5.2dev -include ../SSM0.5.2dev.hpp -include src/cmdstan/main.cpp -MM -E -MG -MP -MF ../SSM0.5.2dev.d ../SSM0.5.2dev.hpp
--- Linking C++ model ---
g++ -std=c++1y -pthread -Wno-sign-compare -O3 -I src -I stan/src -I stan/lib/stan_math/ -I stan/lib/stan_math/lib/eigen_3.3.3 -I stan/lib/stan_math/lib/boost_1.69.0 -I stan/lib/stan_math/lib/sundials_4.1.0/include -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -include ../SSM0.5.2dev.hpp src/cmdstan/main.cpp stan/lib/stan_math/lib/sundials_4.1.0/lib/libsundials_nvecserial.a stan/lib/stan_math/lib/sundials_4.1.0/lib/libsundials_cvodes.a stan/lib/stan_math/lib/sundials_4.1.0/lib/libsundials_idas.a -o ../SSM0.5.2dev
In file included from stan/lib/stan_math/stan/math/rev/mat/fun/ordered_constrain.hpp:6:0,
from stan/lib/stan_math/stan/math/rev/mat.hpp:41,
from stan/lib/stan_math/stan/math.hpp:4,
from stan/src/stan/model/model_header.hpp:4,
from ./../SSM0.5.2dev.hpp:3,
from <command-line>:0:
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:31:32: error: ‘std::index_sequence’ has not been declared
std::index_sequence<I...> i) {
^
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:31:46: error: expected ‘,’ or ‘...’ before ‘<’ token
std::index_sequence<I...> i) {
^
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp: In function ‘constexpr auto stan::math::internal::apply(const F&, const Tuple&)’:
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:49:27: error: ‘make_index_sequence’ is not a member of ‘std’
return apply_impl(f, t, std::make_index_sequence<std::tuple_size<Tuple>{}>{});
^
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:49:27: note: suggested alternative:
In file included from stan/lib/stan_math/lib/boost_1.69.0/boost/fusion/container/vector/vector.hpp:28:0,
from stan/lib/stan_math/lib/boost_1.69.0/boost/fusion/container/vector/vector10.hpp:25,
from stan/lib/stan_math/lib/boost_1.69.0/boost/fusion/view/transform_view/transform_view.hpp:22,
from stan/lib/stan_math/lib/boost_1.69.0/boost/fusion/algorithm/transformation/transform.hpp:11,
from stan/lib/stan_math/lib/boost_1.69.0/boost/fusion/view/zip_view/detail/begin_impl.hpp:14,
from stan/lib/stan_math/lib/boost_1.69.0/boost/fusion/view/zip_view/zip_view.hpp:16,
from stan/lib/stan_math/lib/boost_1.69.0/boost/fusion/view/zip_view.hpp:12,
from stan/lib/stan_math/lib/boost_1.69.0/boost/fusion/include/zip_view.hpp:11,
from stan/lib/stan_math/lib/boost_1.69.0/boost/numeric/odeint/util/resize.hpp:26,
from stan/lib/stan_math/lib/boost_1.69.0/boost/numeric/odeint/util/state_wrapper.hpp:26,
from stan/lib/stan_math/lib/boost_1.69.0/boost/numeric/odeint/util/ublas_wrapper.hpp:33,
from stan/lib/stan_math/lib/boost_1.69.0/boost/numeric/odeint.hpp:25,
from stan/lib/stan_math/stan/math/prim/arr/functor/integrate_ode_rk45.hpp:17,
from stan/lib/stan_math/stan/math/prim/arr.hpp:46,
from stan/lib/stan_math/stan/math/prim/mat.hpp:344,
from stan/lib/stan_math/stan/math/rev/mat.hpp:12,
from stan/lib/stan_math/stan/math.hpp:4,
from stan/src/stan/model/model_header.hpp:4,
from ./../SSM0.5.2dev.hpp:3,
from <command-line>:0:
stan/lib/stan_math/lib/boost_1.69.0/boost/fusion/support/detail/index_sequence.hpp:59:12: note: ‘boost::fusion::detail::make_index_sequence’
struct make_index_sequence
^
In file included from stan/lib/stan_math/stan/math/rev/mat/fun/ordered_constrain.hpp:6:0,
from stan/lib/stan_math/stan/math/rev/mat.hpp:41,
from stan/lib/stan_math/stan/math.hpp:4,
from stan/src/stan/model/model_header.hpp:4,
from ./../SSM0.5.2dev.hpp:3,
from <command-line>:0:
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:49:77: error: expected primary-expression before ‘{’ token
return apply_impl(f, t, std::make_index_sequence<std::tuple_size<Tuple>{}>{});
^
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:49:77: error: expected ‘)’ before ‘{’ token
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp: At global scope:
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:151:9: error: expected type-specifier
= std::result_of_t<F(decltype(is_var_), decltype(value_of(Targs()))...)>;
^
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:156:42: error: ‘FReturnType’ was not declared in this scope
std::array<int, internal::compute_dims<FReturnType>::value> M_;
^
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:156:53: error: template argument 1 is invalid
std::array<int, internal::compute_dims<FReturnType>::value> M_;
^
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:156:61: error: template argument 2 is invalid
std::array<int, internal::compute_dims<FReturnType>::value> M_;
^
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp: In member function ‘std::vector<stan::math::var> stan::math::adj_jac_vari<F, Targs>::build_return_varis_and_vars(const std::vector<double>&)’:
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:349:9: error: invalid types ‘int[int]’ for array subscript
M_[0] = val_y.size();
^
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:350:32: error: invalid types ‘int[int]’ for array subscript
std::vector<var> var_y(M_[0]);
^
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp: In member function ‘Eigen::Matrix<stan::math::var, R, C> stan::math::adj_jac_vari<F, Targs>::build_return_varis_and_vars(const Eigen::Matrix<double, R, C>&)’:
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:375:9: error: invalid types ‘int[int]’ for array subscript
M_[0] = val_y.rows();
^
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:376:9: error: invalid types ‘int[int]’ for array subscript
M_[1] = val_y.cols();
^
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:377:40: error: invalid types ‘int[int]’ for array subscript
Eigen::Matrix<var, R, C> var_y(M_[0], M_[1]);
^
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:377:47: error: invalid types ‘int[int]’ for array subscript
Eigen::Matrix<var, R, C> var_y(M_[0], M_[1]);
^
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp: In member function ‘void stan::math::adj_jac_vari<F, Targs>::chain()’:
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:530:5: error: ‘FReturnType’ was not declared in this scope
FReturnType y_adj;
^
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:530:17: error: expected ‘;’ before ‘y_adj’
FReturnType y_adj;
^
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:532:38: error: ‘y_adj’ was not declared in this scope
internal::build_y_adj(y_vi_, M_, y_adj);
^
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:536:26: error: expansion pattern ‘auto&&’ contains no argument packs
[this](auto&&... args) { this->accumulate_adjoints(args...); },
^
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp: In lambda function:
stan/lib/stan_math/stan/math/rev/mat/functor/adj_jac_apply.hpp:536:60: error: ‘args’ was not declared in this scope
[this](auto&&... args) { this->accumulate_adjoints(args...); },
^
make: *** [../SSM0.5.2dev] Error 1
Also, I followed p. 28 of the cmdstan-guide-2.19.1.pdf, hoping to sample 4 chains in parallel, but the chains ran sequentially even when multiple cores were specified.
Any suggestion or pointer to the correct commands for a cluster environment?
#PBS -l nodes=1:ppn=8
for i in {1..4}
do
../SSM0.5.2dev sample algorithm=hmc metric=dense_e adapt delta=0.95 \
id=$i data file=../SSM_data.dat output file=samples$i.csv
done
Did you get things to compile? I’m guessing this is the compiler. I think Stan supports gcc 4.9.3 (whatever Rtools is at). std::index_sequence is a C++14 thing (https://en.cppreference.com/w/cpp/utility/integer_sequence)
To do four chains in parallel, you gotta run four separate things. If I remember correctly, qsub scripts just launch the same script on however many different nodes you request, so maybe just do something like:
#PBS
../SSM0.5.2dev sample algorithm=hmc metric=dense_e adapt delta=0.95 \
id=$PBS_ARRAYID data file=../SSM_data.dat output file=samples$PBS_ARRAYID.csv
And then however many nodes you request via qsub (or cores or however it’s denoted), you’ll get that many outputs. Use the examples/bernoulli job to test this.
Hi, Ben:
I finally completed a run of 4 chains using `metric=dense_e` (to compare with one using `metric=diag_e`). It does reduce the run time significantly (with essentially the same estimates, given the large amount of simulated data used: 12,000 data points).
Here is the summary output: dense_e_cmdStan0.5.2dev_2.6hrs.txt (5.2 KB) diag_e_cmdStan0.5.2dev_9.6hrs.txt (5.2 KB)
Would appreciate very much your feedback on whether there’s anything I should pay attention to, especially the diagnostic indicators. Thanks.
Well, the things to watch out for are the usual things: are there divergences in the dense_e version, and are the Rhats 1?
Your N_effs still seem really high. You might be able to get away with cutting your chain lengths in half or something. Maybe try 500 warmup/500 samples and see what happens? I wouldn’t go much less than 500 though. Each chain needs to be long enough to do warmup and get enough N_eff individually to make the Rhats and such reliable.
Would appreciate your clarification of the following issues, which seem confusing.
In the documentation (e.g., p. 29 of the cmdstan-guide-2.19.1.pdf), the reported `divergent__` row has `N_Eff` equal to the 1000 iterations saved (but in my case below, I have `nan`) and also `R_hat = nan` (my case: `9.99e-01`). Are these `N_Eff` and `R_hat`, in either case, meaningful at all?
The `Mean` in my case is nonzero but very small (`5.00e-03`). Is this merely due to reporting an integer count in floating point? (In other words, should that reported value be taken as 0?)
Inference for Stan model: SSM0_5_2dev_model
4 chains: each with iter=(400,400,400,400); warmup=(0,0,0,0); thin=(1,1,1,1); 1600 iterations saved.
Warmup took (7133, 7125, 7429, 6727) seconds, 7.89 hours total
Sampling took (3443, 3584, 3725, 3661) seconds, 4.00 hours total
Mean MCSE StdDev 5% 50% 95% N_Eff N_Eff/s R_hat
lp__ 2.07e+04 5.76e-01 1.39e+01 2.07e+04 2.07e+04 2.07e+04 5.82e+02 4.04e-02 1.00e+00
accept_stat__ 9.62e-01 2.75e-03 8.83e-02 8.48e-01 9.88e-01 9.99e-01 1.03e+03 7.16e-02 1.01e+00
stepsize__ 1.79e-02 6.50e-04 9.20e-04 1.69e-02 1.79e-02 1.94e-02 2.00e+00 1.39e-04 5.60e+13
treedepth__ 9.00e+00 5.04e-03 1.98e-01 9.00e+00 9.00e+00 9.00e+00 1.55e+03 1.08e-01 1.00e+00
n_leapfrog__ 5.46e+02 3.59e+00 1.34e+02 5.11e+02 5.11e+02 1.02e+03 1.39e+03 9.63e-02 1.02e+00
divergent__ 5.00e-03 nan 7.06e-02 0.00e+00 0.00e+00 0.00e+00 nan nan 9.99e-01
energy__ -2.05e+04 8.28e-01 1.91e+01 -2.06e+04 -2.05e+04 -2.05e+04 5.30e+02 3.68e-02 1.01e+00
sd_y 8.01e-02 1.35e-05 5.74e-04 7.91e-02 8.01e-02 8.10e-02 1.80e+03 1.25e-01 9.98e-01
mu_u1 1.02e-01 1.99e-04 8.99e-03 8.65e-02 1.02e-01 1.16e-01 2.03e+03 1.41e-01 1.00e+00
mu_alpha 4.17e-02 4.61e-05 1.86e-03 3.87e-02 4.18e-02 4.47e-02 1.63e+03 1.13e-01 9.99e-01
beta 5.93e-01 1.87e-04 7.52e-03 5.80e-01 5.93e-01 6.05e-01 1.62e+03 1.12e-01 1.00e+00
theta 1.53e-01 7.81e-05 3.24e-03 1.48e-01 1.53e-01 1.59e-01 1.71e+03 1.19e-01 1.00e+00
sd_season 1.02e-01 1.03e-04 4.25e-03 9.46e-02 1.01e-01 1.09e-01 1.70e+03 1.18e-01 1.00e+00
mu_season[1] -1.26e-01 2.23e-04 1.05e-02 -1.44e-01 -1.26e-01 -1.09e-01 2.20e+03 1.53e-01 1.00e+00
mu_season[2] -7.06e-02 2.23e-04 1.02e-02 -8.78e-02 -7.04e-02 -5.45e-02 2.07e+03 1.44e-01 1.00e+00
mu_season[3] 1.46e-01 2.42e-04 1.03e-02 1.28e-01 1.46e-01 1.63e-01 1.82e+03 1.27e-01 1.00e+00
p[1] 6.45e-01 1.07e-03 3.83e-02 5.98e-01 6.38e-01 7.19e-01 1.27e+03 8.81e-02 1.00e+00
p[2] 6.27e-01 1.20e-04 5.35e-03 6.18e-01 6.27e-01 6.36e-01 1.98e+03 1.37e-01 9.99e-01
p[3] 5.97e-01 1.54e-03 5.48e-02 5.30e-01 5.87e-01 7.04e-01 1.27e+03 8.79e-02 1.00e+00
g[1] 8.36e-01 1.09e-03 4.36e-02 7.65e-01 8.36e-01 9.10e-01 1.62e+03 1.12e-01 1.00e+00
g[2] 3.37e-01 5.20e-04 2.02e-02 3.05e-01 3.37e-01 3.71e-01 1.51e+03 1.05e-01 1.00e+00
w[1] 7.33e-01 1.53e-03 6.31e-02 6.24e-01 7.34e-01 8.35e-01 1.70e+03 1.18e-01 9.99e-01
w[2] 1.41e-01 3.11e-04 1.26e-02 1.20e-01 1.41e-01 1.62e-01 1.66e+03 1.15e-01 9.99e-01
w[3] 5.94e-01 4.87e-04 1.95e-02 5.61e-01 5.94e-01 6.25e-01 1.60e+03 1.11e-01 9.99e-01
d[1] 6.68e-02 2.74e-04 1.07e-02 4.91e-02 6.69e-02 8.42e-02 1.54e+03 1.07e-01 1.00e+00
d[2] 6.98e-01 2.17e-04 9.15e-03 6.83e-01 6.98e-01 7.13e-01 1.78e+03 1.23e-01 9.99e-01
d[3] 2.35e-01 2.62e-04 1.10e-02 2.17e-01 2.35e-01 2.53e-01 1.78e+03 1.23e-01 1.00e+00
Don’t worry about Rhat for the divergent__ row; Rhat is mostly for the model parameters.
5e-3 is the fraction of post-warmup trajectories that were divergent, so there were 5e-3 * 1600 = 8 divergences across all four chains.
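That arithmetic, spelled out (the numbers come from the table above):

```python
frac_divergent = 5e-3   # the Mean reported for divergent__
saved_draws = 1600      # 4 chains x 400 post-warmup draws
n_divergences = round(frac_divergent * saved_draws)  # 8
```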
This model looks like it took way longer than the previous one to run? Nearly 12 hours for 1600 draws vs. 7 hours for 4000 draws? And the treedepth is way larger and the stepsize way smaller. Was this with more data or something?
Was the adapt_delta higher in this run? I guess if it has to be higher to keep divergences down then there hasn’t been any real advantage to switching to the dense adaptation.
This is the clearest document on what divergences can mean: https://mc-stan.org/users/documentation/case-studies/divergences_and_bias.html . In that case the divergences happen in 2% of trajectories, and they do not go away with an adapt_delta closer to 1.
The warning divergences give is that you might not be sampling the distribution you think you are. I guess since you have results from the diagonal adaptation which isn’t producing divergences, you could make predictions with each of them with example data and see if there are differences.
If you can, check your results against the model run with diagonal adaptation. Rhat occasionally ends up less than 1.0. I don’t think what you’re seeing in your results here is a thing to worry about.
I have been assuming that the 1.01 threshold for `Rhat` you mentioned before is equivalent to a 0.99 threshold. So only the deviation from 1 matters, right?
Thanks for the clarification and additional information. I need to look into the issues involved more carefully and systematically. Let me simply reply to your question about the slow run time: I suspect that my changing `p[3]`’s true parameter value resulted in divergences, which typically lead to a much slower run time.
The `p[3]` of my model has been hard to estimate reliably, usually overestimating the true value (0.35) as 0.43 (but other than that, the run time seemed all right and I didn’t have divergences). I wanted to understand to what extent the hard-to-estimate `p[3]` is driven by my prior (with a mean at 0.5). So in the recent runs, I set the true value of `p[3]` to 0.68 (above 0.5) to see whether the estimate would fall somewhere in between, which I believe would suggest that `p[3]` cannot be well identified by the data and thus the estimate is always pulled toward the prior’s mean (0.5). `p[1:3]` are the parameters of a sigmoid function modeled with `inv_logit()`. I guess the nonlinearity might make them hard to estimate accurately.
I think Rhat is supposed to be greater than or equal to 1.0; it ending up less than one just happens randomly and isn’t a big deal. So values below one and above one are interpreted differently.
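For readers wondering why sampling noise can push Rhat slightly below 1: it is a ratio of two noisy variance estimates. A minimal sketch of classic split R-hat (no rank normalization; `split_rhat` is my own helper, not a library function):

```python
import numpy as np

def split_rhat(chains):
    """chains: (n_chains, n_draws) array; classic split-Rhat without rank normalization."""
    n_chains, n_draws = chains.shape
    half = n_draws // 2
    # split each chain in half, giving 2 * n_chains sub-chains
    sub = chains[:, : 2 * half].reshape(n_chains * 2, half)
    m, n = sub.shape
    chain_means = sub.mean(axis=1)
    B = n * chain_means.var(ddof=1)       # between-chain variance estimate
    W = sub.var(axis=1, ddof=1).mean()    # within-chain variance estimate
    var_plus = (n - 1) / n * W + B / n    # pooled posterior-variance estimate
    return np.sqrt(var_plus / W)

# for independent draws the ratio fluctuates around 1 and can land just below it
draws = np.random.default_rng(3).normal(size=(4, 1000))
r = split_rhat(draws)
```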
Only suggestion I have is maybe generate a dataset that doesn’t take 12 hours to run haha. It’s gonna be hard to make much headway with calculations that slow.
Thanks for your clarification and suggestion. Makes a lot of sense. Owing to some other considerations, for now I still use 10,000 simulated data points.
I am returning to more conservative choices of `warmup` (1000) and `delta` (0.99) to first ensure no divergences. I just finished a comparison of the `dense` vs. `diag` runs on the slower cluster I have access to. I believe the results confirm that it is fine (i.e., similar results) and beneficial (i.e., shorter run time) to use the `dense` metric.
Would appreciate your clarification of one thing, though. In the `diag` results below, the rows from `stepsize__` to `n_leapfrog__` have `MCSE` and `N_Eff` reported as `nan`, which differs from what is reported in the `dense` results. Is this difference something for me, as a model builder, to worry about?
`diag` results:
4 chains: each with iter=(350,350,350,350); warmup=(0,0,0,0); thin=(1,1,1,1); 1400 iterations saved.
Warmup took (30658, 31333, 31104, 31588) seconds, 34.6 hours total
Sampling took (11519, 11515, 11542, 11483) seconds, 12.8 hours total
Mean MCSE StdDev 5% 50% 95% N_Eff N_Eff/s R_hat
lp__ 2.07e+04 6.17e-01 1.41e+01 2.07e+04 2.07e+04 2.07e+04 5.22e+02 1.13e-02 1.01e+00
accept_stat__ 9.92e-01 1.43e-03 1.31e-02 9.66e-01 9.98e-01 1.00e+00 8.43e+01 1.83e-03 1.03e+00
stepsize__ 2.05e-03 nan 5.46e-04 1.41e-03 2.11e-03 2.90e-03 nan nan 4.32e+14
treedepth__ 1.00e+01 nan 3.02e-14 1.00e+01 1.00e+01 1.00e+01 nan nan 9.97e-01
n_leapfrog__ 1.02e+03 nan 8.98e-12 1.02e+03 1.02e+03 1.02e+03 nan nan 9.97e-01
divergent__ 0.00e+00 nan 0.00e+00 0.00e+00 0.00e+00 0.00e+00 nan nan nan
energy__ -2.05e+04 8.54e-01 1.95e+01 -2.06e+04 -2.05e+04 -2.05e+04 5.20e+02 1.13e-02 1.01e+00
sd_y 8.01e-02 2.65e-05 5.77e-04 7.92e-02 8.01e-02 8.10e-02 4.74e+02 1.03e-02 1.00e+00
mu_u1 1.01e-01 5.45e-04 8.94e-03 8.58e-02 1.01e-01 1.15e-01 2.69e+02 5.85e-03 1.01e+00
mu_alpha 4.16e-02 1.18e-04 1.85e-03 3.87e-02 4.16e-02 4.47e-02 2.47e+02 5.36e-03 1.01e+00
beta 5.93e-01 7.16e-04 7.70e-03 5.80e-01 5.93e-01 6.06e-01 1.16e+02 2.51e-03 1.03e+00
theta 1.53e-01 3.12e-04 3.36e-03 1.48e-01 1.53e-01 1.58e-01 1.16e+02 2.53e-03 1.03e+00
sd_season 1.02e-01 2.36e-04 4.21e-03 9.44e-02 1.02e-01 1.09e-01 3.19e+02 6.92e-03 1.02e+00
mu_season[1] -1.27e-01 3.80e-04 9.74e-03 -1.43e-01 -1.27e-01 -1.10e-01 6.56e+02 1.42e-02 1.00e+00
mu_season[2] -6.99e-02 5.43e-04 1.01e-02 -8.60e-02 -6.97e-02 -5.31e-02 3.47e+02 7.54e-03 1.01e+00
mu_season[3] 1.45e-01 5.61e-04 1.00e-02 1.29e-01 1.45e-01 1.62e-01 3.20e+02 6.95e-03 1.00e+00
p[1] 6.43e-01 3.36e-03 3.69e-02 5.98e-01 6.36e-01 7.09e-01 1.21e+02 2.62e-03 1.03e+00
p[2] 6.27e-01 2.00e-04 5.64e-03 6.18e-01 6.27e-01 6.36e-01 7.94e+02 1.72e-02 1.00e+00
p[3] 5.95e-01 4.78e-03 5.26e-02 5.31e-01 5.85e-01 6.89e-01 1.21e+02 2.64e-03 1.03e+00
g[1] 8.39e-01 2.58e-03 4.22e-02 7.72e-01 8.39e-01 9.09e-01 2.68e+02 5.82e-03 1.02e+00
g[2] 3.37e-01 1.26e-03 1.95e-02 3.04e-01 3.37e-01 3.68e-01 2.39e+02 5.18e-03 1.02e+00
w[1] 7.30e-01 4.82e-03 6.03e-02 6.25e-01 7.31e-01 8.23e-01 1.57e+02 3.40e-03 1.04e+00
w[2] 1.41e-01 7.64e-04 1.24e-02 1.22e-01 1.41e-01 1.62e-01 2.62e+02 5.69e-03 1.02e+00
w[3] 5.92e-01 9.68e-04 1.88e-02 5.61e-01 5.92e-01 6.22e-01 3.77e+02 8.18e-03 1.00e+00
d[1] 6.71e-02 6.41e-04 1.04e-02 4.97e-02 6.72e-02 8.40e-02 2.65e+02 5.74e-03 1.00e+00
d[2] 6.98e-01 4.35e-04 9.29e-03 6.82e-01 6.98e-01 7.14e-01 4.57e+02 9.91e-03 1.00e+00
d[3] 2.35e-01 7.15e-04 1.10e-02 2.16e-01 2.36e-01 2.53e-01 2.38e+02 5.18e-03 1.01e+00
`dense` results:
4 chains: each with iter=(350,350,350,350); warmup=(0,0,0,0); thin=(1,1,1,1); 1400 iterations saved.
Warmup took (21000, 20885, 20857, 22902) seconds, 23.8 hours total
Sampling took (7961, 4248, 7934, 8000) seconds, 7.82 hours total
Mean MCSE StdDev 5% 50% 95% N_Eff N_Eff/s R_hat
lp__ 2.07e+04 6.02e-01 1.34e+01 2.07e+04 2.07e+04 2.07e+04 4.94e+02 1.75e-02 1.01e+00
accept_stat__ 9.90e-01 5.44e-04 2.08e-02 9.57e-01 9.96e-01 1.00e+00 1.47e+03 5.21e-02 1.01e+00
stepsize__ 1.05e-02 1.52e-03 2.15e-03 8.80e-03 9.84e-03 1.42e-02 2.00e+00 7.12e-05 5.08e+14
treedepth__ 9.73e+00 2.98e-01 4.46e-01 9.00e+00 1.00e+01 1.00e+01 2.23e+00 7.92e-05 3.00e+00
n_leapfrog__ 8.98e+02 1.46e+02 2.20e+02 5.11e+02 1.02e+03 1.02e+03 2.27e+00 8.07e-05 2.79e+00
divergent__ 0.00e+00 nan 0.00e+00 0.00e+00 0.00e+00 0.00e+00 nan nan nan
energy__ -2.05e+04 8.71e-01 1.91e+01 -2.06e+04 -2.05e+04 -2.05e+04 4.81e+02 1.71e-02 1.01e+00
sd_y 8.01e-02 1.41e-05 5.67e-04 7.91e-02 8.01e-02 8.10e-02 1.61e+03 5.73e-02 9.99e-01
mu_u1 1.02e-01 2.08e-04 8.64e-03 8.75e-02 1.02e-01 1.16e-01 1.73e+03 6.13e-02 9.99e-01
mu_alpha 4.17e-02 4.73e-05 1.87e-03 3.86e-02 4.16e-02 4.48e-02 1.57e+03 5.57e-02 1.00e+00
beta 5.92e-01 1.79e-04 7.24e-03 5.80e-01 5.92e-01 6.04e-01 1.63e+03 5.79e-02 9.98e-01
theta 1.53e-01 7.53e-05 3.14e-03 1.48e-01 1.53e-01 1.59e-01 1.73e+03 6.15e-02 9.99e-01
sd_season 1.01e-01 9.44e-05 4.18e-03 9.49e-02 1.01e-01 1.09e-01 1.96e+03 6.97e-02 9.99e-01
mu_season[1] -1.26e-01 2.23e-04 1.01e-02 -1.43e-01 -1.26e-01 -1.09e-01 2.05e+03 7.30e-02 9.99e-01
mu_season[2] -7.02e-02 2.50e-04 1.02e-02 -8.74e-02 -7.03e-02 -5.30e-02 1.66e+03 5.90e-02 9.99e-01
mu_season[3] 1.46e-01 2.48e-04 1.01e-02 1.29e-01 1.46e-01 1.63e-01 1.66e+03 5.92e-02 1.00e+00
p[1] 6.45e-01 1.11e-03 3.86e-02 5.98e-01 6.38e-01 7.19e-01 1.21e+03 4.31e-02 9.99e-01
p[2] 6.27e-01 1.23e-04 5.46e-03 6.18e-01 6.27e-01 6.36e-01 1.97e+03 7.01e-02 1.00e+00
p[3] 5.97e-01 1.58e-03 5.52e-02 5.30e-01 5.88e-01 7.05e-01 1.22e+03 4.32e-02 9.98e-01
g[1] 8.36e-01 1.12e-03 4.61e-02 7.60e-01 8.36e-01 9.10e-01 1.71e+03 6.08e-02 9.99e-01
g[2] 3.38e-01 5.04e-04 1.99e-02 3.05e-01 3.37e-01 3.71e-01 1.56e+03 5.54e-02 1.00e+00
w[1] 7.34e-01 1.63e-03 6.55e-02 6.27e-01 7.36e-01 8.37e-01 1.61e+03 5.72e-02 1.00e+00
w[2] 1.41e-01 2.97e-04 1.31e-02 1.20e-01 1.41e-01 1.62e-01 1.94e+03 6.89e-02 1.00e+00
w[3] 5.94e-01 4.72e-04 1.98e-02 5.63e-01 5.93e-01 6.28e-01 1.76e+03 6.26e-02 9.98e-01
d[1] 6.73e-02 2.60e-04 1.03e-02 5.01e-02 6.75e-02 8.37e-02 1.57e+03 5.57e-02 1.00e+00
d[2] 6.98e-01 2.36e-04 9.39e-03 6.83e-01 6.98e-01 7.14e-01 1.58e+03 5.61e-02 1.00e+00
d[3] 2.35e-01 2.69e-04 1.09e-02 2.17e-01 2.35e-01 2.53e-01 1.64e+03 5.82e-02 1.00e+00