CmdStan Guide now online

mitzimorris · July 3, 2020, 6:24am

I’ve added the CmdStan User’s Guide to the Stan documentation set in the https://github.com/stan-dev/docs repo, and published the ~~2.23 version~~ new version here:

https://mc-stan.org/docs/cmdstan-guide/

also pdf: ~~https://mc-stan.org/docs/2_23/cmdstan-guide-2_23.pdf~~ new version https://mc-stan.org/docs/2_23/cmdstan-guide-2_24.pdf

The issue for this is: https://github.com/stan-dev/docs/issues/203
Your feedback welcome!!!
Suggestions and comments here or via the docs repo issues

torkar · July 3, 2020, 7:02am

$ git clone https://github.com/stan-dev/cmdstan.git
Cloning into 'cmdstan'...
remote: Enumerating objects: 25, done.
remote: Counting objects: 100% (25/25), done.
remote: Compressing objects: 100% (20/20), done.
remote: Total 8284 (delta 8), reused 18 (delta 5), pack-reused 8259
Receiving objects: 100% (8284/8284), 135.38 MiB | 3.86 MiB/s, done.
Resolving deltas: 100% (4545/4545), done.
$ cd cmdstan/
$ make -j16 build
ERROR: Missing Stan submodules.
Please run the following commands to fix:

git submodule init
git submodule update --recursive

And try building again
makefile:192: recipe for target 'build' failed
make: *** [build] Error 1
$ git submodule init
Submodule 'stan' (https://github.com/stan-dev/stan) registered for path 'stan'
$ git submodule update --recursive
Cloning into '/home/rstudio/Development/cmdstan/stan'...
Submodule path 'stan': checked out '669dea8c3bb4aa20ee6ac8eae1ce6397e747d03c'
$ make -j16 build
ERROR: Missing Stan submodules.
Please run the following commands to fix:

git submodule init
git submodule update --recursive

And try building again
makefile:192: recipe for target 'build' failed
make: *** [build] Error 1

EDIT: I tried make clean. This is on Ubuntu 18.04.

$ make --version
GNU Make 4.1
Built for x86_64-pc-linux-gnu
Copyright (C) 1988-2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
$ git --version
git version 2.17.1

Are the instructions wrong, or is there something wrong with my installation?

rok_cesnovar · July 3, 2020, 7:29am

Yes, the instructions should be:

> git clone https://github.com/stan-dev/cmdstan.git --recursive
> cd cmdstan
> make build

Posted just now: https://github.com/stan-dev/docs/issues/203#issuecomment-653397630

torkar · July 3, 2020, 8:43am

Thanks, Rok,

Richard

mitzimorris · July 3, 2020, 11:29pm

Looking for feedback on the chapter on the sampler configuration arguments: https://mc-stan.org/docs/2_23/cmdstan-guide/mcmc-config.html

Is the information in the sections on sampler config for adaptation (https://mc-stan.org/docs/2_23/cmdstan-guide/mcmc-config.html#adaptation) and NUTS (https://mc-stan.org/docs/2_23/cmdstan-guide/mcmc-config.html#algorithm) clear and correct?
Are the examples (https://mc-stan.org/docs/2_23/cmdstan-guide/mcmc-config.html#examples) useful?
Any suggestions for more examples where folks often find themselves wrestling with the command line argument parser?

looking for input from power users like @wds15, @bbbales2, @mike-lawrence as well as folks who’ve tried to use CmdStan and run into problems - (I know @lauren has mentioned this to me - I’m sure she’s not alone).

bbbales2 · July 4, 2020, 5:46pm

I read through the mcmc-config stuff. It seems okay to me.

In the list with the metric options, there’s an unpaired tick after ‘dense_e’: “metric=dense_e`”
“By default, all parameters are initialized to random draws from a uniform distribution” → “By default, all parameters are initialized on an unconstrained scale to random draws from a uniform distribution”. We’re not saying what the unconstrained scale is but it might be confusing to not say there is one.

> suggestions for more examples where folks often find themselves wrestling with the command line argument parser?

I think the manual has enough examples, so Discourse is probably the place.

mike-lawrence · July 5, 2020, 11:13pm

Just finished a read-through, here are some notes:

1.3: stan-update is mentioned once but never again; possibly an example would help?
3.2: typo: “willis”
4.4: what’s going on with the ridiculously high Rhat for stepsize__?
7: generated quantities: is this section incomplete? For example, it’s unclear if the bernoulli_yrep.stan should literally only contain a generated quantities section, or should it have all the content from bernoulli.stan as well?
9.1: “An EFF of at least 100 is required to make a viable estimate. The precision of your estimate is √N; therefore every additional decimal place of accuracy increases this by a factor of 10.” Ah, this is something I hadn’t come across before (I only had a vague related intuition). Very useful.
9.1 On thinning: should it say something like “Some users familiar with older approaches to MCMC sampling might be used to thinning to eliminate an expected autocorrelation in the samples. HMC is not nearly as susceptible to this autocorrelation problem and thus thinning is generally not required nor advised except in circumstances where storage of the samples is limited and/or RAM for later processing the samples is limited.” ?
9.2.1 “Raising the value of delta will also allow some models that would otherwise get stuck to overcome their blockages” & also this from 16.3.1: “If the divergent transitions cannot be eliminated by increasing the adapt_delta parameter, we have to find a different way to write the model that is logically equivalent but simplifies the geometry of the posterior distribution.” --> This seems to imply that the first step in troubleshooting divergences is to change adapt_delta, but I thought it was becoming more strongly recommended that folks not do this, and instead first look to changing the model geometry, which could yield both more efficient and faster sampling.
9.5.1: what happens if you specify the random seed and id as ${i} ? Would this thwart the in-built mechanisms for avoiding overlap in seeds?

mitzimorris · July 6, 2020, 5:40am

good question. I don’t think the Rhat statistic makes sense for stepsize, but I’m curious as to why it has the value that it does. pinging @avehtari - what is the correct thing to report? don’t analyze sample state columns in Stan csv output file?

yes. working on it now.

excellent suggestion - will add - many thanks!

I could move or copy the discussion from 9.5.6 (Redirecting…) to 9.2.1.
note that not long ago, this was discussed elsewhere - Improve warnings for low ESS - at which point Bob said:

short answer - no - each chain will be running off an RNG which has its own seed.
the samplers use both the seed and the id together to create the RNG used by the sampler - https://github.com/stan-dev/stan/blob/develop/src/stan/services/util/create_rng.hpp
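
To illustrate how a seed and a chain id can be combined into separate random streams, here is a minimal Python sketch. It assumes a skip-ahead scheme (seed the generator, then jump forward by a fixed stride times the id); whether create_rng.hpp does exactly this should be checked against the linked file. The generator, the stride value and all names below are hypothetical and chosen only for illustration.

```python
# Toy multiplicative LCG (MINSTD constants). This is NOT the generator Stan uses.
M = 2**31 - 1
A = 48271
STRIDE = 1_000_000  # hypothetical gap between chains, small enough to verify by hand


class ToyRng:
    def __init__(self, seed):
        # Map the seed into 1..M-1; a state of 0 would stay at 0 forever.
        self.state = seed % (M - 1) + 1

    def next(self):
        self.state = (A * self.state) % M
        return self.state

    def discard(self, k):
        # Jump ahead k draws in O(log k): state_k = A**k * state_0 mod M.
        self.state = (pow(A, k, M) * self.state) % M


def create_rng(seed, chain_id, stride=STRIDE):
    """Seed the generator, then skip ahead by stride * chain_id draws."""
    rng = ToyRng(seed)
    rng.discard(stride * chain_id)
    return rng


# Same seed, ids 1..4: four different starting points on the same stream.
starts = [create_rng(seed=1234, chain_id=i).state for i in range(1, 5)]
```

In this toy version, passing the same value as both seed and id simply picks a different stream and a different offset for each chain, so the chains still start from different states.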

many thanks for the close read and good suggestions!

rok_cesnovar · July 6, 2020, 5:58am

Great stuff @mike-lawrence

Slightly related question:

Could we fix https://mc-stan.org/docs/ so that it works instead of returning a 404 error? It could either redirect to the latest docs or serve a landing page with links to all the docs (2.23, 2.22, …).

avehtari · July 6, 2020, 7:20am

Correct, there is no sense computing or reporting Rhat or ESS for sampler diagnostic columns.

Also, it would be useful to give an example value to try (e.g. 0.9 or 0.95), to note that logically it has to be <1, and that it’s rarely sensible to try adapt_delta>0.99, as that is already a strong indication of bad geometry and the sampling tends to take a long time with larger adapt_delta values.
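
As a rough sketch of those bounds, the Python helper below builds a CmdStan command line and checks the delta value first. The helper name, the executable path and the data file name are assumptions for illustration; only the `sample adapt delta=... data file=...` argument layout is taken from CmdStan's command-line syntax.

```python
import warnings


def sample_args(exe, adapt_delta, data_file):
    """Build a CmdStan argument list with a sanity-checked adapt delta."""
    if not 0.0 < adapt_delta < 1.0:
        raise ValueError("adapt_delta must be strictly between 0 and 1")
    if adapt_delta > 0.99:
        # Values this high usually point at a problem with the posterior geometry.
        warnings.warn("adapt_delta > 0.99 is rarely sensible; consider reparameterizing")
    return [exe, "sample", "adapt", f"delta={adapt_delta}",
            "data", f"file={data_file}"]


# Hypothetical model executable and data file names.
args = sample_args("./bernoulli", 0.95, "bernoulli.data.json")
```

The resulting list can be passed to `subprocess.run(args)` from the directory that contains the compiled model.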

mike-lawrence · July 6, 2020, 12:24pm

Ok, but just double-checking: if one were to nonsensically attempt to compute an Rhat on stepsize__, is a value as extreme as 2.5e+13 a reasonable expectation, and not a signal that something is concerningly awry in the Rhat computation generally?

avehtari · July 6, 2020, 12:45pm

Can you provide the sequence and the exact Rhat code you call? The latest implementation in rstan monitor.R and in the posterior package should not produce anything like that unless you have a similar order of magnitude of iterations.

mike-lawrence · July 6, 2020, 12:48pm

This was observed in @mitzimorris’s new cmdstan guide, section 4.4, where stansummary is used on a simple binomial model.

avehtari · July 6, 2020, 12:53pm

The link you provide for “section 4.4” is https://discourse.mc-stan.org/t/brms-constrain-group-level-effects-to-positive-values/16160

mike-lawrence · July 6, 2020, 12:55pm

Oops! I wonder how that happened; I didn’t even look at that thread (since I don’t know brms at all). Edited the above to have the correct url.

rok_cesnovar · July 6, 2020, 1:05pm

This is a guess, but I think the difference is that the rhat C++ code in https://github.com/stan-dev/stan/blob/d8c34d315f92892a9d19b96e06b196bd7640b7e5/src/stan/analyze/mcmc/compute_potential_scale_reduction.hpp should be updated to what posterior is using, which is based on this paper https://arxiv.org/abs/1903.08008 right?

This C++ is what cmdstan uses if you use stansummary. (cmdstanr uses posterior’s implementation if you don’t specifically want the cmdstan one.)

avehtari · July 6, 2020, 1:40pm

I just realized that for Rhat, no matter which version, this doesn’t make sense. Step size should be constant after warm-up, so all within-chain variances should be zero. But different chains may have different constant step sizes, so the between-chain variance should be positive. So there is some floating point accuracy mishap in the current C++, such that var_between / var_within ends up being a really large number instead of Inf. The correct result would be Inf, but then it doesn’t make sense to report Rhat and ESS for accept_stat__, stepsize__, treedepth__, n_leapfrog__, and divergent__.
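
The argument can be sketched with the classic (non-split, non-rank-normalized) Rhat formula in Python. This is a simplified illustration, not the C++ used by stansummary; the function names and the residue value 1e-28 are made up for the example.

```python
import math
from statistics import mean, variance


def rhat_from_variances(w, b, n):
    """Classic Rhat from within-chain variance w, between-chain variance b, n draws."""
    var_plus = (n - 1) / n * w + b / n
    if w == 0.0:
        # Zero within-chain variance: the ratio is infinite (or undefined if b is 0 too).
        return math.inf if b > 0.0 else math.nan
    return math.sqrt(var_plus / w)


def rhat_classic(chains):
    n = len(chains[0])
    w = mean([variance(c) for c in chains])        # average within-chain variance
    b = n * variance([mean(c) for c in chains])    # between-chain variance
    return rhat_from_variances(w, b, n)


# Two chains, each with its own constant step size after warm-up.
chains = [[0.9] * 100, [0.8] * 100]
exact = rhat_classic(chains)                       # within variance is exactly 0
# If rounding left a tiny residue instead of 0, the result is huge but finite.
with_residue = rhat_from_variances(1e-28, 0.5, 100)
```

With an exact zero the result is infinite, while a rounding residue on the order of 1e-28 gives a value in the 1e12 to 1e13 range, the same ballpark as the number reported by stansummary.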

avehtari · July 6, 2020, 2:06pm

So it seems that boost::accumulators::variance(acc_draw) is not returning 0 variance for the constant step size for at least one of the chains. It seems boost::accumulators::variance uses the equation $M_n^{(2)} - \mu_n^2$, which may suffer from numerical floating point error.
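
A small Python sketch of the two approaches, assuming the formula above is evaluated from running sums: the "mean of squares minus squared mean" form can leave a tiny nonzero (even negative) residue for a constant sequence, while Welford's update returns exactly zero. Neither function is Boost's code; both are illustrative stand-ins.

```python
def naive_variance(xs):
    """Population variance as E[x^2] - (E[x])^2, accumulated in floating point."""
    n, s, s2 = 0, 0.0, 0.0
    for x in xs:
        n += 1
        s += x
        s2 += x * x
    mu = s / n
    return s2 / n - mu * mu   # subtracts two nearly equal numbers


def welford_variance(xs):
    """Population variance via Welford's running update of mean and M2."""
    n, mu, m2 = 0, 0.0, 0.0
    for x in xs:
        n += 1
        delta = x - mu
        mu += delta / n
        m2 += delta * (x - mu)
    return m2 / n


step_sizes = [0.9] * 1000                 # constant after warm-up
naive = naive_variance(step_sizes)        # near 0, but not guaranteed to be exactly 0
stable = welford_variance(step_sizes)     # exactly 0.0 for a constant sequence
```

For a constant input, Welford's first update sets the mean to the value itself, so every later delta is zero and no rounding error can accumulate.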

mitzimorris · July 7, 2020, 12:55am

just filed an issue to omit sampler diagnostic columns from CmdStan stansummary: https://github.com/stan-dev/cmdstan/issues/903

question - report Rhat and ESS for lp__ ?

kedartal · July 9, 2020, 4:39pm

@mitzimorris In section 9.2.1 (“Step size optimization configuration”), should the second gamma be kappa?
