Question on Stan's pathfinder method in cmdstan (v2.34.0)

When I run the pathfinder method below in the ./examples/bernoulli directory (after building the example with make examples/bernoulli/bernoulli in the CmdStan installation directory), all console output is written to a single .log file. That .log file can get mingled in various ways, e.g. in the example below with num_paths=7 and num_threads=8: Path [1]’s iteration header is written at the end of Path [6]’s Best Iter line, and Path [3]’s Best Iter is appended to Path [1]’s two lines further down. With any num_threads > 1 this can and will happen sooner or later. I would like to create a DataFrame from the bottom part of the log file(s).

Creating separate profile .log files with the output profile_file=profile_log_1.log command-line argument doesn’t work.

./bernoulli pathfinder num_paths=7 data file=bernoulli.data.json output save_cmdstan_config=1 profile_file=bernoulli.profile.1.log num_threads=8
method = pathfinder
  pathfinder
    init_alpha = 0.001000 (Default)
    tol_obj = 0.000000 (Default)
    tol_rel_obj = 10000.000000 (Default)
    tol_grad = 0.000000 (Default)
    tol_rel_grad = 10000000.000000 (Default)
    tol_param = 0.000000 (Default)
    history_size = 5 (Default)
    num_psis_draws = 1000 (Default)
    num_paths = 7
    save_single_paths = 0 (Default)
    psis_resample = 1 (Default)
    calculate_lp = 1 (Default)
    max_lbfgs_iters = 1000 (Default)
    num_draws = 1000 (Default)
    num_elbo_draws = 25 (Default)
id = 1 (Default)
data
  file = bernoulli.data.json
init = 2 (Default)
random
  seed = 966503968 (Default)
output
  file = output.csv (Default)
  diagnostic_file =  (Default)
  refresh = 100 (Default)
  sig_figs = -1 (Default)
  profile_file = bernoulli.profile_.log
  save_cmdstan_config = 1
num_threads = 8

Path [1] :Initial log joint density = -18.295156
Path [4] :Initial log joint density = -6.842967
Path [6] :Initial log joint density = -7.426503
Path [5] :Initial log joint density = -7.053005
Path [2] :Initial log joint density = -16.022127
Path [7] :Initial log joint density = -8.070977
Path [3] :Initial log joint density = -9.786336
Path [5] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes
              4      -6.748e+00      2.727e-03   5.178e-05    1.000e+00  1.000e+00       101 -6.200e+00 -6.200e+00
Path [2] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes
              5      -6.748e+00      1.861e-03   5.584e-05    1.000e+00  1.000e+00       126 -6.214e+00 -6.214e+00
Path [3] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes
              5      -6.748e+00      5.647e-04   6.305e-06    1.000e+00  1.000e+00       126 -6.284e+00 -6.284e+00
Path [6] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes
              5      -6.748e+00      9.555e-05   3.509e-07    1.000e+00  1.000e+00       126 -6.208e+00 -6.208e+00
Path [6] :Best Iter: [5] ELBO (-6.208032) evaluations: (126)Path [1] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes
              5      -6.748e+00      6.948e-04   1.393e-05    1.000e+00  1.000e+00       126 -6.212e+00 -6.212e+00
Path [1] :Best Iter: [4] ELBO (-6.173815) evaluations: (126)Path [3] :Best Iter: [2] ELBO (-6.155017) evaluations: (126)
Path [7] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes
              5      -6.748e+00      2.395e-04   1.559e-06    1.000e+00  1.000e+00       126 -6.201e+00 -6.201e+00
Path [5] :Best Iter: [4] ELBO (-6.199598) evaluations: (101)Path [7] :Best Iter: [2] ELBO (-6.107752) evaluations: (126)

Path [4] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes
              4      -6.748e+00      7.898e-05   9.720e-06    9.497e-01  9.497e-01       101 -6.205e+00 -6.205e+00
Path [4] :Best Iter: [4] ELBO (-6.205340) evaluations: (101)
Path [2] :Best Iter: [3] ELBO (-6.135622) evaluations: (126)


Total log probability function evaluations:7657

This happens in GitHub CI on macOS, with CmdStan 2.34.0 and Julia v1.10, but also directly from the zsh command line.

I think the argument you’re looking for is save_single_paths

Do you want information about the PSIS resampling step?

Thanks for helping (again!) Mitzi.

Yes, I’m trying to construct a DataFrame like this:

      initial_density  best_iter      elbo  evaluations  iter  log_prob         dx        grad  alpha  alpha0  evals    ELBO  Best_ELBO
      Float64          Int64       Float64        Int64  Int64  Float64    Float64     Float64  Float64 Float64 Int64  Float64  Float64
 1    -8.68168         4          -7.10274          126      5   -7.638  0.0006345    1.28e-6      1.0     1.0   126   -7.161   -7.161
 2    -8.73254         2          -7.12549          126      5   -7.638  0.0007266   1.625e-6      1.0     1.0   126   -7.211   -7.211
 3    -8.23303         3          -7.19094          126      5   -7.638  0.0001351   8.472e-8      1.0     1.0   126   -7.192   -7.192
 4    -14.0472         2          -7.18309          101      4   -7.638  0.01044    0.0001747      1.0     1.0   101   -7.217   -7.217
 5    -10.9456         4          -7.19465          101      4   -7.638  0.007762   0.0002026      1.0     1.0   101   -7.195   -7.195
 6    -8.37126         5          -7.19582          126      5   -7.638  0.0002376   2.277e-7      1.0     1.0   126   -7.196   -7.196

Copying and pasting a DataFrame from a Pluto notebook to Discourse doesn’t really work well; it looks much better in Pluto!

To create this DataFrame I split the profile output into lines and then parse them. A trimmed-down sketch of that code is below.
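
This only grabs the first few columns (parse_pathfinder_log is my own helper, not a cmdstan API), and it assumes each record sits on its own line - exactly what the mingling breaks:

using DataFrames

# Pull the per-path records out of the console log.  Using eachmatch
# (rather than match) also catches a second record glued onto the same line.
function parse_pathfinder_log(logfile::AbstractString)
    init_re = r"Path \[(\d+)\] :Initial log joint density = (-?[\d.]+)"
    best_re = r"Path \[(\d+)\] :Best Iter: \[(\d+)\] ELBO \((-?[\d.]+)\) evaluations: \((\d+)\)"
    inits = Dict{Int,Float64}()
    bests = Dict{Int,Tuple{Int,Float64,Int}}()
    for line in eachline(logfile)
        for m in eachmatch(init_re, line)
            inits[parse(Int, m[1])] = parse(Float64, m[2])
        end
        for m in eachmatch(best_re, line)
            bests[parse(Int, m[1])] =
                (parse(Int, m[2]), parse(Float64, m[3]), parse(Int, m[4]))
        end
    end
    paths = sort!(collect(keys(inits)))
    best  = [get(bests, p, (missing, missing, missing)) for p in paths]
    DataFrame(path            = paths,
              initial_density = [inits[p] for p in paths],
              best_iter       = getindex.(best, 1),
              elbo            = getindex.(best, 2),
              evaluations     = getindex.(best, 3))
end

df = parse_pathfinder_log("bernoulli_log_1.log")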

I’m still unclear: what information are you trying to get, and why?

Pathfinder is instrumented to capture the output of the LBFGS trajectory (for each individual pathfinder) - that’s in the .json file output by save_single_paths.
Do you want information about the PSIS resampling step?
Or?

Yes. In the CmdStan manual, chapter 6, it says:

The rest of the output describes the progression of the algorithm.

By default, the Pathfinder algorithm runs 4 single-path Pathfinders in parallel, then uses importance resampling on the set of returned draws to produce the specified number of draws.

The summary shown below those sentences ends up, in my test case with num_threads > 1, in a file bernoulli_log_1.log and is sometimes mingled. I was hoping to split it out into separate files, e.g. bernoulli_log_1.log, bernoulli_log_2.log, etc., with the profile_file=... keyword.

The command line I’m using is:

/Users/rob/Projects/Julia/notebooks/Stan/Pathfinder/tmp/bernoulli pathfinder \
  init_alpha=0.001 tol_obj=9.99999999e-13 tol_rel_obj=1000.0 tol_grad=1.0e-8 \
  tol_rel_grad=1.0e7 tol_param=1.0e-8 history_size=5 num_psis_draws=1000 \
  num_paths=2 psis_resample=true calculate_lp=true save_single_paths=true \
  max_lbfgs_iters=1000 num_draws=1000 num_elbo_draws=25 \
  id=1 \
  data file=/Users/rob/Projects/Julia/notebooks/Stan/Pathfinder/tmp/bernoulli_data_1.json \
  init=2 \
  random seed=196483075 \
  output file=/Users/rob/Projects/Julia/notebooks/Stan/Pathfinder/tmp/bernoulli_chain_1.csv \
         profile_file=/Users/rob/Projects/Julia/notebooks/Stan/Pathfinder/tmp/bernoulli_profile_1.log \
         save_cmdstan_config=1 refresh=100 sig_figs=-1 \
  num_threads=1

I’m not sure where the name bernoulli_log_1.log comes from. I expected it to be bernoulli_profile_1.log?

Sorry, forgot the why part of your question. Just want to be able to show that summary in a nicer format in the regular flow of a Pluto notebook.

I think what might be happening here is that in the C++ we use std::endl to finish writing, where we may just want to append a \n to the string we want to log.

The way we do it right now is something like

stream << msg << std::endl;

If you’re not familiar with C++: the << operator is used to pass a message to a stream. The message being sent in one operation and the end-of-line being sent in another is most likely what causes the overlap you are seeing here.

I’d classify this as a bug and think we should just append a \n to each message we send to the logger so we don’t have this line overlap. The new code could just be something like

stream << (msg + "\n");
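
The same race is easy to mimic outside of C++. Here’s a hypothetical Julia sketch of it (not cmdstan code - start Julia with julia -t 8, and since the mingling is timing-dependent it may take a few runs to show):

using Base.Threads

# Two writes per line: another thread can slip in between the message and
# its newline, which is how one Path line ends up glued onto another.
function racy_log()
    @threads for i in 1:8
        print(stdout, "Path [$i] : message body")
        print(stdout, '\n')
    end
end

# One write per line: the message and its newline can no longer be split.
function fixed_log()
    @threads for i in 1:8
        print(stdout, "Path [$i] : message body\n")
    end
end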

the stuff we’re sending to the console - what part of the algorithm is this coming from? the PSIS resampling?

That output is from the iterations of lbfgs from within a single pathfinder

Thanks a lot Mitzi and Steve!

The insertion of a \n would solve my problem.

Your last question, Mitzi, also helps a lot, because I now understand that the summary from lbfgs is written to stdout and I need to handle that better.

you can also get the individual pathfinder LBFGS information in a structured form using the save_single_paths arg - see 12 Pathfinder Method for Approximate Bayesian Inference | CmdStan User’s Guide.

We could, but imo this is a problem for a lot of Stan’s output (it also happens in the multi-chain case for NUTS). I think we should probably just stop using << std::endl; and just append \n to the messages we log.

I see - I never scrolled far enough to the right to understand what the concern here was.
doh!

Mitzi, I’ll look into the save_single_paths option a bit more. I did that after your initial response but couldn’t figure out how to recreate the “table”; see below for the kind of thing I tried. I will take a closer look.
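
Something like this, where the per-path filename is a guess on my part - I still have to check what save_single_paths actually writes next to the output CSV:

using JSON

# Load one per-path JSON file and list its top-level keys, to find the
# fields that hold the LBFGS trajectory.  The filename below is a guess.
j = JSON.parsefile("bernoulli_chain_1_path_1.json")
println(keys(j))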

what part of the console output is truly informative?

I can understand that the JSON output seems daunting - it outputs information about the Hessian that’s useful to algorithm developers who might want to tweak things. the example that’s in the CmdStan Guide is from the centered 8-schools model, so Pathfinder goes pretty far into the funnel - the example shows the first few iterations - progress - and the final iterations - no progress (12 Pathfinder Method for Approximate Bayesian Inference | CmdStan User’s Guide)

This might be completely wrong, but I was looking at it as another (cheaper?) approximation to the posterior distribution (next to MAP or MLE), and two parts of the summary may be useful as a first check of whether the pathfinder method was successful. The first part is the ELBO values in the 4 paths. The second is the p > 0.7 warning that sometimes shows up at the end of the standard output. Is that a reasonable way to use pathfinder?

using Pathfinder as an approximate algorithm is a fine thing to do.

@avehtari, @Lu.Zhang - from an end user perspective, what part of the current console output is most informative w/r/t tracking progress and flagging problems?

If Pareto-k̂ < 0.7, then the approximation is so good that Pareto smoothed importance sampling can provide arbitrary accuracy when the number of draws increases. This can happen with low dimensional posteriors or moderate dimensional posteriors that are close to normal.

If Pareto-k̂ > 0.7, then we can’t get arbitrary accuracy, but the normal approximation can still be useful to learn something about the posterior and model predictions, or can be used to initialize MCMC. See the Birthdays workflow example for challenging posteriors (it’s in R, but you can just look through the results).
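
For intuition, in the notation of the PSIS paper: a generalized Pareto tail with shape k̂ has finite moments only up to order 1/k̂,

    E[w^m] < ∞  ⇔  m < 1/k̂,

so k̂ < 0.5 means the importance weights have finite variance and you get the usual 1/√S Monte Carlo rate, while in practice PSIS remains reliable up to about k̂ = 0.7.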