Sampling chains from the middle in cmdstanpy

nerpa · October 8, 2020, 2:40pm

Hello all,

I am trying to implement checkpointing in cmdstanpy and I want to make sure I am passing the right arguments to the next cycle. Based on these examples

github.com: ahartikainen/fit_check_fit_loop/blob/master/MoreSamples_round1/increase_draws_in_steps.ipynb ("Example how to iteratively get more draws (on PyStan)")

and

one should pass the last iteration's values as init, the stepsize__ value as step_size, and set adapt_engaged to False.
But what would be the CmdStanPy equivalent of PyStan's inv_metric?

Thanks!
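
A minimal sketch of the "last iteration as init" step described above, not taken from the linked notebook. It assumes the draws are available as a NumPy array of shape (draws, chains, columns) with matching column names, and that every column not ending in `__` is a scalar parameter; the function name `last_draw_inits` and the file names are hypothetical.

```python
import json
from pathlib import Path

import numpy as np


def last_draw_inits(draws, column_names, out_dir):
    """Write one init JSON file per chain from the final draw.

    Returns (list of file paths, list of per-chain step sizes).
    Assumes draws has shape (n_draws, n_chains, n_columns).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    last = np.asarray(draws)[-1]  # shape (n_chains, n_columns)
    paths, step_sizes = [], []
    for chain, row in enumerate(last, start=1):
        values = dict(zip(column_names, row))
        # Sampler diagnostics end in "__"; all other columns are
        # treated as scalar parameters (an assumption of this sketch).
        inits = {k: float(v) for k, v in values.items() if not k.endswith("__")}
        path = out_dir / f"inits_chain{chain}.json"
        path.write_text(json.dumps(inits))
        paths.append(str(path))
        step_sizes.append(float(values["stepsize__"]))
    return paths, step_sizes
```

Vector or matrix parameters would need to be reassembled into arrays before being written, which this sketch does not attempt.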

nerpa · October 8, 2020, 5:43pm

OK, after a lot of digging, I think that the CmdStanPy metric argument is the inv_metric equivalent.
Is it possible to calculate it from the cmdstan output.csv file?

ahartikainen · October 8, 2020, 6:37pm

Hi, from the sample docstring:

:param metric: Specification of the mass matrix, either as a
            vector consisting of the diagonal elements of the covariance
            matrix ('diag' or 'diag_e') or the full covariance matrix
            ('dense' or 'dense_e').
            If the value of the metric argument is a string other than
            'diag', 'diag_e', 'dense', or 'dense_e', it must be
            a valid filepath to a JSON or Rdump file which contains an entry
            'inv_metric' whose value is either the diagonal vector or
            the full covariance matrix.
            If the value of the metric argument is a list of paths, its
            length must match the number of chains and all paths must be
            unique.

Also the fit object has metric and stepsize methods.

Then you just need to unpack metric items to individual files and have a list of files.
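
A minimal sketch of that unpacking step, assuming the per-chain metrics are available as a sequence with one entry per chain (a diagonal vector or a full matrix each). The helper name `write_metric_files` and the file names are hypothetical; the `inv_metric` key is the entry named in the docstring above.

```python
import json
from pathlib import Path

import numpy as np


def write_metric_files(metrics, out_dir):
    """Write one JSON file per chain, each holding an 'inv_metric' entry.

    Returns the list of file paths, one per chain, all unique.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for chain, inv_metric in enumerate(metrics, start=1):
        path = out_dir / f"metric_chain{chain}.json"
        # tolist() turns a 1-D vector or 2-D matrix into plain JSON lists
        payload = {"inv_metric": np.asarray(inv_metric).tolist()}
        path.write_text(json.dumps(payload))
        paths.append(str(path))
    return paths
```

The returned list could then be passed as the metric argument, for example `write_metric_files(fit.metric, "checkpoint")`, assuming `fit.metric` yields one row per chain.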

nerpa · October 8, 2020, 6:40pm

Thank you! But is it possible to calculate it from the output file?
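
One possible way to read these values back from the output file, offered as a sketch rather than a confirmed answer from this thread. It assumes a diagonal metric and that the CSV contains the adaptation comment lines `# Step size = ...` and `# Diagonal elements of inverse mass matrix:` followed by one comment line of comma-separated values; the function name `read_adaptation` is hypothetical, and a dense metric would need different handling.

```python
def read_adaptation(csv_path):
    """Return (step_size, inv_metric) parsed from Stan CSV comment lines.

    Either value is None if the corresponding comment is not found.
    Assumes a diagonal metric written on a single comment line.
    """
    step_size = None
    inv_metric = None
    with open(csv_path) as f:
        lines = iter(f)
        for line in lines:
            if line.startswith("# Step size"):
                step_size = float(line.split("=")[1])
            elif line.startswith("# Diagonal elements of inverse mass matrix"):
                # The values sit on the next line, still prefixed with "#"
                values = next(lines).lstrip("#").strip()
                inv_metric = [float(v) for v in values.split(",")]
                break
    return step_size, inv_metric
```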

mitzimorris · October 8, 2020, 6:41pm

here’s the link to the docs: https://mc-stan.org/cmdstanpy/api.html#cmdstanpy.CmdStanModel.sample

CmdStan docs have an (extremely simple) example: https://mc-stan.org/docs/2_24/cmdstan-guide/mcmc-config.html#specifying-the-metric-and-stepsize

nerpa · October 8, 2020, 6:51pm

Thank you! Just to make sure - all I need to pass to next cycles is init, metric and step_size (If all other parameters were set to their defaults)?

ahartikainen · October 8, 2020, 6:54pm

Also adapt_engaged=False and add seed too. (Also warmup_iters=0). Then test that it actually works.
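
Putting the pieces from this thread together, the arguments for the next cycle might be collected as below. This is a sketch under assumptions: the keyword names (`iter_warmup`, `iter_sampling`, `inits`, `step_size`) follow recent CmdStanPy releases and may differ in other versions (this post calls the warmup argument `warmup_iters`), the helper `next_cycle_kwargs` is hypothetical, and the seed is bumped by the cycle number as in the linked notebook.

```python
def next_cycle_kwargs(inits, metric_files, step_sizes, seed, cycle, iter_sampling=200):
    """Collect keyword arguments for one more sampling cycle without adaptation."""
    if not (len(inits) == len(metric_files) == len(step_sizes)):
        raise ValueError("need one init, metric file and step size per chain")
    return {
        "chains": len(metric_files),
        "inits": inits,                  # per-chain init files (last draw)
        "metric": metric_files,          # per-chain files with 'inv_metric'
        "step_size": step_sizes,         # per-chain adapted step sizes
        "adapt_engaged": False,          # reuse the earlier adaptation
        "iter_warmup": 0,                # no further warmup
        "iter_sampling": iter_sampling,
        "seed": seed + cycle,            # different random numbers each cycle
    }
```

With a compiled model and data at hand (both hypothetical here), the call would look like `fit = model.sample(data=data, **next_cycle_kwargs(...))`. As noted above, it is worth testing that the resumed chains actually behave as expected.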

nerpa · October 8, 2020, 6:56pm

Yes, thanks! I didn’t know about the seed – does it have to be the same seed for all cycles (including warmup)?

nerpa · October 8, 2020, 6:59pm

I just saw in the example that the seed is increased by 1 in every iteration. What is the reason for this?

ahartikainen · October 8, 2020, 7:36pm

It changes the random numbers used in each cycle.

mitzimorris · October 8, 2020, 9:09pm

Not sure if this applies to this use case: when running multiple chains, the recommended procedure is to use the same seed and use the chain id to advance the RNG (cf. the CmdStan User’s Guide, chapter 9 “MCMC Sampling using Hamiltonian Monte Carlo”, section “Running multiple chains with a specified RNG seed”).
