Extracting VB's precision matrix

Adam_Haber · December 9, 2020, 11:19am

I’m trying to extract the precision matrix of the unconstrained gaussian from CmdStan’s ADVI, and see if using it as an inv_metric can help to speed up warmup (by adjusting scales etc). Is there a way to do this directly? If not, is there a way to transform VB draws to the unconstrained space (and then compute it)?

Also, if anyone has tried this before, I’d be very interested in learning what worked well, what didn’t, etc.

Lu.Zhang · December 11, 2020, 3:32pm

Hi Adam,
I wrote a very simple code to transform VB draws to the unconstrained space. It might fail if you have transformed parameters. unc_sam_util.R (816 Bytes) advi_unc_sam.R (676 Bytes)

I am not sure how to get inv_metric directly. Maybe @avehtari know who can help you.

Best,

Lu

Adam_Haber · December 11, 2020, 3:52pm

Thanks @Lu.Zhang, that’s very helpful!

Naive question, but shouldn’t this be the inverse of the unconstrained draws covariance matrix? And, for a diagonal inv_metric, shouldn’t the (inverse) variances of the unconstrained draws be enough?

bbbales2 · December 11, 2020, 4:17pm

By default Stan does inverse metric = diagonal of unconstrained covariance.

There’s definitely a cmdstanr way to get it out. Not sure off the top of my head. This maybe: Extract metadata from CmdStan CSV files — fit-method-metadata • cmdstanr

Lu.Zhang · December 11, 2020, 4:24pm

Naive question, but shouldn’t this be the inverse of the unconstrained draws covariance matrix? And, for a diagonal inv_metric , shouldn’t the (inverse) variances of the unconstrained draws be enough?

Yes, but this is a numerical approximation of the inv_metric.

Adam_Haber · December 11, 2020, 7:15pm

My intention was to use this as an “initial guess” in a model for which posterior variances where quite varied in the scales; I wanted to see if this might speed up warmup (“manually” rescaling parameters worked very well). Does this makes sense?

Lu.Zhang · December 11, 2020, 9:06pm

Does this makes sense?

Sure!

Lu.Zhang · December 11, 2020, 9:40pm

There’s definitely a cmdstanr way to get it out

I use function vb() in RStan in the example code for getting ADVI draws. I just checked the output of $variational() method of a CmdStanModel object with $metadata() but didn’t find anything that might be inv_metric.

By the way, do you know whether vb() in RStan and $variational() method in CmdStanr are the same? If not then I probably need to redo all ADVI tests I have done.

jonah · December 11, 2020, 9:44pm

Yeah we only have inv_metric in CmdStanR after MCMC. As far as I know CmdStan doesn’t give us access to anything like that for VB that we could expose in CmdStanR (but I could be wrong about that).

The only difference would be if there were changes to ADVI between the Stan version used by RStan and the latest version used by CmdStanR (RStan is still stuck on an older version of Stan). But I don’t think there have been any changes recently.

Lu.Zhang · December 11, 2020, 10:16pm

Good to know! Thank you so much!

Adam_Haber · December 12, 2020, 2:01pm

Thanks again for sharing this @Lu.Zhang! I’ve tried it and got

 Error in object@.MISC$stan_fit_instance$unconstrain_pars(pars) : 
  mismatch in number dimensions declared and found in context; processing stage=parameter initialization; variable name=beta; dims declared=(100,10); dims found=(1000)

Beta is indeed declared as a matrix - should this be a problem?

I do - could this be related to the error above? I also had to comment out the entire GC block for vb to work, not sure if this is a general problem or not.

Lu.Zhang · December 12, 2020, 3:02pm

Adam_Haber:

Error in object@.MISC$stan_fit_instance$unconstrain_pars(pars) : 
  mismatch in number dimensions declared and found in context; processing stage=parameter initialization; variable name=beta; dims declared=(100,10); dims found=(1000)

I see. My code reads in the samples of matrix beta as vectors. I am afraid that you need to hard code it. Maybe send me your code. I can check it for you later.

Adam_Haber · December 12, 2020, 4:53pm

Found this in a GitHub issue (by @Jonah), and used it to modify unconstrain_sam:

unconstrain_vb <- function(model, data, vb_obj) {
  post <- to_posterior(model, data)

  tmp <- rstan:::create_skeleton(post@model_pars, post@par_dims)
  skeleton <- tmp[names(tmp) != "lp__"]
  constrained_draws <- as.matrix(sapply(vb_obj@sim$samples[[1]], unlist))
  S <- nrow(constrained_draws)
  unconstrained_draws <- sapply(seq_len(S), FUN = function(s) {
    constrained_s <- rstan:::rstan_relist(constrained_draws[s, ], skeleton)
    post@.MISC$stan_fit_instance$unconstrain_pars(constrained_s)
  })
  unconstrained_draws <- t(unconstrained_draws)
  unconstrained_draws
}

unconstrained_draws <- unconstrain_vb(model, data, fit_ADVI)
vb_unconstrained_sds <- apply(unconstrained_draws, 2, sd)

… and then use the inverse of vb_unconstrained_sds to initiliaze inv_metric.

I think this should work - seems OK so far, but haven’t thoroughly tested it yet.

Lu.Zhang · December 12, 2020, 5:00pm

This looks great! Thank you so much for sharing!

bbbales2 · December 12, 2020, 6:03pm

inverse of vb_unconstrained_sds to initiliaze inv_metric

I think it should be variances are the inverse metric, but I always get confused with these. If something doesn’t work quite right just remember to try all variations and it’ll probably be pretty evident which is the right thing.

Adam_Haber · December 14, 2020, 7:01am

I think you’re right, this was the version where sampling was the fastest.

I’m getting something else which is unintuitive for me. While the VB-based-inv_metric samples significantly faster then the default inv_metric, I’m getting (for some parameters) significantly worse N_eff and Rhat-s (no divergences in either of them, no tree-depth saturation, no energy-related warning). My expectation was that sampling speed should be somewhat correlated with sampling diagnostics…

Any intuition for what might cause this?

bbbales2 · December 14, 2020, 12:24pm

Yeah that’s possible. There’s not a lot of sunlight for Rhats to be worse though. The latest rule of thumb is < 1.01 from the new Rhat paper ([1903.08008] Rank-normalization, folding, and localization: An improved $\widehat{R}$ for assessing convergence of MCMC).

So if they’re > 1.01 then good reason to think things aren’t mixing that well. And if you compare the ESS/s, then the faster chains probably aren’t faster.

You can also get the combined Neff/Rhats of two different mcmc runs with bind_draws in the posterior package. I think you need the extra argument along = "chain" or something. So then you if you trust the regular sampling run then you can see how it compares to the warmed-up-with-vb one.

You could also do vb warmup into hmc warmup into regular sampling and see how that goes. In that case you’re giving the regular warmup hopefully a better starting point and hopefully a better guess at the scales, and then maybe it has an easier job from there.

If you do that you’ll probably want to play with the window sizes and timestep adaptation. If you don’t make any changes there, then supplied metric will get tossed at either draw 75 or draw 100 (I forget). That might not be a problem (might be enough to get HMC started), I just wanted to mention it so you are aware.

Adam_Haber · December 14, 2020, 12:44pm

By regular sampling you mean cmdstan_model$sampling with no inv_metric supplied? That was the thing I was comparing against. I’ve used this snippet, taken from one of the bayesplot vignettes:

compare_inv_unit <- function(inv_plot, unit_plot, ncol = 2, ...) {
  bayesplot_grid(
    inv_plot, unit_plot, 
    grid_args = list(ncol = ncol),
    subtitles = c("VB", 
                  "Unit"),
    ...
  )
}

compare_inv_unit(mcmc_rhat(rhat(rstan.inv)),mcmc_rhat(rhat(rstan.unit)))
compare_inv_unit(mcmc_neff(neff_ratio(rstan.inv)),mcmc_neff(neff_ratio(rstan.unit)))

VB warmup = run VB, estimate variances in unconstrained space, use that as an inv_metric for HMC? Not sure I understand the distinction between HMC and regular warmup, then.

That was my original intention, but maybe I’m confusing the terminology, here.

That’s interesting. Any references as to how to adjust these, given the VB draws?

bbbales2 · December 14, 2020, 1:11pm

Yup

With posterior you can actually take the draws from both methods and mix them together. It’s possible you have results from method1 and results from method2 and they both have good ESS but then when you mix them together you realize they are different things.

Isn’t meant to be one, my bad.

Oh okay. What are the two cmdstanr$sample commands? Like one has an inv_metric passed in w/ a starting point, but does it have other options changed? Or is it just the inv_metric and a starting point?

Adam_Haber · December 14, 2020, 1:14pm

Just this:

fit.inv_metric <- model.cmd$sample(data=data,
                                   iter_warmup = 500,
                                   iter_sampling = 500,
                                   seed = 123,
                                   inv_metric = vb_unconstrained_vars,
                                   chains = 2,
                                   parallel_chains = 2,
                                   save_warmup = T)

and this is compared against fit.unit_metric which is the same but without the inv_metric part.

Topic		Replies	Views
CmdStanPy Variational - How to get density parameters? Modeling techniques , fitting-issues , variational-bayes , covariance , cmdstanpy	10	65	November 19, 2024
Extracting step size per chain in cmdstanr Interfaces	7	952	January 13, 2021
Saved mass matrix General	22	2699	December 12, 2022
Is it possible to access mean and the covariance of vb? General	2	371	September 13, 2019
Extracting the mass matrix (Euclidean metric) in rstan RStan	5	1092	June 16, 2020

Extracting VB's precision matrix

Related topics