Extracting VB's precision matrix

I’m trying to extract the precision matrix of the unconstrained gaussian from CmdStan’s ADVI, and see if using it as an inv_metric can help to speed up warmup (by adjusting scales etc). Is there a way to do this directly? If not, is there a way to transform VB draws to the unconstrained space (and then compute it)?

Also, if anyone has tried this before, I’d be very interested in learning what worked well, what didn’t, etc.

1 Like

Hi Adam,
I wrote a very simple code to transform VB draws to the unconstrained space. It might fail if you have transformed parameters. unc_sam_util.R (816 Bytes) advi_unc_sam.R (676 Bytes)

I am not sure how to get inv_metric directly. Maybe @avehtari know who can help you.

Best,

Lu

2 Likes

Thanks @Lu.Zhang, that’s very helpful!

Naive question, but shouldn’t this be the inverse of the unconstrained draws covariance matrix? And, for a diagonal inv_metric, shouldn’t the (inverse) variances of the unconstrained draws be enough?

By default Stan does inverse metric = diagonal of unconstrained covariance.

There’s definitely a cmdstanr way to get it out. Not sure off the top of my head. This maybe: https://mc-stan.org/cmdstanr/reference/fit-method-metadata.html

1 Like

Naive question, but shouldn’t this be the inverse of the unconstrained draws covariance matrix? And, for a diagonal inv_metric , shouldn’t the (inverse) variances of the unconstrained draws be enough?

Yes, but this is a numerical approximation of the inv_metric.

My intention was to use this as an “initial guess” in a model for which posterior variances where quite varied in the scales; I wanted to see if this might speed up warmup (“manually” rescaling parameters worked very well). Does this makes sense?

Does this makes sense?

Sure!

There’s definitely a cmdstanr way to get it out

I use function vb() in RStan in the example code for getting ADVI draws. I just checked the output of $variational() method of a CmdStanModel object with $metadata() but didn’t find anything that might be inv_metric.

By the way, do you know whether vb() in RStan and $variational() method in CmdStanr are the same? If not then I probably need to redo all ADVI tests I have done.

2 Likes

Yeah we only have inv_metric in CmdStanR after MCMC. As far as I know CmdStan doesn’t give us access to anything like that for VB that we could expose in CmdStanR (but I could be wrong about that).

The only difference would be if there were changes to ADVI between the Stan version used by RStan and the latest version used by CmdStanR (RStan is still stuck on an older version of Stan). But I don’t think there have been any changes recently.

3 Likes

Good to know! Thank you so much!

1 Like

Thanks again for sharing this @Lu.Zhang! I’ve tried it and got

 Error in object@.MISC$stan_fit_instance$unconstrain_pars(pars) : 
  mismatch in number dimensions declared and found in context; processing stage=parameter initialization; variable name=beta; dims declared=(100,10); dims found=(1000) 

Beta is indeed declared as a matrix - should this be a problem?

I do - could this be related to the error above? I also had to comment out the entire GC block for vb to work, not sure if this is a general problem or not.

I see. My code reads in the samples of matrix beta as vectors. I am afraid that you need to hard code it. Maybe send me your code. I can check it for you later.

Found this in a GitHub issue (by @Jonah), and used it to modify unconstrain_sam:

unconstrain_vb <- function(model, data, vb_obj) {
  post <- to_posterior(model, data)

  tmp <- rstan:::create_skeleton(post@model_pars, post@par_dims)
  skeleton <- tmp[names(tmp) != "lp__"]
  constrained_draws <- as.matrix(sapply(vb_obj@sim$samples[[1]], unlist))
  S <- nrow(constrained_draws)
  unconstrained_draws <- sapply(seq_len(S), FUN = function(s) {
    constrained_s <- rstan:::rstan_relist(constrained_draws[s, ], skeleton)
    post@.MISC$stan_fit_instance$unconstrain_pars(constrained_s)
  })
  unconstrained_draws <- t(unconstrained_draws)
  unconstrained_draws
}

unconstrained_draws <- unconstrain_vb(model, data, fit_ADVI)
vb_unconstrained_sds <- apply(unconstrained_draws, 2, sd)

… and then use the inverse of vb_unconstrained_sds to initiliaze inv_metric.

I think this should work - seems OK so far, but haven’t thoroughly tested it yet.

4 Likes

This looks great! Thank you so much for sharing!

1 Like

inverse of vb_unconstrained_sds to initiliaze inv_metric

I think it should be variances are the inverse metric, but I always get confused with these. If something doesn’t work quite right just remember to try all variations and it’ll probably be pretty evident which is the right thing.

I think you’re right, this was the version where sampling was the fastest.

I’m getting something else which is unintuitive for me. While the VB-based-inv_metric samples significantly faster then the default inv_metric, I’m getting (for some parameters) significantly worse N_eff and Rhat-s (no divergences in either of them, no tree-depth saturation, no energy-related warning). My expectation was that sampling speed should be somewhat correlated with sampling diagnostics…

Any intuition for what might cause this?

Yeah that’s possible. There’s not a lot of sunlight for Rhats to be worse though. The latest rule of thumb is < 1.01 from the new Rhat paper (https://arxiv.org/abs/1903.08008).

So if they’re > 1.01 then good reason to think things aren’t mixing that well. And if you compare the ESS/s, then the faster chains probably aren’t faster.

You can also get the combined Neff/Rhats of two different mcmc runs with bind_draws in the posterior package. I think you need the extra argument along = "chain" or something. So then you if you trust the regular sampling run then you can see how it compares to the warmed-up-with-vb one.

You could also do vb warmup into hmc warmup into regular sampling and see how that goes. In that case you’re giving the regular warmup hopefully a better starting point and hopefully a better guess at the scales, and then maybe it has an easier job from there.

If you do that you’ll probably want to play with the window sizes and timestep adaptation. If you don’t make any changes there, then supplied metric will get tossed at either draw 75 or draw 100 (I forget). That might not be a problem (might be enough to get HMC started), I just wanted to mention it so you are aware.

By regular sampling you mean cmdstan_model$sampling with no inv_metric supplied? That was the thing I was comparing against. I’ve used this snippet, taken from one of the bayesplot vignettes:

compare_inv_unit <- function(inv_plot, unit_plot, ncol = 2, ...) {
  bayesplot_grid(
    inv_plot, unit_plot, 
    grid_args = list(ncol = ncol),
    subtitles = c("VB", 
                  "Unit"),
    ...
  )
}

compare_inv_unit(mcmc_rhat(rhat(rstan.inv)),mcmc_rhat(rhat(rstan.unit)))
compare_inv_unit(mcmc_neff(neff_ratio(rstan.inv)),mcmc_neff(neff_ratio(rstan.unit)))

VB warmup = run VB, estimate variances in unconstrained space, use that as an inv_metric for HMC? Not sure I understand the distinction between HMC and regular warmup, then.

That was my original intention, but maybe I’m confusing the terminology, here.

That’s interesting. Any references as to how to adjust these, given the VB draws?

Yup

With posterior you can actually take the draws from both methods and mix them together. It’s possible you have results from method1 and results from method2 and they both have good ESS but then when you mix them together you realize they are different things.

Isn’t meant to be one, my bad.

Oh okay. What are the two cmdstanr$sample commands? Like one has an inv_metric passed in w/ a starting point, but does it have other options changed? Or is it just the inv_metric and a starting point?

Just this:

fit.inv_metric <- model.cmd$sample(data=data,
                                   iter_warmup = 500,
                                   iter_sampling = 500,
                                   seed = 123,
                                   inv_metric = vb_unconstrained_vars,
                                   chains = 2,
                                   parallel_chains = 2,
                                   save_warmup = T)

and this is compared against fit.unit_metric which is the same but without the inv_metric part.