CmdStanPy Variational - How to get density parameters?

Hi! Below is a minimal reproducible example. I am working with the following fully conjugate normal model:

Y \sim \mathcal{N}(\mu, \sigma^2)\\ \mu \sim \mathcal{N}(\mu_0, \sigma_0^2)

In my toy model, I am setting

\mu_0 = 0, \sigma_0 = 1

and creating a toy dataset by drawing 1000 samples from a

\mathcal{N}(0.3, 10^2)

model (mean 0.3, standard deviation 10). I am trying to compare the true posterior density implied by my toy data and prior with the posterior density output by Stan's variational run on this data.
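For reference, with \sigma known this posterior is available in closed form, which is what I want to compare the ADVI output against:

\mu \mid y \sim \mathcal{N}\!\left(\mu_n, \sigma_n^2\right), \qquad \sigma_n^2 = \left(\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}\right)^{-1}, \qquad \mu_n = \sigma_n^2\left(\frac{\mu_0}{\sigma_0^2} + \frac{n\bar{y}}{\sigma^2}\right)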

I am able to get the posterior mean directly from the CmdStanVB object, but there does not seem to be any way to get the posterior variance other than calling np.var on the posterior samples the object returns. I know that the sample variance of those draws will differ slightly, by sampling randomness, from the variance of the converged ADVI approximation, but I need the ADVI posterior variance at convergence, not an estimate of it.

How can I get the actual ADVI posterior variance? I am eventually going to need this for other fully conjugate models and I’m concerned after learning that this was possibly removed 6 years ago.

Here is my reproducible code:



```stan
// The input data is a vector 'y' of length 'N'.
data {
  int<lower=0> N;
  vector[N] y;
  real<lower=0> sigma;
  real mu_0;
  real<lower=0> sigma_0;
}

// The parameters accepted by the model.
// The only free parameter is 'mu'; 'sigma' is supplied as data.
parameters {
  real mu;
}

// The model to be estimated. We model the output
// 'y' as normally distributed with mean 'mu'
// and (known) standard deviation 'sigma'.
model {
  mu ~ normal(mu_0, sigma_0);
  y ~ normal(mu, sigma);
}
```

```python
import os
from cmdstanpy import CmdStanModel
import numpy as np

# placeholder: path to the Stan file above
STAN_MODEL = 'path_to_stan_model.stan'

# set the parameters we will work with
n = 1000
mu = 0.3
sigma = 10
mu_0 = 0
sigma_0 = 1
NP_SEED = 10

stan_model = CmdStanModel(stan_file=STAN_MODEL)

# fix np seed and draw data
np.random.seed(NP_SEED)
y = np.random.normal(loc=mu, scale=sigma, size=n)

stan_data = {
    'N': n,
    'y': y,
    'sigma': sigma,
    'mu_0': mu_0,
    'sigma_0': sigma_0,
}

vb_fit = stan_model.variational(data=stan_data)
# vb_fit.mu returns the posterior mean, but nothing returns the posterior variance.
```


One of the reasons I'm choosing variational inference is that it directly gives me a density approximating the posterior rather than MCMC samples. I fear, however, that CmdStanPy's variational inference implementation is not going to work in that regard. Is there any way to get the ADVI posterior variance other than calling np.var on the posterior samples?

Thanks in advance!

We would need to add a summary method to the CmdStanVB object, which would use CmdStan's stansummary utility to estimate the variance. Note that VB underestimates the true variance; by how much depends on the dimensionality and the amount of correlation between parameters.

If I understand your question, the answer is no, this information is not currently preserved anywhere. Even @mitzimorris’ suggestion would be more or less equivalent to what you describe of calling np.var on everything.
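For concreteness, that workaround looks like the following sketch, continuing the script from the original post. It uses the mean=False keyword of stan_variable() discussed later in this thread, so whether that argument is accepted depends on the CmdStanPy version:

```python
import numpy as np

# Continuing the script above: estimate the mean and variance of the
# variational approximation from its own draws (the current workaround).
draws = vb_fit.stan_variable('mu', mean=False)  # all draws for 'mu'

print(np.mean(draws))  # Monte Carlo estimate of the ADVI posterior mean
print(np.var(draws))   # Monte Carlo estimate of the ADVI posterior variance
```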

If you're interested in the approximate distribution (not the samples), we're only giving you the mean, not the correlations. We could save this information; it would be similar to the recent-ish features that save the Hessian from the Laplace approximation, or the improvement to how the metric from HMC is saved.

What would need to be written out depends on whether you're doing meanfield or fullrank.
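Roughly speaking, the two families are parameterized differently on the unconstrained scale, so what would need to be saved differs. The snippet below is a hypothetical illustration of those family parameters, not an actual CmdStan output format:

```python
import numpy as np

# Hypothetical illustration of the family parameters ADVI would need to save.
D = 3  # number of unconstrained parameters

# meanfield: a mean vector and one log standard deviation per coordinate,
# so the implied covariance is diagonal (no correlations).
mf_mu = np.zeros(D)
mf_omega = np.zeros(D)                  # log standard deviations
mf_cov = np.diag(np.exp(2 * mf_omega))  # diagonal covariance

# fullrank: a mean vector and a lower-triangular Cholesky factor L,
# so the implied covariance is dense and carries correlations.
fr_mu = np.zeros(D)
fr_L = np.eye(D)
fr_cov = fr_L @ fr_L.T
```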

I see. So if I'm understanding correctly, my best bet is either to alter the C++ code to return the covariance, or to take the np.var of the posterior variational samples and use that as my approximation of the density's variance.

To clarify, the available posterior variational samples are all drawn from the converged density parameters, correct?

Thank you to you both for these detailed answers! I have a much better understanding of what is happening with cmdstanpy variational now.

Correct. As far as modifying the C++, I’ve opened a ticket to request that functionality: ADVI: Option to save family parameters to JSON · Issue #3315 · stan-dev/stan · GitHub

Note that as @mitzimorris said, use these estimates with care, as they’re more or less known to be incorrect for wide families of problems.

Yes. After optimization, the algorithm generates the requested number of draws from a standard normal and then transforms them using the learned mean and covariance.
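In numpy terms, a rough sketch of that procedure looks like the following; mu_hat and L_hat are hypothetical names standing in for the learned mean and Cholesky factor, and the real draws additionally go through the model's constraining transforms:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch of how the output draws are produced after optimization.
# For meanfield, L_hat is diagonal; for fullrank it is a dense Cholesky factor.
D, S = 3, 1000
mu_hat = np.zeros(D)
L_hat = np.eye(D)

z = rng.standard_normal((S, D))        # standard normal draws
unconstrained = mu_hat + z @ L_hat.T   # shift/scale by the learned parameters
# each row is then pushed through the model's constraining transforms
# (e.g. exp() for a <lower=0> parameter) before being written out
```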


Why do you want the parameters of the approximate density? They’re not directly useful for inference because they are on the unconstrained scale. For example, a parameter sigma declared with a <lower=0> constraint is going to be log transformed in the approximate density, and a covariance matrix will be replaced with its Cholesky factor with a log-transformed diagonal, and so on.

The sample returned by ADVI will be created by (1) taking normal draws on the unconstrained scale, and (2) applying the constraining transform. If you want to perform downstream inference, you can just use these in a plug-in manner. If you want to try to use the unconstrained normal approximation for inference, you will need to do the transform, but any non-linear transform won’t transform expectations, just draws, because f(\mathbb{E}[\theta \mid y]) \neq \mathbb{E} [f(\theta) \mid y] if f is non-linear.
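As a quick numerical illustration of that last point, take a <lower=0> parameter whose unconstrained approximation is a standard normal, so the constraining transform is exp:

```python
import numpy as np

rng = np.random.default_rng(0)

# Unconstrained approximation for a <lower=0> parameter: standard normal.
z = rng.normal(loc=0.0, scale=1.0, size=100_000)

print(np.exp(np.mean(z)))  # f(E[theta]): exp(0) = 1.0
print(np.mean(np.exp(z)))  # E[f(theta)]: about exp(0.5) = 1.65 (lognormal mean)
```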

I like the idea of making the inferred posterior (co)variances savable somehow. While they may not be useful for inference, they still provide additional information about the variational inference output. One follow-up question:

If my CmdStanVB object is named vb_fit, there is a note on the stan_variable() function that the default behavior will soon be changed to mean=False. I have noticed that

np.mean(vb_fit.stan_variable('mu', mean=False))

and

vb_fit.stan_variable('mu', mean=True)

are returning different values. Is a weighted mean being computed for the mean=True parameter?

Also, to clarify for myself, I thought that variational Bayes would return updated posterior values for my mu_0 and sigma_0 prior parameters, but instead I'm getting 1000 "variational samples." Why is that? What specifically is being Monte Carlo approximated to give me sample outputs?

Thank you again for this enlightening discussion!

I’m curious as to what you’d use it for. The reason it’s not in Stan is that we can’t figure out what it would be useful for. Maybe comparing two runs of variational inference? It wouldn’t be that hard to instrument our VI methods to save this, but we’d need some motivation.

Our doc for stan_variable() in VB is confusing. I would not trust using mean=True, which indicates that you get the approximate variational mean. We don't know how to extract a variational mean for a constrained parameter, so I have no idea what this is doing, nor would I trust it for parameters that have constraints. The mean of a non-linear transform is not the non-linear transform of the mean.

We are, actually, trying to compare multiple runs of variational inference (something something variational inference theory something something). The issue I'm running into is that the Stan implementation of variational inference relies on Monte Carlo approximations that I have yet to find clear explanations of or justification for, beyond discovering via the reference manual and trial and error that this is what is happening.

I agree that the doc is confusing. What, then, is stan_variable with mean=True returning? I also agree that non-linear transforms cause issues when taking expectations of transformed parameters.

It's just giving you the value from the first row of the output CSV file.

It's also entirely expected that taking the mean of the rest of the rows (which are samples centered at that point) would give you something different, due to sampling noise.
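Concretely, continuing the script from the original post, the two calls discussed above map onto those two things; exact keyword behavior depends on the CmdStanPy version:

```python
import numpy as np

# Continuing the script above: the two calls discussed in this thread.
point_estimate = vb_fit.stan_variable('mu', mean=True)   # the first CSV row
draws = vb_fit.stan_variable('mu', mean=False)           # the draws below it

print(point_estimate)  # the algorithm's reported mean for 'mu'
print(np.mean(draws))  # sample mean of the draws: close, but not identical
```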

I'll channel @andrewgelman and suggest using \widehat{R}. The draws should probably give you about as much accuracy as you need for that. I agree that having the approximate normal used in the fit would be more elegant, but it would only let you measure the difference in the unconstrained parameters, whereas what you probably want is to measure the difference in the constrained space.
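For what it's worth, here is a minimal numpy sketch of that idea, treating each independent variational() run as one "chain" and computing a basic split-\widehat{R} over the constrained draws. A library implementation (e.g. ArviZ's rhat) would be the more robust choice, and the vb_fits list below is hypothetical:

```python
import numpy as np

def split_rhat(chains):
    """Basic split R-hat for draws of one scalar, shape (n_runs, n_draws)."""
    half = chains.shape[1] // 2
    # split each run in half so within-run drift is also detected
    split = np.vstack([chains[:, :half], chains[:, half:2 * half]])
    n = split.shape[1]
    within = split.var(axis=1, ddof=1).mean()       # W: within-run variance
    between = n * split.mean(axis=1).var(ddof=1)    # B: between-run variance
    var_plus = (n - 1) / n * within + between / n
    return np.sqrt(var_plus / within)

# e.g. with draws of 'mu' from several independent variational() runs:
# runs = np.stack([fit.stan_variable('mu', mean=False) for fit in vb_fits])
# print(split_rhat(runs))
```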

Here are the two papers that explain all the details of our two variational inference systems:

Both of them use the standard Monte Carlo estimate of the ELBO (negative KL divergence plus a constant). Variance in this estimator is one of the main reasons for differing behavior across runs of variational inference. It can be reduced by using a larger sample size for the Monte Carlo estimate, but that's expensive unless you're running on a GPU (which Stan cannot do).
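Schematically, with \zeta the unconstrained parameters, T the constraining transform, and q_\phi the Gaussian approximation (whose entropy \mathbb{H}[q_\phi] is available in closed form), that estimator is

\mathrm{ELBO}(\phi) = \mathbb{E}_{q_\phi}\!\left[\log p\big(y, T(\zeta)\big) + \log\left|\det J_T(\zeta)\right|\right] + \mathbb{H}[q_\phi] \approx \frac{1}{S}\sum_{s=1}^{S}\Big(\log p\big(y, T(\zeta_s)\big) + \log\left|\det J_T(\zeta_s)\right|\Big) + \mathbb{H}[q_\phi], \qquad \zeta_s \sim q_\phi.

In CmdStan the relevant Monte Carlo sample sizes are controlled by the grad_samples and elbo_samples arguments to the variational method, if I have the argument names right.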

Good question. I have no idea. The only way we have to compute the posterior mean on the constrained scale we care about is to generate draws on the unconstrained scale, transform, and calculate means. The problem is that the transform of a mean is not the mean of a transform when the transform is non-linear.