Pareto k value for pathfinder

fdvanleeuwen · April 29, 2025, 9:20am

The pathfinder algorithm prints a Pareto k value when it is done running. I am unsure how this value is calculated and if it can be accessed in the model object.

When the pathfinder algorithm is used, a Pareto k value is automatically printed:

# fitting a linear regression model with pathfinder
m_pf = m_test_model$pathfinder(data = m_test_data)

"Pareto k value (2.1) is greater than 0.7. Importance resampling was not able to improve the approximation, which may indicate that the approximation itself is poor. "

This can also be calculated for Laplace or VI (see Birthday example). To obtain the weights, and subsequently the Pareto k value we use the following code:

# fitting models
m_lp = m_test_model$laplace(data = m_test_data,)
m_mf = m_test_model$variational(data = m_test_data,algorithm = "meanfield")

# for laplace
m_lp$draws() |>
  mutate_variables(lw = lp__-lp_approx__, w=exp(lw-max(lw))) |>
  subset_draws(variable="w") |>
  summarise_draws(pareto_khat, .args = list(tail='right', extra_diags = TRUE))

# variable pareto_khat
# w	       15.22024	

# for meanfield VI
m_mf$draws() |>
  mutate_variables(lw = lp__-lp_approx__, w=exp(lw-max(lw))) |>
  subset_draws(variable="w") |>
  summarise_draws(pareto_khat, .args = list(tail='right', extra_diags = TRUE))

# variable pareto_khat
# w	       1.58

“Where lp__ is the unnormalized log density on Stan’s unconstrained space and lp_approx__ is the unnormalized density of the Laplace approximation (for variational inference lp_approx__ is the log density of the variational approximation to lp__).”
From the cmdstanr manual.

When trying the same procedure for the pathfinder algorithm, I get the following error:

m_pf$draws() |>
  mutate_variables(lw = lp__-lp_approx__, w=exp(lw-max(lw))) |>
  subset_draws(variable="w") |>
  summarise_draws(pareto_khat, .args = list(tail='right', extra_diags = TRUE))

# Warning: Input contains infinite or NA values, is constant, or has constant tail. Fitting of generalized Pareto distribution not performed

# variable pareto_khat
# w	       NA

Questions

Is the Pareto k value from the pathfinder algorithm calculated in the same way as the code above?

If not, what is the difference? And how can the Pareto k value from pathfinder be accessed from the fitted model?

Below is the code to generate the data and specify the model used in the example:

# generate some data
Ntrain = 1000
P = 9
R2 = 0.5
rho = 0.1

# generate covariance matrix 
V <- rho + (1-rho) * diag(P)

# we generate 15 parameters of which 5 are negative, 5 positive and 5 zero
b = c(rep(-10,P/3),
      rep(0,P/3),
      rep(10,P/3))

# generate random noise, add the covariance structure 
set.seed(123)
Xtrain <- matrix(rnorm(Ntrain * (P)), nrow = Ntrain) %*% chol(V)

# generate the data, based on the R2 value
Ytrain <- Xtrain %*% b + rnorm(Ntrain, sd = 2)   

df_train = data.frame(X = Xtrain, Y = Ytrain)

# specify model

# horseshoe prior
priors_self <- set_prior(horseshoe(df = 1, par_ratio = 2/3))

# obtain model for cmdstanr
m_test_code = stancode(Y ~ ., data = df_train, backend = "cmdstanr")
m_test_data = standata(Y ~ ., data = df_train, backend = "cmdstanr")

m_test_model = cmdstanr::cmdstan_model(cmdstanr::write_stan_file(m_test_code))

avehtari · April 29, 2025, 4:00pm

Pareto k value reported by CmdStan is computed with equivalent code in C++.

By default, CmdStan also does resampling with replacement based on the weights, and thus $draws() is likely to return many copies of the draw having the highest weight. If there are many copies of the draw with the highest weight, then the tail of the weight distribution is constant. The warning message you did get mentioned also this possibility: “Input contains infinite or NA values, is constant, or has constant tail.” You can’t recover the original Pareto-k from the resampled draws.

You can run the Pathfinder algorithm without resampling, by using argument psis_resample=FALSE. Then you can compute Pareto-k with the R code you used for Laplace and VI.

I understand the behavior can be confusing as Pathfinder is the only one doing the resampling inside CmdStan and has that as the default behavior.

Topic		Replies	Views
Pathfinder pareto K Algorithms	2	268	February 8, 2024
Good PP check and R square but large Pareto k values Modeling performance , loo	10	2133	September 2, 2020
Using pathfinder to initalize sampling Modeling	11	231	November 19, 2024
Question about large Pareto k value Modeling loo	6	1117	September 13, 2022
Pareto values versus ELPD differences Modeling loo	5	1013	June 15, 2021

Pareto k value for pathfinder

Related topics