The pathfinder algorithm prints a Pareto k value when it is done running. I am unsure how this value is calculated and if it can be accessed in the model object.
When the pathfinder algorithm is used, a Pareto k value is automatically printed:
# fitting a linear regression model with pathfinder
m_pf = m_test_model$pathfinder(data = m_test_data)
"Pareto k value (2.1) is greater than 0.7. Importance resampling was not able to improve the approximation, which may indicate that the approximation itself is poor. "
This can also be calculated for Laplace or VI (see Birthday example). To obtain the weights, and subsequently the Pareto k value we use the following code:
# fitting models
m_lp = m_test_model$laplace(data = m_test_data,)
m_mf = m_test_model$variational(data = m_test_data,algorithm = "meanfield")
# for laplace
m_lp$draws() |>
mutate_variables(lw = lp__-lp_approx__, w=exp(lw-max(lw))) |>
subset_draws(variable="w") |>
summarise_draws(pareto_khat, .args = list(tail='right', extra_diags = TRUE))
# variable pareto_khat
# w 15.22024
# for meanfield VI
m_mf$draws() |>
mutate_variables(lw = lp__-lp_approx__, w=exp(lw-max(lw))) |>
subset_draws(variable="w") |>
summarise_draws(pareto_khat, .args = list(tail='right', extra_diags = TRUE))
# variable pareto_khat
# w 1.58
“Where lp__ is the unnormalized log density on Stan’s unconstrained space and lp_approx__ is the unnormalized density of the Laplace approximation (for variational inference lp_approx__ is the log density of the variational approximation to lp__).”
From the cmdstanr manual.
When trying the same procedure for the pathfinder algorithm, I get the following error:
m_pf$draws() |>
mutate_variables(lw = lp__-lp_approx__, w=exp(lw-max(lw))) |>
subset_draws(variable="w") |>
summarise_draws(pareto_khat, .args = list(tail='right', extra_diags = TRUE))
# Warning: Input contains infinite or NA values, is constant, or has constant tail. Fitting of generalized Pareto distribution not performed
# variable pareto_khat
# w NA
Questions
Is the Pareto k value from the pathfinder algorithm calculated in the same way as the code above?
If not, what is the difference? And how can the Pareto k value from pathfinder be accessed from the fitted model?
Below is the code to generate the data and specify the model used in the example:
# generate some data
Ntrain = 1000
P = 9
R2 = 0.5
rho = 0.1
# generate covariance matrix
V <- rho + (1-rho) * diag(P)
# we generate 15 parameters of which 5 are negative, 5 positive and 5 zero
b = c(rep(-10,P/3),
rep(0,P/3),
rep(10,P/3))
# generate random noise, add the covariance structure
set.seed(123)
Xtrain <- matrix(rnorm(Ntrain * (P)), nrow = Ntrain) %*% chol(V)
# generate the data, based on the R2 value
Ytrain <- Xtrain %*% b + rnorm(Ntrain, sd = 2)
df_train = data.frame(X = Xtrain, Y = Ytrain)
# specify model
# horseshoe prior
priors_self <- set_prior(horseshoe(df = 1, par_ratio = 2/3))
# obtain model for cmdstanr
m_test_code = stancode(Y ~ ., data = df_train, backend = "cmdstanr")
m_test_data = standata(Y ~ ., data = df_train, backend = "cmdstanr")
m_test_model = cmdstanr::cmdstan_model(cmdstanr::write_stan_file(m_test_code))