Hello!
This is a follow-up to Faster / better loading of sampler diagnostics in cmdstanr?.
I’ve fit a model where one of the parameters of interest is a hierarchical GP that has a varying effect across a predictor grid d
according to a variable typgrp.
parameters {
matrix[n_d, n_typgrp] GP_typgrp_std; // standard mvnorm
real<lower=0> alpha_typgrp; // marginal deviance
real<lower=0> rho_typgrp; // length-scale
...
}
transformed parameters {
...
matrix[n_d, n_t] GP_typgrp; // typgrp GPs
// typgrp-level non-centered Gaussian Processes (mvnorm std * L_cov)
{
matrix[n_d, n_d] cov_typgrp; // typgrp covariance
matrix[n_d, n_d] L_cov_typgrp; // lower tri
cov_typgrp = cov_exp_quad(ds, alpha_typgrp, rho_typgrp)
+ diag_matrix(rep_vector(1e-9, n_d)); // avoid float errors
L_cov_typgrp = cholesky_decompose(cov_typgrp);
GP_typgrp = L_cov_typgrp * GP_typgrp_std; // non-centered
}
...
}
I can successfully fit the model in cmdstanr
, and I can recover the parameters and everything looks great - my variable GP_typgrp varies across d
by typgrp
as expected. Very happy!
As suggested in my previous post, though, for the sake of efficiency, I’ve split out the generated quantities block into a separate .stan
program, and this is where things fall apart. My generated quantities program is identical to the model program minus the model block and with the following gq block:
generated quantities {
matrix[n_d, n_t] eta_new;
eta_new = GP_typgrp;
}
(this is a simplified version for illustration)
My call to the gq_program is as follows, where the variable fit
is the sampled CmdStanMCMC
object:
gq_program <- file.path(..., "gq_typgrp.stan")
mod_gq <- cmdstan_model(gq_program)
gq <- mod_gq$generate_quantities(
fit,
data = data,
parallel_chains = 4
)
When I run the generated quantities program, the resulting matrix eta_new
is a matrix with only the first column repeated n_t
times. That is to say that the value of eta_new over d
is identical across typgrp - different than the GP_typgrp variable recovered as a posterior sample in the model program.
I’m using the following {posterior}
+ {tidybayes}
workflow to look at the draws:
post_draws <- gq$draws()
eta_draws_df <- post_draws %>%
posterior::subset_draws(variable = "eta_new") %>%
posterior::as_draws_df() %>%
tidybayes::gather_variables() %>%
ungroup() %>%
mutate(
d = as.integer(
stringr::str_extract(.variable, "(?<=\\[)[:digit:]+")
),
typgrp = as.integer(
stringr::str_extract(.variable, "(?<=,)[:digit:]+")
),
.variable = "eta"
) %>%
arrange(d, .draw)
I think I must be misunderstanding something basic about how generated quantities handles / accepts matrices, or am missing something else. I can’t post the data / full model, but can try to make a reproducible example if no obvious operator error jumps out. Any advice is appreciated, with thanks!
[edited to add relevant parameters declaration and figures]