Question regarding generated quantities matrix output from cmdstanr / posterior

Hello!

This is a follow-up to Faster / better loading of sampler diagnostics in cmdstanr?.

I’ve fit a model where one of the parameters of interest is a hierarchical GP that has a varying effect across a predictor grid d according to a variable typgrp.


parameters {
  matrix[n_d, n_typgrp] GP_typgrp_std;  // standard mvnorm
  real<lower=0> alpha_typgrp;             // marginal deviance
  real<lower=0> rho_typgrp;               // length-scale
...
}

transformed parameters {
...
  matrix[n_d, n_t] GP_typgrp;      // typgrp GPs

  // typgrp-level non-centered Gaussian Processes (mvnorm std * L_cov)
  {

    matrix[n_d, n_d] cov_typgrp;                      // typgrp covariance  
    matrix[n_d, n_d] L_cov_typgrp;                    // lower tri

    cov_typgrp = cov_exp_quad(ds, alpha_typgrp, rho_typgrp)
              + diag_matrix(rep_vector(1e-9, n_d));    // avoid float errors

    L_cov_typgrp = cholesky_decompose(cov_typgrp);

    GP_typgrp = L_cov_typgrp * GP_typgrp_std;                   // non-centered
  }
...
}

I can successfully fit the model in cmdstanr, and I can recover the parameters and everything looks great - my variable GP_typgrp varies across d by typgrp as expected. Very happy!

As suggested in my previous post, though, for the sake of efficiency, I’ve split out the generated quantities block into a separate .stan program, and this is where things fall apart. My generated quantities program is identical to the model program minus the model block and with the following gq block:

generated quantities {
matrix[n_d, n_t] eta_new;

  eta_new =  GP_typgrp;
}

(this is a simplified version for illustration)

My call to the gq_program is as follows, where the variable fit is the sampled CmdStanMCMC object:


gq_program <- file.path(..., "gq_typgrp.stan")

mod_gq <- cmdstan_model(gq_program)

gq <- mod_gq$generate_quantities(
  fit,
  data = data,
  parallel_chains = 4
)

When I run the generated quantities program, the resulting matrix eta_new is a matrix with only the first column repeated n_t times. That is to say that the value of eta_new over d is identical across typgrp - different than the GP_typgrp variable recovered as a posterior sample in the model program.

I’m using the following {posterior} + {tidybayes} workflow to look at the draws:

post_draws <- gq$draws()

eta_draws_df <- post_draws %>%
  posterior::subset_draws(variable = "eta_new") %>%
  posterior::as_draws_df() %>%
  tidybayes::gather_variables() %>%
  ungroup() %>%
  mutate(
    d = as.integer(
      stringr::str_extract(.variable, "(?<=\\[)[:digit:]+")
    ),
    typgrp = as.integer(
      stringr::str_extract(.variable, "(?<=,)[:digit:]+")
    ),
    .variable = "eta"
  ) %>%
  arrange(d, .draw)

I think I must be misunderstanding something basic about how generated quantities handles / accepts matrices, or am missing something else. I can’t post the data / full model, but can try to make a reproducible example if no obvious operator error jumps out. Any advice is appreciated, with thanks!

[edited to add relevant parameters declaration and figures]

1 Like

Oh that’s not good.

So this basic Stan model:

parameters {
  matrix[2, 2] A;
}

transformed parameters {
  matrix[2, 2] B = A;
}

model {
  to_vector(A) ~ normal(0, 1);
}

And then this generated quantities block:

generated quantities {
  matrix[2, 2] C;
  C = B;
}

Does this look like the same thing you’re doing?

If so we can check the four values of C to see if they’re the same and maybe narrow this down a bit.

@bbbales2 thanks for looking at this. I can’t reproduce this with this simple example, so maybe there is something else going on with my code. I’ll try to have a deeper look myself and report back.

Probably the easiest way to go on this will be:

  1. Make your model as simple as possible so it goes fast (sample all parameters normal(0, 1) and get rid of data)

  2. Sequentially comment out complexity in transformed parameters

  3. Sequentially comment out complexity in GQs

Eventually you will end up with the model above and presumably at some point in the middle there will be the answer.

2 Likes

Thanks for the lead! I’ll post what I find here.

1 Like