Question regarding generated quantities matrix output from cmdstanr / posterior

nerutenbeck · November 5, 2020, 9:19pm

Hello!

This is a follow-up to Faster / better loading of sampler diagnostics in cmdstanr?.

I’ve fit a model where one of the parameters of interest is a hierarchical GP that has a varying effect across a predictor grid d according to a variable typgrp.


parameters {
  matrix[n_d, n_typgrp] GP_typgrp_std;  // standard mvnorm
  real<lower=0> alpha_typgrp;             // marginal deviance
  real<lower=0> rho_typgrp;               // length-scale
...
}

transformed parameters {
...
  matrix[n_d, n_t] GP_typgrp;      // typgrp GPs

  // typgrp-level non-centered Gaussian Processes (mvnorm std * L_cov)
  {

    matrix[n_d, n_d] cov_typgrp;                      // typgrp covariance  
    matrix[n_d, n_d] L_cov_typgrp;                    // lower tri

    cov_typgrp = cov_exp_quad(ds, alpha_typgrp, rho_typgrp)
              + diag_matrix(rep_vector(1e-9, n_d));    // avoid float errors

    L_cov_typgrp = cholesky_decompose(cov_typgrp);

    GP_typgrp = L_cov_typgrp * GP_typgrp_std;                   // non-centered
  }
...
}

I can successfully fit the model in cmdstanr, and I can recover the parameters and everything looks great - my variable GP_typgrp varies across d by typgrp as expected. Very happy!

As suggested in my previous post, though, for the sake of efficiency, I’ve split out the generated quantities block into a separate .stan program, and this is where things fall apart. My generated quantities program is identical to the model program minus the model block and with the following gq block:

generated quantities {
matrix[n_d, n_t] eta_new;

  eta_new =  GP_typgrp;
}

(this is a simplified version for illustration)

My call to the gq_program is as follows, where the variable fit is the sampled CmdStanMCMC object:


gq_program <- file.path(..., "gq_typgrp.stan")

mod_gq <- cmdstan_model(gq_program)

gq <- mod_gq$generate_quantities(
  fit,
  data = data,
  parallel_chains = 4
)

When I run the generated quantities program, the resulting matrix eta_new is a matrix with only the first column repeated n_t times. That is to say that the value of eta_new over d is identical across typgrp - different than the GP_typgrp variable recovered as a posterior sample in the model program.

I’m using the following {posterior} + {tidybayes} workflow to look at the draws:

post_draws <- gq$draws()

eta_draws_df <- post_draws %>%
  posterior::subset_draws(variable = "eta_new") %>%
  posterior::as_draws_df() %>%
  tidybayes::gather_variables() %>%
  ungroup() %>%
  mutate(
    d = as.integer(
      stringr::str_extract(.variable, "(?<=\\[)[:digit:]+")
    ),
    typgrp = as.integer(
      stringr::str_extract(.variable, "(?<=,)[:digit:]+")
    ),
    .variable = "eta"
  ) %>%
  arrange(d, .draw)

I think I must be misunderstanding something basic about how generated quantities handles / accepts matrices, or am missing something else. I can’t post the data / full model, but can try to make a reproducible example if no obvious operator error jumps out. Any advice is appreciated, with thanks!

[edited to add relevant parameters declaration and figures]

bbbales2 · November 6, 2020, 1:44pm

Oh that’s not good.

So this basic Stan model:

parameters {
  matrix[2, 2] A;
}

transformed parameters {
  matrix[2, 2] B = A;
}

model {
  to_vector(A) ~ normal(0, 1);
}

And then this generated quantities block:

generated quantities {
  matrix[2, 2] C;
  C = B;
}

Does this look like the same thing you’re doing?

If so we can check the four values of C to see if they’re the same and maybe narrow this down a bit.

nerutenbeck · November 6, 2020, 1:57pm

@bbbales2 thanks for looking at this. I can’t reproduce this with this simple example, so maybe there is something else going on with my code. I’ll try to have a deeper look myself and report back.

bbbales2 · November 6, 2020, 2:01pm

Probably the easiest way to go on this will be:

Make your model as simple as possible so it goes fast (sample all parameters normal(0, 1) and get rid of data)
Sequentially comment out complexity in transformed parameters
Sequentially comment out complexity in GQs

Eventually you will end up with the model above and presumably at some point in the middle there will be the answer.

nerutenbeck · November 6, 2020, 2:02pm

Thanks for the lead! I’ll post what I find here.

Topic		Replies	Views
Fewer generated quantities than parameter samples? Interfaces cmdstanr	5	495	November 11, 2020
Standalone generated quantities - comments welcome Developers features	32	2991	May 1, 2019
Trouble loading cmdstanr csv files with generated quantities in stan model General	0	289	December 22, 2022
Getting generated quantities variable in original dimensions Modeling rstan , fitting-issues	7	464	June 28, 2022
Problem with generated quantities in cmdstanpy General fitting-issues , cmdstanpy , cognitive-science	6	546	October 25, 2024

Question regarding generated quantities matrix output from cmdstanr / posterior

Related topics