Conditional generated quantities in Stan aggregate model to handle different frequentist functional forms

brijo · June 9, 2021, 11:08pm

For a current set of projects, I use a single set of aggregate Stan models, in the vein of the “eight schools example.” The models take as data one or more treatment estimates from one or more RCTs at different partner sites, never student-level data. Treatment estimates come from frequentist models in which the functional form depends on the outcome (e.g. Poisson model for number of days absent, linear models for test score outcomes, etc.). Because the functional form can differ for what gets passed in to Stan, I want to specify generated quantities that are conditional on that functional form.

For example, Poisson treatment estimates get passed to the model untransformed, so in generated quantities I would like to perform the typical transformation to get percent change: exp(\beta)-1. But of course if estimates passed in come from a linear model, this generated quantity isn’t needed.

From the Stan reference manual section 5.3, I know of “Declaring Optional Variables,” which is the basis of my solution below. Though I’d really love to be able to conditionally generate bt_trans as a real type, not a vector, because that’s all I really need. Is that not possible? Previous posts left this unclear to me. Any new insights are much appreciated.

(This model for a single estimate from a single partner site is to illustrate my question. For multiple partner sites running multiple treatment arms, the model is more complex, so that we can partially pool across sites. In such scenarios, there are many more generated quantities of interest, all of which again differ depending on the functional form of the frequentist model used to estimate treatment effects.)

data {
  // Number of partner sites
  int<lower=1> P; // Just set to 1 in this case
  // Treatment estimate
  real y;
  // Standard error of estimate
  real<lower=0> sigma;
  // Prior on sd of treatment effect
  real<lower=0> prior_full_sd;
  // Prior mean on overall treatment effect
  real prior_mu;
  // Poisson functional form
  int<lower=0,upper=1> Poisson;
}

transformed data {
  real<lower=0> bt_prior_sd;
  // Prior on the overall
  bt_prior_sd = prior_full_sd;
}

parameters {
  // Overall effect
  real bt;
}

model {
  // Prior on overall effect
  bt ~ normal(prior_mu, bt_prior_sd);
  // Model
  y ~ normal(bt, sigma);
}

generated quantities {
  // Poisson estimate transformed to give percent change
  vector[Poisson ? 1 : 0] bt_trans;
  if (Poisson==1) {
    bt_trans[1] = exp(bt)-1;
  }
}

martinmodrak · June 15, 2021, 4:47pm

Hi,
unfortunately, I don’t think you can do conditional generation without using an array. However, in your specific case, wouldn’t it be easier to have:

generated quantities {
  real bt_trans;
  if (Poisson==1) {
    // Poisson estimate transformed to give percent change
    bt_trans = exp(bt)-1;
  } else {
    bt_trans = bt;
  }
}

So in further processing you would always use bt_trans regardless of the type of the model.

I admit that I am not a big fan of transforming the overall effect this way as it makes it weirdly assymetric, but I guess there are scenarios where it makes perfect sense :-)

Best of luck with your model!

brijo · June 15, 2021, 8:52pm

Thank you @martinmodrak for your thoughtful comments. The appeal of having a generated quantity where both the name and the contents are conditional on the functional form of the first stage estimates is that members of our team go back to model fit objects much later for various purposes. My worry is that bt_trans named the same but sometimes constructed differently will be one more point of possible confusion in the future, especially to those who don’t work on the models. You are affirming that there isn’t an easier way to do this, which again is very, very helpful. Much obliged.

martinmodrak · June 16, 2021, 5:34am

In such a case, I think you would be better off with doing the transformation (and storing relevant metadata) in R/Python or whichever language where you do your data preparation. Stan is by design less flexible in types/names as it was not intended for extensive data mugging, while being a bit rigid prevents many errors in model code that would be difficult to catch otherwise.

Best of luck with your project!

brijo · June 16, 2021, 2:27pm

You’re probably right that this code belongs in R (we use rstan to interact with Stan). Thanks for another helpful suggestion.

Topic		Replies	Views
Newbie looking for help with creating ppc in generated quantities block with bernoulli_logit_glm_rng Modeling rstan	5	417	March 15, 2023
Generated quantities for prediction Modeling specification , posterior-predictive	9	543	June 5, 2024
Generating random numbers from a conditional distribution Modeling	10	574	December 15, 2020
Generated Quantities for prediction data General	6	1085	July 28, 2020
Adding unused variable changes model behaviour (Answered: no bug) Modeling specification	4	742	April 19, 2019

Conditional generated quantities in Stan aggregate model to handle different frequentist functional forms

Related topics