Simulating datasets given a model with brms

Hi, I’m completely new to bayesian modeling and stan and I’ve started to use brms. I’ve read about the possibility to simulate n datasets given a model and priors on model parameters. Moreover, I’ve seen that the function make_stancode() can report the actual stan code of the fitted model. My question is if I can use my stan code of the model (and the priors on it) to create some datasets. This could be an interesting workflow for bayesian dummies like me:

  • write a model with priors
  • generate datasets varying sample size and/or observations

I’ve seen the possibility to sample from the priors only using sample_priors = "only" however this gives me the final parameters varying my priors.

In my specific case, if could be useful to give me some suggestions, I’m fitting a multilevel logistic regression with the bernoulli family. I would like to generate some datasets with the 0-1 response variabile with different sample size and parameters on the intercept and random effect variance.

// generated with brms 2.13.0
functions {
data {
  int<lower=1> N;  // number of observations
  int Y[N];  // response variable
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  int<lower=1> J_1[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_1;
  int prior_only;  // should the likelihood be ignored?
transformed data {
parameters {
  real Intercept;  // temporary intercept for centered predictors
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  vector[N_1] z_1[M_1];  // standardized group-level effects
transformed parameters {
  vector[N_1] r_1_1;  // actual group-level effects
  r_1_1 = (sd_1[1] * (z_1[1]));
model {
  // initialize linear predictor term
  vector[N] mu = Intercept + rep_vector(0, N);
  for (n in 1:N) {
    // add more terms to the linear predictor
    mu[n] += r_1_1[J_1[n]] * Z_1_1[n];
  // priors including all constants
  target += student_t_lpdf(Intercept | 3, 0, 2.5);
  target += student_t_lpdf(sd_1 | 3, 0, 2.5)
    - 1 * student_t_lccdf(0 | 3, 0, 2.5);
  target += std_normal_lpdf(z_1[1]);
  // likelihood including all constants
  if (!prior_only) {
    target += bernoulli_logit_lpmf(Y | mu);
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept;


1 Like

Sorry, your question fell through. I think what you are after is exactly what is provided by the sample_prior = "only" once you have the fit with only prior, you can predict with posterior_predict to get actual datasets. Does that make sense? Or are you after something else?

Sorry for the late response but I miss the notification. Thanks for your answer. The idea should be the same but let’s say that I want to perform a power analysis or use different methods to the same datasets given the same generative model, could it be possibile using the sample_prior and posterior_predict approach? If I want to generate n observation or a different amount of variability in my data How can I do?

If I get it correctly - if you can use the newdata argument of posterior_predict to tweak some of the details of the datasets generated (each sample generated by posterior_predict is a single dataset), e.g. the total number of observations, the distrubition of predictor values. Each sample will already have somewhat different variability (because they will have a different value of sigma or sd_XX parameters for the varying effects, within what is allowed by your prior). What you can’t change is the prior distribution of model parameters or the number of predictors or the number of grouping levels - to change those you need to “refit” (the prior of) a new model.

Does that make sense?

totally yes! thanks. I was searching about doing this in pure STAN but is a little bit hard for me at the moment. The idea to specificy the generative model with brms (and also having the STAN code) is a game changer. I will try this approach!