Simulate design matrix using Stan


Dear Bob,

I am migrating our recent discussion from Google Groups. Thanks for pointing me here.
As you suggested, I am trying to simulate the design matrix by drawing it from a distribution. Currently, I am using this R code:

## Generate response
response <- rep(x = c(1, 0), times = 25)

## Generate covariates conditioned on response
generate_covariates <- function(response_variable) {
  ## covariate_i: Bernoulli random variable whose success probability is drawn from Uniform(0, 1)
  prob_given_1 <- runif(n = 1, min = 0, max = 1)
  prob_given_0 <- runif(n = 1, min = 0, max = 1)
  ## generate class-conditional draws, placed at the positions of their own class
  ## so that element i of the covariate lines up with response_variable[i]
  is_1 <- response_variable == 1
  covar <- integer(length(response_variable))
  covar[is_1] <- rbinom(n = sum(is_1), size = 1, prob = prob_given_1)
  covar[!is_1] <- rbinom(n = sum(!is_1), size = 1, prob = 1 - prob_given_0)
  return(covar)
}

## One column per covariate: sapply() returns a 50 x 10 matrix
covariate_list <- sapply(1:10, function(x) {
  generate_covariates(response_variable = response)
}, simplify = TRUE)

## Put together simulated response and covariates
simulated_data <- data.frame(response, covariate_list, stringsAsFactors = FALSE)

Once the simulated_data is generated, I run a logistic regression on it using Stan.

My question is: can I do the simulation process above in Stan itself?

Thanks for any insights!


On May 8, 2017, at 6:57 PM, Bob Carpenter wrote:

You are never going to be able to simulate the design matrix
without giving it a distribution—if it’s just covariates,
there’s nothing to simulate from. Usually the covariates don’t
get modeled because everything’s conditionally independent given
their value.

Which interface are you using?

- Bob

P.S. We’re shutting down this list soon. We’ve switched to:

On May 7, 2017, at 8:48 PM, Jy wrote:

Dear Bob,

Is this possible now in Stan, i.e., is there a way to simulate the design matrix that will then be provided as input to Stan? If so, would you please point me to documentation on how to do this?


On Saturday, 17 October 2015 17:46:12 UTC-4, Bob Carpenter wrote:
What I’m talking about here is a way to generate a single fake data
set to use as input to Stan to make sure my program is doing the
right thing. Right now, there’s no easy way (that I know of) to convert
Stan output (in any interface) into something that can be fed back in directly
as data.

- Bob


Yes, you can. Write your program with a data block where you pass in the simulation parameters, generate the matrices in the transformed data block, and then fit your model to that data.
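
As a rough illustration of that approach, here is a minimal sketch that mirrors the R simulation from the original question. The names N, K, y, X, alpha, and beta, and the normal(0, 2.5) priors, are assumptions made for the example and do not come from the thread. It uses the current array syntax (array[N] int); older Stan releases would write int y[N] instead. It also assumes a Stan version that allows _rng functions in transformed data.

```stan
data {
  int<lower=1> N;  // number of observations (assumed name), e.g. 50
  int<lower=1> K;  // number of binary covariates (assumed name), e.g. 10
}
transformed data {
  array[N] int<lower=0, upper=1> y;  // simulated response
  matrix[N, K] X;                    // simulated design matrix

  // Alternating response 1, 0, 1, 0, ... as in rep(c(1, 0), times = 25)
  for (n in 1:N) {
    y[n] = n % 2;
  }

  // For each covariate, draw two class-conditional probabilities,
  // then draw each entry given the response of its own row
  for (k in 1:K) {
    real p1 = uniform_rng(0, 1);
    real p0 = uniform_rng(0, 1);
    for (n in 1:N) {
      X[n, k] = bernoulli_rng(y[n] == 1 ? p1 : 1 - p0);
    }
  }
}
parameters {
  real alpha;      // intercept
  vector[K] beta;  // coefficients
}
model {
  // Weakly informative priors, chosen only for illustration
  alpha ~ normal(0, 2.5);
  beta ~ normal(0, 2.5);
  // Logistic regression fit to the simulated data
  y ~ bernoulli_logit(alpha + X * beta);
}
```

The transformed data block runs once, when the data are loaded, so every iteration is fit against the same simulated data set. Because the simulated values live only inside the program, you would need to copy them out (for example in generated quantities) if you want to inspect them afterwards.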