Fake data with interaction

Hi,

I would like to generate a fake response using rstan from a model with a continuous-by-categorical interaction (Continuous:Categorical).

I have written the following model with two variables:

  • a continuous variable that takes two values (0 and 1), and

  • a categorical variable with four levels.

    sim_data <- 
     '
    data {
    int N;
    real alpha;
    real Category[N];       // Coefficient for each level
    real Category_sigma[N]; // Uncertainty for the coefficient 
    real x1_b;       // Coefficient for the continuous variable
    real x1_b_sigma; // Uncertainty for the coefficient
    real x1_Category[N];       // Interaction coefficients for each level
    real x1_Category_sigma[N]; // Uncertainty for the coefficients
    real Category_grid[N];
    real<lower=0> sigma;
    
    real x1_grid[N];
    }
    
    parameters {}
    model {}
    
    generated quantities {
    real y[N];
    real x1_beta;
    real Category_betas[N];
    real x1_Cat_beta[N];
    
    x1_beta = normal_rng(x1_b, x1_b_sigma);
    Category_betas = normal_rng(Category, Category_sigma);
    x1_Cat_beta =  normal_rng(x1_Category, x1_Category_sigma);
    
    for (n in 1:N) {
    y[n] = normal_rng(
      alpha + Category_betas[n] +  // Categorical 
      x1_beta * x1_grid[n] +       // Continuous
      x1_Cat_beta[n] * x1_grid[n] * Category_grid[n]  // Interaction
      ,sigma);
      }
    }
    '
    
    compiled_example <- stan_model(model_code = sim_data)
    

and the fake input is

    Category <- c('A', 'B', 'C', 'D')
    x1_grid <- seq(0, 1, 1)

    data_grid <- expand.grid(Category = Category,
                             x1_grid  = x1_grid)

Then I sample, setting my own coefficients, as follows:

    N <- nrow(data_grid)
    sim_ex <- sampling(
      compiled_example, algorithm = 'Fixed_param',
      data = list(
        N = N,
        alpha = 0,
        Category = rep(c(49.4, 50, 50, 52.9), length.out = N),         # Coefficient for each level
        Category_sigma = rep(c(2, 1.8, 1.7, 2), length.out = N),       # Uncertainty for the coefficients
        x1_b = -17,                                                    # Coefficient for the continuous variable
        x1_b_sigma = 2.3,                                              # Uncertainty for the coefficient
        x1_Category = rep(c(2, -1.5, .3, -.9), length.out = N),        # Interaction coefficients
        x1_Category_sigma = rep(c(.4, .4, .4, .4), length.out = N),    # Uncertainty for the coefficients
        x1_grid = data_grid$x1_grid,
        Category_grid = rep(c(1, 2, 3, 4), length.out = N),
        sigma = 1
      ), refresh = 0)

The simpler case without the interaction seems to work, but there is a mistake in the interaction term and I don’t know how to fix it.

Also, if you have links to examples of simulating fake data with Stan, it would be great if you could share them; I have found only a couple (1)(2).

Thanks!

Are you getting an error at runtime or is it that the results do not look correct?

Hi,
It runs perfectly fine. My problem is that I don’t know how to model the interaction.

I see in your code you have an interaction term:

    y[n] = normal_rng(
      alpha + Category_betas[n] +  // Categorical
      x1_beta * x1_grid[n] +       // Continuous
      x1_Cat_beta[n] * x1_grid[n] * Category_grid[n]  // Interaction
      ,sigma);

Do you mean that you’re having a problem fitting the data once you generate it?
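
For comparison, here is a minimal sketch in plain R (no Stan) of one common way to write a continuous-by-categorical interaction, in case it helps pin down what looks wrong. It assumes that the numeric codes 1 to 4 should only be used to look up the coefficient for a level, not multiplied into the linear predictor, and that each level should get a single coefficient draw rather than one draw per row. The names `cat_idx`, `cat_beta`, `x1_cat_beta` and `mu` are made up for this illustration; the means and standard deviations are the ones from the original post.

```r
set.seed(1)

# Design grid: every category level crossed with x1 = 0 and x1 = 1.
data_grid <- expand.grid(Category = c("A", "B", "C", "D"),
                         x1 = c(0, 1))
N <- nrow(data_grid)

# Integer code of each row's level (1..4), used only as an index.
cat_idx <- as.integer(data_grid$Category)

# One draw per level (length 4), not one draw per row (length N).
alpha       <- 0
cat_beta    <- rnorm(4, mean = c(49.4, 50, 50, 52.9), sd = c(2, 1.8, 1.7, 2))
x1_beta     <- rnorm(1, mean = -17, sd = 2.3)
x1_cat_beta <- rnorm(4, mean = c(2, -1.5, 0.3, -0.9), sd = 0.4)
sigma       <- 1

# Linear predictor: level-specific intercept plus level-specific slope.
# The slope for level k is x1_beta + x1_cat_beta[k].
mu <- alpha + cat_beta[cat_idx] +
  (x1_beta + x1_cat_beta[cat_idx]) * data_grid$x1

# Fake response with observation noise.
data_grid$y <- rnorm(N, mean = mu, sd = sigma)
```

Under these assumptions, the interaction line in the Stan program would index the coefficient by the row's level and drop the multiplication by `Category_grid[n]`, since multiplying by the code 1 to 4 scales the effect by an arbitrary label. This is only one possible reading of the intended model.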