Currently I am trying to run a simulation to ensure my model is correct. When checking the estimates, I noticed that my “item_difficulty” matches the true values. However, this was achieved through the estimation with different “difficulty” and “beta_difficulty” values than the true ones. Even though I end up obtaining the same “item_difficulty” estimate, I feel like this indicates some type of problem, especially if I wanted to look at the covariate effect coefficients or the difficulty estimates on their own: they would be different from what they should be.
Any potential reasons for this occurring?
data {
  int<lower=0> n_examinee;   // number of examinees
  int<lower=0> n_item;       // number of items
  int<lower=1> Z;            // number of observations (n_examinee * n_item)
  int<lower=0> Y[Z];         // the count data matrix, flattened to a vector
  int<lower=1> K;            // number of item-level covariates (e.g., item types)
  matrix[n_item, K] X;       // item-level covariate matrix
}
parameters {
  vector<lower=-6, upper=6>[n_examinee] theta;     // person ability
  vector<lower=-6, upper=6>[n_item] difficulty;    // baseline item difficulty
  vector[K] beta_difficulty;                       // covariate effects on difficulty
}
transformed parameters {
  vector[n_item] item_difficulty;
  for (i in 1:n_item) {
    item_difficulty[i] = difficulty[i] + dot_product(X[i], beta_difficulty);
  }
}
model {
  matrix[n_examinee, n_item] lambdas;
  theta ~ normal(0, 3);
  beta_difficulty ~ normal(0, 1);
  difficulty ~ normal(0, 3);
  for (i in 1:n_examinee) {
    for (j in 1:n_item) {
      lambdas[i, j] = exp(theta[i] + item_difficulty[j]);
    }
  }
  Y ~ poisson(to_vector(lambdas));
}
I wasn’t sure what you meant by this:
Currently I am trying to run a simulation to ensure my model is correct. When checking the estimates, I noticed that my “item_difficulty” matches the true values. However, this was achieved through the estimation with different “difficulty” and “beta_difficulty” values than the true ones.
To do this test, you should simulate according to the way the model is written. That is, you simulate difficulty, theta, and beta_difficulty from their priors, then simulate the data from those parameters. As long as you do that in a way that matches how your model is defined, you should be able to recover your model parameters in their posterior intervals at nominal coverage.
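Here is a minimal sketch of what that simulation could look like as a Stan program run with the fixed_param method (the program and the flattening order are my own suggestion, not the only way to do it). It draws the parameters from your priors, ignoring the interval constraints on theta and difficulty (which I'd drop anyway; see below), and fills Y in column-major order so it lines up with to_vector(lambdas) in your model:

data {
  int<lower=0> n_examinee;
  int<lower=0> n_item;
  int<lower=1> K;
  matrix[n_item, K] X;
}
generated quantities {
  vector[n_examinee] theta;
  vector[n_item] difficulty;
  vector[K] beta_difficulty;
  vector[n_item] item_difficulty;
  int<lower=0> Y[n_examinee * n_item];
  // draw the parameters from the priors in the model
  for (i in 1:n_examinee) theta[i] = normal_rng(0, 3);
  for (j in 1:n_item) difficulty[j] = normal_rng(0, 3);
  for (k in 1:K) beta_difficulty[k] = normal_rng(0, 1);
  item_difficulty = difficulty + X * beta_difficulty;
  // simulate the data; column-major order matches to_vector(lambdas)
  for (j in 1:n_item)
    for (i in 1:n_examinee)
      Y[(j - 1) * n_examinee + i] = poisson_log_rng(theta[i] + item_difficulty[j]);
}

Fit your model to each simulated Y and check that, say, 90% posterior intervals cover the simulating values of difficulty and beta_difficulty about 90% of the time.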
Some more general Stan/stats comments:
- You can rewrite the transformed parameters much more efficiently and compactly as
vector[n_item] item_difficulty = difficulty + X * beta_difficulty;
- The interval constraints can be challenging. Is there a reason you bounded theta and difficulty? The problem is that exactly the place where the bounds matter is where they cause problems, by bunching up probability mass near the boundaries.
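Concretely, dropping the bounds is just this (the normal(0, 3) priors already keep these parameters in a plausible range without hard constraints):

vector[n_examinee] theta;
vector[n_item] difficulty;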
- I would strongly recommend not taking in Z as an argument, because it has to be equal to n_examinee * n_item. Instead, just use n_examinee * n_item directly, and then the data can never be inconsistent. For the same reason, I'd be inclined to take in Y as an n_examinee by n_item 2D array and then do the conversion to a 1D array inside the Stan program, so that it's easier to use from outside the Stan program.
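For example, the data and transformed data blocks might look like this (a sketch; Y_flat is a name I made up, and the flattening is column-major to match to_vector() in the model block):

data {
  int<lower=0> n_examinee;
  int<lower=0> n_item;
  int<lower=1> K;
  matrix[n_item, K] X;
  int<lower=0> Y[n_examinee, n_item];   // counts as a 2D array
}
transformed data {
  int<lower=0> Y_flat[n_examinee * n_item];
  for (j in 1:n_item)
    for (i in 1:n_examinee)
      Y_flat[(j - 1) * n_examinee + i] = Y[i, j];   // column-major, matching to_vector()
}

The sampling statement then uses Y_flat in place of Y.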
- Stan has a built-in poisson_log function that takes its argument on the log scale and is much more arithmetically stable than exponentiating. The arithmetic defining the matrix lambdas can also be vectorized. Together, the last few lines of the model block become

matrix[n_examinee, n_item] log_lambdas;
for (i in 1:n_examinee) log_lambdas[i] = item_difficulty';  // each row gets the item effects; declaring item_difficulty as a row vector would avoid the transpose
for (j in 1:n_item) log_lambdas[, j] += theta;  // each column gets the person effects
Y ~ poisson_log(to_vector(log_lambdas));
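If you'd rather avoid the loops entirely, rep_matrix can broadcast both vectors to full matrices in one expression:

matrix[n_examinee, n_item] log_lambdas
  = rep_matrix(theta, n_item) + rep_matrix(item_difficulty', n_examinee);
Y ~ poisson_log(to_vector(log_lambdas));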