# Simulation to ensure model accuracy with covariate effects, how to ensure all parameters match true values?

Currently I am trying to run a simulation to verify that my model is correct. When checking the estimates, I noticed that my “item_difficulty” matches the true values. However, this was achieved with “difficulty” and “beta_difficulty” estimates that differ from the true ones. Even though I end up obtaining the same item_difficulty estimate, I feel like this indicates some type of problem. In particular, if I wanted to examine the covariate effect coefficients or the difficulty estimates on their own, they would be different from what they should be.

Any potential reasons for this occurring?

```stan
data {
  int<lower=0> n_examinee;   // number of examinees
  int<lower=0> n_item;       // number of items

  int<lower=1> Z;            // number of observations
  int<lower=0> Y[Z];         // count data matrix, flattened to a vector

  int<lower=1> K;            // number of item-level covariates (e.g., item types)
  matrix[n_item, K] X;
}
parameters {
  vector<lower=-6, upper=6>[n_examinee] theta;
  vector<lower=-6, upper=6>[n_item] difficulty;

  vector[K] beta_difficulty;
}
transformed parameters {
  vector[n_item] item_difficulty;
  for (i in 1:n_item) {
    item_difficulty[i] = difficulty[i] + dot_product(X[i], beta_difficulty);
  }
}
model {
  theta ~ normal(0, 3);
  beta_difficulty ~ normal(0, 1);
  difficulty ~ normal(0, 3);

  matrix[n_examinee, n_item] lambdas;
  for (i in 1:n_examinee) {
    for (j in 1:n_item) {
      lambdas[i, j] = exp(theta[i] + item_difficulty[j]);
    }
  }
  Y ~ poisson(to_vector(lambdas));
}
```

I wasn’t sure what you meant by this:

> Currently I am trying to run a simulation to ensure my model is correct. When checking the estimates, I noticed that my “item_difficulty” matches the true values. However, this was achieved through the estimation with different “difficulty” and “beta_difficulty” values than the true ones.

To do this test, you should simulate according to the way the model is written. So you simulate difficulty and theta and beta_difficulty from their priors, then simulate the data from the parameters. As long as you do that in a way that matches the way your model is defined, you should be able to recover your model parameters in their posterior intervals at nominal coverage.
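A minimal sketch of that workflow in Python/NumPy (not from the original post; the names mirror the Stan program, but the sizes and covariates are illustrative): draw the parameters from their priors, compute the transformed parameter, then draw the counts from the likelihood. Note that the Stan program truncates `theta` and `difficulty` to [-6, 6], so draws from the unbounded normals would strictly need to be rejected if they fall outside the bounds, which this sketch ignores.

```python
import numpy as np

rng = np.random.default_rng(42)

n_examinee, n_item, K = 50, 10, 3

# Simulate parameters from the priors declared in the model block.
theta = rng.normal(0.0, 3.0, size=n_examinee)       # theta ~ normal(0, 3)
difficulty = rng.normal(0.0, 3.0, size=n_item)      # difficulty ~ normal(0, 3)
beta_difficulty = rng.normal(0.0, 1.0, size=K)      # beta_difficulty ~ normal(0, 1)
X = rng.normal(size=(n_item, K))                    # arbitrary item-level covariates

# Derived quantity, matching the transformed parameters block.
item_difficulty = difficulty + X @ beta_difficulty

# Simulate the data from the likelihood.
log_lambda = theta[:, None] + item_difficulty[None, :]
Y = rng.poisson(np.exp(log_lambda))                 # n_examinee x n_item count matrix
```

Fitting the model to `Y` and checking whether the posterior intervals cover `theta`, `difficulty`, and `beta_difficulty` (not just `item_difficulty`) is the test you want; doing this over many simulated data sets is simulation-based calibration.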

1. You can rewrite the transformed parameters block much more efficiently and compactly as

```stan
vector[n_item] item_difficulty = difficulty + X * beta_difficulty;
```
2. The interval constraints can be challenging. Is there a reason you bounded the items? The problem is that exactly the place where the bounds matter is where they cause problems by bunching up probability mass near the boundaries.
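To see how much those bounds actually bite, here is a quick stdlib-Python check (not from the original post): under a normal(0, 3) prior, the bound at ±6 sits only two prior standard deviations out.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# P(|difficulty| > 6) under difficulty ~ normal(0, 3)
mass_outside = 2.0 * (1.0 - normal_cdf(6.0, 0.0, 3.0))
print(f"{mass_outside:.3f}")  # about 0.046
```

So roughly 4.6% of the prior mass lies beyond the bounds, which is exactly the regime where the constrained sampler has to bunch draws up against ±6.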

3. I would strongly recommend not taking in `Z` as an argument, because it has to be equal to `n_examinee * n_item`. Instead, just use `n_examinee * n_item` directly and then the data can never be inconsistent. For the same reason, I’d be inclined to take in `Y` as an `n_examinee` x `n_item` 2D array and do the conversion to a 1D array inside the Stan program, so that it’s easier to use from outside the Stan program.

4. Stan has a built-in `poisson_log` function that takes the argument on a log scale and is much more arithmetically stable than exponentiating. The arithmetic defining the matrix `lambdas` can also be vectorized. Together, the last few lines of code are

```stan
matrix[n_examinee, n_item] log_lambdas;
for (i in 1:n_examinee)
  log_lambdas[i] = item_difficulty';  // even better, declare `item_difficulty` as a row vector
for (j in 1:n_item)
  log_lambdas[ , j] += theta;
Y ~ poisson_log(to_vector(log_lambdas));
```
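One caveat if you keep `Y` as a flat vector: `to_vector()` reads a Stan matrix in column-major order, so the simulated counts must be flattened the same way before being passed in. A NumPy sketch (the variable names are illustrative, not from the post):

```python
import numpy as np

# Pretend this is the simulated count matrix: 2 examinees x 3 items.
Y_matrix = np.arange(6).reshape(2, 3)

Y_row_major = Y_matrix.flatten()           # NumPy default: row-major, does NOT match to_vector
Y_col_major = Y_matrix.flatten(order="F")  # column-major, matches Stan's to_vector
```

Passing the row-major version would silently pair each count with the wrong examinee/item cell, which is easy to miss in a parameter-recovery check.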