I’m trying to write a generative model to predict how a student will perform on a standardized test. As a learning exercise, I want to start simple and add complexity gradually. My first model is the following:
- Let y_{i,s,t} be the test score for student i in subject s and period t
- y_{i,s,t} \in [650,850]
- s \in \{ELA, Math\}
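Written out, the model that the Stan code below implements appears to be the following. This is a reconstruction from that code, assuming the pre-test score plays the role of the lagged score y_{i,s,t-1}:

```latex
\begin{aligned}
y_{i,s,t} &\sim \mathrm{Normal}\left(a_i + \gamma \, y_{i,s,t-1},\ \sigma\right) \\
a_i &\sim \mathrm{Normal}\left(\mu,\ \sigma_a\right) \\
\mu &\sim \mathrm{Normal}(700,\ 75) \\
\sigma_a &\sim \mathrm{Normal}^{+}(0,\ 100) \\
\gamma &\sim \mathrm{Normal}^{+}(0.5,\ 1) \\
\sigma &\sim \mathrm{Gamma}(2,\ 8)
\end{aligned}
```

Here a_i is a student-level intercept, \mathrm{Normal}^{+} denotes a normal distribution truncated below at zero, and \mu is additionally constrained to [650, 850] in the code.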
My idea is that in this model:
- \mu tells me the average performance on the test (regardless of the subject)
- \gamma tells me how persistent performance is (regardless of the subject)
This is my Stan code [comments on the code are always welcome, but this is not my question yet]:
data {
  int<lower=1> I;                             // number of students
  int<lower=1> N;                             // number of observations
  int<lower=1,upper=I> ii[N];                 // student id index
  vector<lower=650,upper=850>[N] pre_test;    // pre-test score
  vector<lower=650,upper=850>[N] raw_score;   // observed test score
}
parameters {
  real<lower=650, upper=850> mu;   // population mean of the student intercepts
  vector[I] a_std;                 // standardized student intercepts
  real<lower=0> sigma_a;           // sd of the student intercepts
  real<lower=0> gamma;             // persistence (coefficient on the pre-test)
  real<lower=0> sigma;             // residual sd
}
transformed parameters {
  vector[I] a = mu + sigma_a * a_std; // non-centered parameterization ("Matt trick")
}
model {
  mu ~ normal(700,75);
  a_std ~ normal(0,1);
  sigma_a ~ normal(0,100);
  gamma ~ normal(0.5,1);
  sigma ~ gamma(2,8);
  raw_score ~ normal(a[ii] + gamma*pre_test, sigma);
}
generated quantities {
  vector[N] score_sim;  // posterior predictive draws
  vector[N] log_lik;    // pointwise log-likelihood
  vector[N] theta;      // linear predictor
  for (n in 1:N) {
    theta[n] = a[ii[n]] + gamma*pre_test[n];
    score_sim[n] = normal_rng(theta[n], sigma);
    // log density (lpdf), not the log CDF (lcdf)
    log_lik[n] = normal_lpdf(raw_score[n] | theta[n], sigma);
  }
}
The first thing that I want to do is to allow a and \gamma to be functions of the subject. My idea is that some students are going to be better at math and others better at reading. Similarly, persistence in scores may be different for math and ELA. This is my attempt to write this slightly more complex model:
-
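In symbols, one way a subject-specific version could look is sketched below. The subscripted symbols a_{i,s}, \mu_s, \sigma_{a,s}, and \gamma_s are assumed notation that extends the first model, not necessarily the exact formulation intended here:

```latex
\begin{aligned}
y_{i,s,t} &\sim \mathrm{Normal}\left(a_{i,s} + \gamma_s \, y_{i,s,t-1},\ \sigma\right) \\
a_{i,s} &\sim \mathrm{Normal}\left(\mu_s,\ \sigma_{a,s}\right), \qquad s \in \{\mathrm{ELA}, \mathrm{Math}\}
\end{aligned}
```

This sketch keeps a single residual scale \sigma and treats a student's two intercepts as independent given \mu_s and \sigma_{a,s}. Whether \sigma should also vary by subject, and whether a_{i,\mathrm{ELA}} and a_{i,\mathrm{Math}} should be correlated, are left open.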
My first question is whether there is something wrong with the model as I wrote it (I’m still very new at writing this type of model…)
My second question is about implementing this in Stan, but I will write a follow-up comment once someone gives me some feedback on my first question.
Thanks a lot for all the help! This community is amazing!
Ignacio
PS: The more I watch @bgoodri's YouTube channel, the more I like the idea of thinking in terms of generative models. Thanks @bgoodri!