Help learning how to write a simple generative hierarchical model for test-scores


#1

I’m trying to write a generative model for predicting how a student is going to perform on a standardized test. As a learning exercise, I want to start simple and slowly build complexity gradually. My first model is the following:

  • Let y_{i,s,t} be the test score for student i in subject s and period t
  • y_{i,s,t} \in [650,850]
  • s \in \{ELA, Math\}
\begin{aligned} y_{i,s,t} \sim N(a[i] + \gamma y_{i,s,t-1} , \sigma) \\ a[i] \sim N(\mu,sigma_a) \\ \mu \sim N_{[650, 850]}(700, 75) \\ \sigma_a \sim N_+(0, 100) \\ \gamma \sim N_+(0.5,1) \\ \sigma \sim \Gamma(2,8) \end{aligned}

My idea is that in this model:

  • \mu tells me the average performance on the test (regardless of the subject)
  • \gamma tells me how persistence performance is (regardless of the subject)

This is my stan code [comments on the code are always welcome but this is not my question yet]:


data {
  int<lower=1> I ;                                                  // number of students
  int<lower=1> N ;                                                  // number of observations
  int<lower=1,upper=I> ii[N];                                       // student id index
  vector<lower=650,upper=850>[N] pre_test ;                         // pre-test
  vector<lower=650,upper=850>[N] raw_score ;                        // raw_score
}


parameters {
  real<lower=650, upper=850> mu;
  vector[I] a_std;              
  real<lower=0> sigma_a;
  real<lower=0> gamma;
  real<lower=0> sigma;
}

transformed parameters {
  vector[I] a =  mu + sigma_a * a_std;            // Matt trick
}

model {
  mu ~ normal(700,75);
  a_std ~ normal(0,1);
  sigma_a ~ normal(0,100);
  gamma ~ normal(0.5,1);
  sigma ~ gamma(2,8);
  raw_score ~ normal(a[ii] + gamma*pre_test, sigma);
}

generated quantities {
  vector[N] score_sim;
  vector[N] log_lik;
  vector[N] theta;
  for (n in 1:N)  {
    theta[n] = a[ii[n]] + gamma*pre_test[n] ;
    score_sim[n] = normal_rng(theta[n], sigma);
    log_lik[n] = normal_lcdf(raw_score[n] | theta[n], sigma);
  }
}

The first thing that I want to do is to allow a and \gamma to be functions of the subject. My idea is that some students are going to be better at math and some others better at reading. Similarly, persistence in grades will be different for math and ELA. This is my attempt to write this slightly more complex model:

\begin{aligned} y_{i,s,t} \sim N(a[i,s] + \gamma y_{i,s,t-1} , \sigma) \\ a[i,s] = \mu + b[i,s] \\ \mu \sim N_{[650, 850]}(700, 75) \\ b[i,s] \sim N(0, \Sigma) \\ \Sigma[1,1] = \sigma_{math} \\ \Sigma[2,2] = \sigma_{ELA} \\ \Sigma[1,2] = \Sigma[1,2] = \rho \sigma_{math} \sigma_{ELA} \\ \gamma = \gamma_0 + \gamma_{math} I[s==math] \\ \gamma_0 \sim N_+(0.5,1) \\ \gamma_{math} \sim N(0,1) \\ \sigma \sim \Gamma(2,8) \end{aligned}
  • My first question is whether there is something wrong with the model as I wrote it (I’m still very new at writing this type of model…)

  • My second question is about implementing this in Stan, but I will write a follow-up comment once someone gives me some feedback on my first question.

Thanks a lot for all the help! This community is amazing!

Ignacio

PS: The more I watch @bgoodri youtube channel, the more I like the idea of thinking in terms of generative models. Thanks @bgoodri !


#2

The question’s clear, but these are hard to answer as it’s asking someone to try to look at your model and debug it. It’s hard to verify something’s right without a lot of work.

Before trying to model covariance, I’d try to fit b with independent priors.

Did you model \gamma as a baseline plus math difference for convenience of priors? If not, you might try modeling independently dpeending on how the \gamma_0 vs. \gamma_{\mathrm{math}} looks in the posterior.

You might get better computation trying to pull the scale down so that \mu was modeled on a standardized scale and then multiplied out to this standardized testing range.

It’ll probably be OK modeling a persistence difference for math, but why not model the two persistences independently? Sometimes the offset will fit better, but it’s also a matter of being able to formulate reasonable priors.

There’s a lot of discussion of these kinds of models in Gelman and Hill’s regression book. That’s being rewritten in Stan, but isn’t quite ready yet.


#3

First, thanks a lot for all the help you are giving me in my “Bayesian learning journey.” I understand that what i’m asking is hard, so I appreciate any amount of help that this amazing community might be willing to provide :-)

This is my best attempt as of right now at doing that:

\begin{aligned} y_{i,s,t} \sim N(a[s=math,i] I(s=math) + a[s=ELA,i] I(s=ELA) + \gamma y_{i,s,t-1} , \sigma) \\ a[s=math,i] \sim N(\mu_{math},sigma_{aMath}) \\ a[s=ELA,i] \sim N(\mu_{ELA},sigma_{aELA}) \\ \mu_{math} \sim N(650, 75) \\ \mu_{ELA} \sim N(700, 75) \\ \sigma_{aMath} \sim N_+(0, 100) \\ \sigma_{aELA} \sim N_+(0, 75) \\ \gamma \sim N_+(0.5,1) \\ \sigma \sim \Gamma(2,8) \end{aligned}

What I tried to do is to say that each student i has a random effect for math and for ela and those effects are independent (which is the assumption that I want to change in my next model).

This is how I implemented the model in Stan:

data {
  int<lower=1> I ;                                                  // number of students
  int<lower=1> N ;                                                  // number of observations
  int<lower=1,upper=I> ii[N];                                       // student id index
  vector<lower=0,upper=1>[N] math;                                  // math dummies
  vector<lower=650,upper=850>[N] pre_test ;                         // pre-test
  vector<lower=650,upper=850>[N] raw_score ;                        // raw_score
}

transformed data {
  vector<lower=0,upper=1>[N] ELA = 1 - math ;                       // ELA dummies
}


parameters {
  vector[2] mu;                                                     // mu[1] is math and [mu2] is ELA
  vector[I] a_math_std;   
  vector<lower=0>[2] sigma_a;
  vector[I] a_ELA_std;  
  real<lower=0> gamma;
  real<lower=0> sigma;
}

transformed parameters {
  vector[I] a_math =  mu[1] + sigma_a[1] * a_math_std;            // Matt trick
  vector[I] a_ELA =  mu[2] + sigma_a[2] * a_ELA_std;              // Matt trick
}

model {
  mu[1] ~ normal(650,75);
  a_math_std ~ normal(0,1);
  sigma_a[1] ~ normal(0,100);
  a_ELA_std ~ normal(0,1);
  sigma_a[2] ~ normal(0,100);
  gamma ~ normal(0.5,1);
  sigma ~ gamma(2,8);
  raw_score ~ normal(a_math[ii] .* math + a_ELA[ii] .* ELA  + gamma*pre_test, sigma);
}

generated quantities {
  vector[N] score_sim;
  vector[N] log_lik;
  vector[N] theta;
  for (n in 1:N)  {
    theta[n] = a_math[ii[n]] * math[n]  + a_ELA[ii[n]] * ELA[n]  + gamma*pre_test[n] ;
    score_sim[n] = normal_rng(theta[n], sigma);
    log_lik[n] = normal_lcdf(raw_score[n] | theta[n], sigma);
  }
}

One thing that I’ve been trying to do is declaring a single parameter vector[2] a_std[I]; instead of one for math and one for ELA. Alas, when I do that I get the following error:

parameters {
  vector[2] mu;                                                     // mu[1] is math and [mu2] is ELA
  vector[2] a_std[I];                                               // this does not work!
  vector<lower=0>[2] sigma_a;
  real<lower=0> gamma;
  real<lower=0> sigma;
}

transformed parameters {
  vector[I] a_math =  mu[1] + sigma_a[1] * a_std[1];             // Matt trick
  vector[I] a_ELA =  mu[2] + sigma_a[2] * a_std[2];              // Matt trick
} 

[1] "Error in sampler$call_sampler(args_list[[i]]) : "                                                                                                      
[2] "  Exception: assign: Rows of left-hand-side (191) and rows of right-hand-side (2) must match in size  (in 'modelae820f4f6_stan_ae6476be8d' at line 24)"
error occurred during calling the sampler; sampling not done

My guess is that my understanding of how to use arrays in Stan is the problem :-( Alas, I think I need to be able to define vector[2] a[I] ; in order to use multi_normal to set the prior for a.

Yes, I thought defining it in this way would be more convenient to define the priors. Every single time I wrote a model until now I’ve been using weakly informative priors. One of the reasons I’m doing this as a learning exercise is to try to use more informative priors for the first time. That said, I will try your suggestion and compare it with what I originally had in mind.

I want to try this too and compaire the two models as a learning exercise.

I look forward to buying it!


#4

What’s line 24 of your program? You have mismatched size assignments.


#5

this is line 24:

vector[I] a_math =  mu[1] + sigma_a[1] * a_std[1];             // Matt trick

and this is how all those parameters are defined:

parameters {
  vector[2] mu;                                                     // mu[1] is math and [mu2] is ELA
  vector[2] a_std[I];                                               // this does not work!
  vector<lower=0>[2] sigma_a;
  real<lower=0> gamma;
  real<lower=0> sigma;
}

My guess is that I’m misunderstanding how to access the array a_std.

Thanks for the help!


#6

When you have

vector[K] foo[N];

you are declaring an N-dimensional array of K-vectors.

In your case, a_std[1] is a vector of size 2 hereas a_math is a vector of size I. What are you tring to do? Your a_std is an I x 2 structure. Maybe you wanted

vector[I] a_std[2];

#7

Thanks a lot bob, this actually works. In case this is helpful for someone in the future, this is my code right now:

data {
  int<lower=1> I ;                                                  // number of students
  int<lower=1> N ;                                                  // number of observations
  int<lower=1,upper=I> ii[N];                                       // student id index
  vector<lower=0,upper=1>[N] math;                                  // math dummies
  vector<lower=650,upper=850>[N] pre_test ;                         // pre-test
  vector<lower=650,upper=850>[N] raw_score ;                        // raw_score
}

transformed data {
  vector<lower=0,upper=1>[N] ELA = 1 - math ;                       // ELA dummies
}


parameters {
  vector[2] mu;                                                     // mu[1] is math and [mu2] is ELA
  vector[I] a_std[2];                                              
  vector<lower=0>[2] sigma_a;
  real<lower=0> gamma;
  real<lower=0> sigma;
}

transformed parameters {
  vector[I] a_math =  mu[1] + sigma_a[1] * a_std[1];            // Matt trick
  vector[I] a_ELA =  mu[2] + sigma_a[2] * a_std[2];              // Matt trick
}

model {
  mu[1] ~ normal(650,75);
  a_std[1] ~ normal(0,1);
  a_std[2] ~ normal(0,1);
  sigma_a[1] ~ normal(0,100);
  sigma_a[2] ~ normal(0,100);
  gamma ~ normal(0.5,1);
  sigma ~ gamma(2,8);
  raw_score ~ normal(a_math[ii] .* math + a_ELA[ii] .* ELA  + gamma*pre_test, sigma);
}

generated quantities {
  vector[N] score_sim;
  vector[N] log_lik;
  vector[N] theta;
  for (n in 1:N)  {
    theta[n] = a_math[ii[n]] * math[n]  + a_ELA[ii[n]] * ELA[n]  + gamma*pre_test[n] ;
    score_sim[n] = normal_rng(theta[n], sigma);
    log_lik[n] = normal_lcdf(raw_score[n] | theta[n], sigma);
  }
}

My next goal is to write a_math and a_ELA as coming from multi_normal instead of two independent normals. My idea is that random effects for math and english should be correlated. To do this I’m trying to follow the manual, which brings my next question. The example on the manual has a “group predictors” variable row_vector[L] u[J]; (page 149). I’m not sure what this should be in my example. This is how I tried to translate my code following that example:

data {
  int<lower=1> I ;                                                  // number of students
  int<lower=1> N ;                                                  // number of observations
  int<lower=1,upper=2> jj[N];                                       // subject for student
  vector<lower=650,upper=850>[N] pre_test ;                         // pre-test
  vector<lower=650,upper=850>[N] raw_score ;                        // raw_score
}


parameters {
  corr_matrix[2] Omega ;                                            // prior correlation
  vector<lower=0>[2] tau ;                                          // prior scale
  matrix[I,2] eta ;                                                 // subject coeff
  vector[2] a[I] ;                                                  // individual coeff by subject
  real<lower=0> gamma;
  real<lower=0> sigma;
}


model {
  tau ~ cauchy(0,2.5) ;
  Omega ~ lkj_corr(2) ;
  to_vector(eta) ~ normal(0,5) ;
  {
  row_vector[2] u_eta[I];
  for (i in 1:I)
    u_eta[i] = u[i] * eta;                                          // WHAT should u[i] be here??? In page 149 is "group predictors" in the data block
  a ~ multi_normal(u_eta, quad_form_diag(Omega, tau)) ;
  }

  gamma ~ normal(0.5,1);
  sigma ~ gamma(2,8);
  for(n in 1:N)
  raw_score[n] ~ normal(a[jj[n]]  + gamma*pre_test[n], sigma);
}

generated quantities {
  vector[N] score_sim;
  vector[N] log_lik;
  vector[N] theta;
  for (n in 1:N)  {
    theta[n] = a[jj[n]] + gamma*pre_test[n] ;
    score_sim[n] = normal_rng(theta[n], sigma);
    log_lik[n] = normal_lcdf(raw_score[n] | theta[n], sigma);
  }
}

Thanks!