# Hierarchical IRT model with reduce_sum

Hi all,

I’m playing around with the `reduce_sum` function to try to take advantage of within-chain parallelization for our hierarchical IRT model. Our ultimate goal is to get the model up and running with MPI, but I figured that this would be a good place to start. Here’s the model:

```stan
data{
int N; // Total number of sentences
int I; // Total number of countries
int J; // Total number of aspects
int K; // Total number of composers
int T; // Total number of years
int<lower=0,upper=1> judg[N]; // Binary judgments
int asp[N]; // Array of aspect indicators for each judgment
int asp_par[J]; // Array of aspect-parent indicators
int country[N]; // Array of country indicators for each judgment
int year[N]; // Array of year indicators for each judgment
int comp[N]; // Array of composer indicators for each judgment
}

parameters{
real theta[I,J,T]; // Array of ideal points by country, aspect, and year
matrix<lower=0>[J,K] beta; // Matrix of discriminations by composer
matrix[J,K] alpha; // Matrix of aspect difficulty by composer
vector<lower=0>[K] sigma_alpha; // SD for alpha by composer
vector<lower=0>[K] sigma_beta; // SD for beta by composer
}

model{
// Likelihood
for(n in 1:N){
judg[n] ~ bernoulli_logit(beta[asp[n], comp[n]] *
(theta[country[n], asp[n], year[n]] - alpha[asp[n], comp[n]]));
}

// Priors
// theta
// A draw for the first year for each aspect
for(j in 1:J){
theta[,j,1] ~ std_normal();
}

// Loop over aspects
for(j in 1:J){
// This year's ideal point is drawn from a distribution centered on last year's
for(t in 2:T){
theta[,j,t] ~ normal(theta[,j,t-1], 1);
}
}
// beta
// Loop over composers
for(k in 1:K){
beta[1,k] ~ normal(0, sigma_beta[k]); // Draw for first node

for(j in 2:J){
beta[j,k] ~ normal(beta[asp_par[j],k], sigma_beta[k]);
// Draw from dist. centered on parent
}
}

// alpha
for(k in 1:K){
alpha[1,k] ~ normal(0, sigma_alpha[k]); // Draw for first node

for(j in 2:J){
alpha[j,k] ~ normal(alpha[asp_par[j],k], sigma_alpha[k]);
// Draw from dist. centered on parent
}
}

sigma_alpha ~ normal(0,5);
sigma_beta ~ normal(0,5);
}
```

This model runs well, but we’re trying to speed it up. I tried to use `reduce_sum` to do so. Here are the functions and model blocks from my (probably poor) attempt after following the red card example:

```stan
functions{
real partial_sum(int[] slice_judg,
int start, int end,
int[] asp,
int[] asp_par,
int[] country,
int[] year,
int[] comp,
real[,,] theta,
real[,] beta,
real[,] alpha
) {
return binomial_logit_lpmf(slice_judg | beta[asp[start:end], comp[start:end]] *
(theta[country[start:end], asp[start:end], year[start:end]] -
alpha[asp[start:end], comp[start:end]]));
}
}
```

```stan
model{
target += reduce_sum(partial_sum, judg, grainsize,
asp, asp_par, country, year, comp,
theta, beta, alpha);
// ... priors unchanged from the original model ...
}
```

The model is failing to compile, and I’m fairly certain that the most pressing issue is in the functions block where I’m declaring the parameters that go in the `bernoulli_logit_lpmf`. Can anyone point me in the right direction? Let me know if you need any other info from me!

As an update to this, I got the `reduce_sum` model working, but the speedup isn’t as large as I was hoping. Running a single chain in parallel on 4 cores, I don’t see much improvement in speed over the regular model with no parallelization. Here is the model with `reduce_sum`:

```stan
functions{
real partial_sum(int[] slice_judg,
int start, int end,
int[] asp,
int[] asp_par,
int[] country,
int[] year,
int[] comp,
real[,,] theta,
matrix beta,
matrix alpha,
int N
) {
vector[N] eta; // Linear predictor computed for every observation, not just this slice
for(n in 1:N){
eta[n] = beta[asp[n], comp[n]] * (theta[country[n], asp[n], year[n]] -
alpha[asp[n], comp[n]]);
}

return bernoulli_logit_lpmf(slice_judg | eta[start:end]);
}
}

data{
int N; // Total number of sentences
int I; // Total number of countries
int J; // Total number of aspects
int K; // Total number of composers
int t; // Total number of years
int<lower=0,upper=1> judg[N]; // Binary judgments
int asp[N]; // Array of aspect indicators for each judgment
int asp_par[J]; // Array of aspect-parent indicators
int country[N]; // Array of country indicators for each judgment
int year[N]; // Array of year indicators for each judgment
int comp[N]; // Array of composer indicators for each judgment
int<lower=1> grainsize; // Grain size for reduce_sum
}

parameters{
real theta[I,J,t]; // Array of ideal points by country, aspect, and year
matrix<lower=0>[J,K] beta; // Matrix of discriminations by composers
matrix[J,K] alpha; // Matrix of aspect difficulty by composers
vector<lower=0>[K] sigma_alpha; // SD for alpha by composer
vector<lower=0>[K] sigma_beta; // SD for beta by composer
}

model{
// Rewrite the likelihood
target += reduce_sum(partial_sum, judg, grainsize,
asp, asp_par, country, year, comp,
theta, beta, alpha, N);

// Priors
// theta
// A draw for the first year for each aspect
for(j in 1:J){
theta[,j,1] ~ std_normal();
}

// Loop over aspects
for(j in 1:J){
// This year's ideal point is drawn from a distribution centered on last year's
for(tt in 2:t){
theta[,j,tt] ~ normal(theta[,j,tt-1], 1);
}
}
// beta
// Loop over composers
for(k in 1:K){
beta[1,k] ~ normal(0, sigma_beta[k]); // Draw for first node

for(j in 2:J){
beta[j,k] ~ normal(beta[asp_par[j],k], sigma_beta[k]);
// Draw from dist. centered on parent
}
}

// alpha
for(k in 1:K){
alpha[1,k] ~ normal(0, sigma_alpha[k]); // Draw for first node

for(j in 2:J){
alpha[j,k] ~ normal(alpha[asp_par[j],k], sigma_alpha[k]);
// Draw from dist. centered on parent
}
}

sigma_alpha ~ normal(0,5);
sigma_beta ~ normal(0,5);
}
```

Does anyone have any suggestions as to how I might speed this up a little more? Is there anything in the `functions` block that could be changed?

Well, for one thing, I recommend computing only the `eta`s you actually need:

```stan
vector[end - start + 1] eta; // only as many etas as this slice needs
for(n in start:end){
eta[n - start + 1] = // index eta relative to the start of the slice
beta[asp[n], comp[n]] * (theta[country[n], asp[n], year[n]]
- alpha[asp[n], comp[n]]);
}
return bernoulli_logit_lpmf(slice_judg | eta);
```

You really should first vectorize the priors and the hierarchical priors! Those loops look suspicious to me.
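For instance, here is a rough sketch (untested) of the same hierarchical priors with the inner loops over aspects vectorized away, using the same `asp_par` indexing your loops already use:

```stan
// Sketch (untested): vectorize over aspects, keeping only the loop over composers.
for(k in 1:K){
  beta[1,k] ~ normal(0, sigma_beta[k]);                          // first node
  beta[2:J,k] ~ normal(beta[asp_par[2:J],k], sigma_beta[k]);     // all child nodes at once

  alpha[1,k] ~ normal(0, sigma_alpha[k]);                        // first node
  alpha[2:J,k] ~ normal(alpha[asp_par[2:J],k], sigma_alpha[k]);  // all child nodes at once
}
```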

You can use `to_vector` applied to a matrix, which often lets you apply densities to large chunks of parameters at once.
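As a sketch of that idea (again untested), the remaining loop over composers can be collapsed too by flattening everything into vectors; `rep_matrix` is only there to tile the composer-specific SDs so the three arguments have matching lengths:

```stan
// Sketch (untested): fully vectorized hierarchical priors via to_vector.
// to_vector() flattens a matrix column-major, so all three arguments line up element-wise.
to_vector(beta[2:J]) ~ normal(to_vector(beta[asp_par[2:J]]),
                              to_vector(rep_matrix(sigma_beta', J - 1)));
to_vector(alpha[2:J]) ~ normal(to_vector(alpha[asp_par[2:J]]),
                               to_vector(rep_matrix(sigma_alpha', J - 1)));

// The first-node priors vectorize over composers as well:
beta[1] ~ normal(0, sigma_beta');
alpha[1] ~ normal(0, sigma_alpha');
```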