Hierarchical IRT model with reduce_sum

Hi all,

I’m playing around with the reduce_sum function to try to take advantage of within-chain parallelization for our hierarchical IRT model. Our ultimate goal is to get the model up and running with MPI, but I figured this would be a good place to start. Here’s the model:

data{
    int N; // Total number of sentences
    int I; // Total number of countries
    int J; // Total number of aspects
    int K; // Total number of composers
    int T; // Total number of years
    int<lower=0,upper=1> judg[N]; // Binary judgments
    int asp[N]; // Array of aspect indicators for each judgment
    int asp_par[J]; // Array of aspect-parent indicators
    int country[N]; // Array of country indicators for each judgment
    int year[N]; // Array of year indicators for each judgment
    int comp[N]; // Array of composer indicators for each judgment
}

parameters{
    real theta[I,J,T]; // Array of ideal points by aspect and year
    matrix<lower=0>[J,K] beta; // Matrix of discriminations by composers
    matrix[J,K] alpha; // Matrix of aspect difficulty by composers
    vector<lower=0>[K] sigma_alpha; // SD for alpha by composer
    vector<lower=0>[K] sigma_beta; // SD for beta by composer
}

model{      
    // Likelihood
    for(n in 1:N){
        judg[n] ~ bernoulli_logit(beta[asp[n], comp[n]] * 
                           (theta[country[n], asp[n], year[n]] - alpha[asp[n], comp[n]]));
    }
      
    // Priors
    // theta
    // A draw for the first year for each aspect
    for(j in 1:J){
        theta[,j,1] ~ std_normal();
    }
      
    // Loop over aspects
    for(j in 1:J){
        // This year's ideal point is drawn from a distribution centered on last year's
        for(t in 2:T){   
            theta[,j,t] ~ normal(theta[,j,t-1], 1);   
        }   
    }
    // beta
    // Loop over composers
    for(k in 1:K){
        beta[1,k] ~ normal(0, sigma_beta[k]); // Draw for first node
        
        for(j in 2:J){
            beta[j,k] ~ normal(beta[asp_par[j],k], sigma_beta[k]); 
            // Draw from dist. centered on parent
        }    
    }
    
    // alpha
    for(k in 1:K){
        alpha[1,k] ~ normal(0, sigma_alpha[k]); // Draw for first node
        
        for(j in 2:J){
            alpha[j,k] ~ normal(alpha[asp_par[j],k], sigma_alpha[k]); 
            // Draw from dist. centered on parent
        }    
    }
    
    sigma_alpha ~ normal(0,5);
    sigma_beta ~ normal(0,5);
}

This model runs well, but we’re trying to speed it up, so I tried reduce_sum. Here are the functions and model blocks from my (probably poor) attempt, following the red card example:

functions{
  real partial_sum(int[] slice_judg,
                   int start, int end,
                   int[] asp,
                   int[] asp_par,
                   int[] country,
                   int[] year,
                   int[] comp,
                   real[,,] theta,
                   real[,] beta,
                   real[,] alpha
                   ) {
    return bernoulli_logit_lpmf(slice_judg | beta[asp[start:end], comp[start:end]] * 
                      (theta[country[start:end], asp[start:end], year[start:end]] -
                      alpha[asp[start:end], comp[start:end]]));
  }
}

model{
    target += reduce_sum(partial_sum, judg, grainsize,
                       asp, asp_par, country, year, comp,
                       theta, beta, alpha);
}

The model is failing to compile, and I’m fairly certain the most pressing issue is in the functions block, where I’m declaring and indexing the arguments that go into the bernoulli_logit_lpmf. Can anyone point me in the right direction? Let me know if you need any other info from me!

As an update: I got the reduce_sum model working, but the speedup isn’t as large as I was hoping. Running a single chain in parallel on 4 cores, I see little speed benefit compared to the regular model with no parallelization. Here is the model with reduce_sum:

functions{
  real partial_sum(int[] slice_judg,
                   int start, int end,
                   int[] asp,
                   int[] asp_par,
                   int[] country,
                   int[] year,
                   int[] comp,
                   real[,,] theta,
                   matrix beta,
                   matrix alpha,
                   int N
                   ) {
    vector[N] eta; // Linear predictor, computed for all N observations on every call
    for(n in 1:N){
        eta[n] = beta[asp[n], comp[n]] * (theta[country[n], asp[n], year[n]] - 
        alpha[asp[n], comp[n]]);
    }
    
    return bernoulli_logit_lpmf(slice_judg | eta[start:end]);
  }
}

data{
    int N; // Total number of sentences
    int I; // Total number of countries
    int J; // Total number of aspects
    int K; // Total number of composers
    int T; // Total number of years
    int<lower=0,upper=1> judg[N]; // Binary judgments
    int asp[N]; // Array of aspect indicators for each judgment
    int asp_par[J]; // Array of aspect-parent indicators
    int country[N]; // Array of country indicators for each judgment
    int year[N]; // Array of year indicators for each judgment
    int comp[N]; // Array of composer indicators for each judgment
    int<lower=1> grainsize; // Grain size for reduce_sum
}

parameters{
    real theta[I,J,T]; // Array of ideal points by aspect and year
    matrix<lower=0>[J,K] beta; // Matrix of discriminations by composers
    matrix[J,K] alpha; // Matrix of aspect difficulty by composers
    vector<lower=0>[K] sigma_alpha; // SD for alpha by composer
    vector<lower=0>[K] sigma_beta; // SD for beta by composer
}

model{
    // Rewrite the likelihood
    target += reduce_sum(partial_sum, judg, grainsize,
                       asp, asp_par, country, year, comp,
                       theta, beta, alpha, N);
      
    // Priors
    // theta
    // A draw for the first year for each aspect
    for(j in 1:J){
        theta[,j,1] ~ std_normal();
    }
      
    // Loop over aspects
    for(j in 1:J){
        // This year's ideal point is drawn from a distribution centered on last year's
        for(t in 2:T){   
            theta[,j,t] ~ normal(theta[,j,t-1], 1);   
        }   
    }
    // beta
    // Loop over composers
    for(k in 1:K){
        beta[1,k] ~ normal(0, sigma_beta[k]); // Draw for first node
        
        for(j in 2:J){
            beta[j,k] ~ normal(beta[asp_par[j],k], sigma_beta[k]); 
            // Draw from dist. centered on parent
        }    
    }
    
    // alpha
    for(k in 1:K){
        alpha[1,k] ~ normal(0, sigma_alpha[k]); // Draw for first node
        
        for(j in 2:J){
            alpha[j,k] ~ normal(alpha[asp_par[j],k], sigma_alpha[k]); 
            // Draw from dist. centered on parent
        }    
    }
    
    sigma_alpha ~ normal(0,5);
    sigma_beta ~ normal(0,5);
}

Does anyone have suggestions for how I might speed this up a little more? Is there anything in the functions block that could be changed?

Well, for one thing, I recommend you only compute the etas you actually need: right now every call to partial_sum loops over all N observations and then uses only the start:end slice. Something like this:

vector[end - start + 1] eta; 
for(n in start:end){
    eta[n - start + 1] =
          beta[asp[n], comp[n]] * (theta[country[n], asp[n], year[n]]
                                   - alpha[asp[n], comp[n]]);
}
return bernoulli_logit_lpmf(slice_judg | eta);
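
Folding that into the full function gives something like this (an untested sketch, keeping the same argument types as your working version; asp_par is still unused inside the function). Note that N is no longer needed, so it can be dropped from both the signature and the reduce_sum call:

real partial_sum(int[] slice_judg,
                 int start, int end,
                 int[] asp,
                 int[] asp_par,
                 int[] country,
                 int[] year,
                 int[] comp,
                 real[,,] theta,
                 matrix beta,
                 matrix alpha
                 ) {
  // Linear predictor only for the observations in this slice
  vector[end - start + 1] eta;
  for(n in start:end){
      eta[n - start + 1] =
            beta[asp[n], comp[n]] * (theta[country[n], asp[n], year[n]]
                                     - alpha[asp[n], comp[n]]);
  }
  return bernoulli_logit_lpmf(slice_judg | eta);
}

and in the model block:

target += reduce_sum(partial_sum, judg, grainsize,
                     asp, asp_par, country, year, comp,
                     theta, beta, alpha);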

You really should first vectorize the priors and the hierarchical priors! Those loops look suspicious to me.

You can use to_vector applied to a matrix, which often lets you apply densities to large chunks of parameters at once.
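
For example, each composer column could get a single statement instead of the inner loop over j (a sketch, not tested; the asp_par indexing is the same as in your loops). And if sigma_beta and sigma_alpha were shared scalars rather than per-composer vectors, to_vector would let you collapse the loop over k as well:

// Vectorized hierarchical priors: one statement per composer column
for(k in 1:K){
    beta[1,k] ~ normal(0, sigma_beta[k]);                        // first node
    beta[2:J, k] ~ normal(beta[asp_par[2:J], k], sigma_beta[k]); // children centered on parents

    alpha[1,k] ~ normal(0, sigma_alpha[k]);
    alpha[2:J, k] ~ normal(alpha[asp_par[2:J], k], sigma_alpha[k]);
}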