I am fitting a latent variable model. My original code runs with no problem for both real and simulated data; however, it takes too much time. I have access to a computing cluster with as many as 196 cores, so I think I can significantly speed up things. I tried to follow the example at this link to do within-chain parallelization, but I am stuck and don’t want to waste more time if it is not something possible.
Below, I will briefly describe the model. I will provide the original code (may not be optimal, but best I could write and runs with no issues). At the very end, I will also provide my attempt to rewrite it for within-chain parallelization and errors I am getting. I appreciate any advice or guidance in the right direction.
Thank you in advance.
Model specification
Y_{ij} | \tau_{tj}, \tau_{cj}, \beta_i, \alpha_i, T_j, C_i \sim N(\mu_{ij},\sigma_i)
\mu_{ij} = (\beta_i - \tau_{tj})^{1-T_j} \times \left [(1-C_i) \times (\beta_i - \tau_{tj}) + C_i \times (\beta_i - \tau_{cj})\right ]^{T_j}
\sigma_i = \frac{1}{\alpha_i^2}
Observed Data
- Y_{ij}, log of observed response time for individual j on item i
- C_i, an indicator variable for the status of item i (1 = compromised, 0: not compromised)
To be estimated:
- \beta_i, time-intensity parameter for item i
- \alpha_i, time discrimination parameter for item i
- \tau_{tj}, latent speed for individual j when responding to an uncompromised item
- \tau_{cj}, latent speed for individual j when responding to a compromised item
- T_j, an indicator variable for person status (1: a cheater individual with prior knowledge of item, 0: an honest examinee with no prior knowledge of item)
T_j is deterministically defined at every draw such that
T_j = 1 if \tau_{cj} > \tau_{tj} and T_j = 0 if \tau_{tj} > \tau_{cj}.
Original model syntax that runs successfully
data{
int <lower=1> I; // number of items
int <lower=1> J; // number of individuals
int <lower=1> n_obs; // number of observations (I x J)
int <lower=1> ind_person_obs[n_obs]; // index variable for individuals positions
int <lower=1> ind_item_obs[n_obs]; // index variable for items' position
int <lower=0,upper=1> i_status_obs[n_obs]; // index variable for item status
real Y[n_obs]; // log of observed response time
}
parameters {
vector[J] beta; // time-intensity parameters
vector <lower=0> [J] alpha; //time discrimination parameters
vector[I] tau_t; // individual latent speed parameters for uncompromised items
vector[I] tau_c; // individual latent speed parameters for compromised items
real mu1; // hyperparameter for the mean of beta
real<lower=0> sigma1; // hyperparameter for the sd of beta
real<lower=0> sigma_t; // hyperparameter for the sd of tau_t
real<lower=0> sigma_c; // hyperparameter for the sd of tau_c
}
transformed parameters {
vector[I] T;
for (i in 1:n_obs) {
if(tau_t[ind_person_obs[i]]>tau_c[ind_person_obs[i]])
T[ind_person_obs[i]] = 0;
else
T[ind_person_obs[i]] = 1;
}
}
model{
sigma_t ~ exponential(1);
sigma_c ~ exponential(1);
tau_t ~ normal(0,sigma_t);
tau_c ~ normal(0,sigma_c);
mu1 ~ normal(3.98,1);
sigma1 ~ exponential(1);
beta ~ normal(mu1,sigma1);
alpha ~ inv_gamma(800,1550);
for (i in 1:n_obs) {
real p_t = beta[ind_item_obs[i]]-tau_t[ind_person_obs[i]];
real p_c = beta[ind_item_obs[i]]-tau_c[ind_person_obs[i]];
real p = (p_t^(1-T[ind_person_obs[i]]))*
(((1-i_status_obs[i])*p_t +
(i_status_obs[i])*p_c)^T[ind_person_obs[i]]);
Y[i] ~ normal(p,1/(alpha[ind_item_obs[i]]^2));
}
}
My (unsuccessful) attempt for within-chain parallelization
I tried to replace i with start:end and put everything in normal_lpdf function, but I think I don’t have a good understanding of how this works. This is the error message I received:
Compiling Stan program...
|
Semantic error in 'C:/Users/cengiz/AppData/Local/Temp/RtmpgLAk9E/model-3c2462812197.stan', line 13, column 40 to column 63:
-------------------------------------------------
11: vector T) {
12:
13: return normal_lpdf(slice_Y | ((beta[ind_item_obs[start:end]]-tau_t[ind_person_obs[start:end]])^(1-T[ind_person_obs[start:end]]))*
^
14: (((1-i_status_obs[start:end])*(beta[ind_item_obs[start:end]]-tau_t[ind_person_obs[start:end]]) +
15: (i_status_obs[start:end])*beta[ind_item_obs[start:end]]-tau_c[ind_person_obs[start:end]])^T[ind_person_obs[start:end]]),
-------------------------------------------------
Only expressions of array, matrix, row_vector and vector type may be indexed. Instead, found type int.
mingw32-make.exe: *** [make/program:53: C:/Users/cengiz/AppData/Local/Temp/RtmpgLAk9E/model-3c2462812197.hpp] Error 1
Error: An error occured during compilation! See the message above for more information.
Within-chain parallelization code:
functions {
real partial_sum(real[] slice_Y,
int start, int end,
int ind_person_obs,
int ind_item_obs,
int i_status_obs,
vector beta,
vector alpha,
vector tau_t,
vector tau_c,
vector T) {
return normal_lpdf(slice_Y | ((beta[ind_item_obs[start:end]]-tau_t[ind_person_obs[start:end]])^(1-T[ind_person_obs[start:end]]))*
(((1-i_status_obs[start:end])*(beta[ind_item_obs[start:end]]-tau_t[ind_person_obs[start:end]]) +
(i_status_obs[start:end])*beta[ind_item_obs[start:end]]-tau_c[ind_person_obs[start:end]])^T[ind_person_obs[start:end]]),
1/(alpha[ind_item_obs[start:end]]^2));
}
}
data{
int <lower=1> I;
int <lower=1> J;
int <lower=1> n_obs;
int <lower=1> ind_person_obs[n_obs];
int <lower=1> ind_item_obs[n_obs];
int <lower=0,upper=1> i_status_obs[n_obs];
real Y[n_obs];
}
parameters {
vector[J] beta;
vector <lower=0> [J] alpha;
vector[I] tau_t;
vector[I] tau_c;
real mu1;
real<lower=0> sigma1;
real<lower=0> sigma_t;
real<lower=0> sigma_c;
}
transformed parameters {
vector[I] T;
for (i in 1:n_obs) {
if(tau_t[ind_person_obs[i]]>tau_c[ind_person_obs[i]])
T[ind_person_obs[i]] = 0;
else
T[ind_person_obs[i]] = 1;
}
}
model{
sigma_t ~ exponential(1);
sigma_c ~ exponential(1);
tau_t ~ normal(0,sigma_t);
tau_c ~ normal(0,sigma_c);
mu1 ~ normal(3.98,1);
sigma1 ~ exponential(1);
beta ~ normal(mu1,sigma1);
alpha ~ inv_gamma(800,1550);
target += reduce_sum(partial_sum, Y , grainsize, ind_person_obs, ind_item_obs,
i_status_obs, beta, alpha, tau_t, tau_c, T);
}