Hi All (@Bob_Carpenter ),
I am very new to Stan. I am planning to estimate the following model. There are three types of datasets (Version_1, Version_2, Version_3). At the first level, the following linear regression models are estimated for latent variable “att”:-
- Version_1 uses X_lk as predictors
- Version_2 uses X_lk as predictors
- Version_3 uses X_oe as predictors
I have re-parameterized “att” and estimated “y_att”
Using the estimated values of “y_att”, a multinomial logit model is estimated for the choice “y”. The choice variable has 3 levels. I have tried to vectorise all the levels, except the last one for
"for (n in 1:N)
y[n] ~ categorical(softmax(alpha + beta_att * y_att[n]' + beta_sd * X_sd[n]')); // likelihood"
All the steps prior to this is estimated very quickly. With the inclusion of the above choice model, the estimation takes days to complete.
My dataset contains 14261 observations, # of predictors (D_lk-100, D_oe-34, D_sd-55). # of MCMC iterations-3000, # of warm-ups-600. I am running 4 chains
Is there a way to reduce the estimation time? Furthermore, are there any suggestions on improving my code further?
Any suggestions and help will be greatly appreciated.
Thank you in advance.
data {
int<lower=0> N; // Number of observations
int<lower=0> D_lk; // Number of predictors for Version_1 of the dataset
int<lower=0> D_oe; // Number of predictors for Version_3 of the dataset
int<lower=0> D_sd; // Number of predictors for sd
int<lower=0> C; // Number of classes
matrix[N,D_lk] X_lk; // Predictor matrix for Version_1 of the dataset
matrix[N,D_oe] X_oe; // Predictor matrix for Version_3 of the dataset
matrix[N,D_sd] X_sd; // Predictor matrix for sd
int<lower=1,upper=C> y[N]; // Categorical dependent variable
int<lower=0> K; // Number of multivariate dimensions
matrix[N, K] bern1; // Version_1 dummy
matrix[N, K] bern2; // Version_2 dummy
matrix[N, K] bern3; // Version_3 dummy
}
parameters {
row_vector[K] alpha_lkv1; // Intercept for Version_1
matrix[D_lk,K] gamma_lkv1; // Coefficients of predictors for Version_1
row_vector[K] alpha_lkv2; // Intercept for Version_2
matrix[D_lk,K] gamma_lkv2; // Coefficients of predictors for Version_2
row_vector[K] alpha_oe; // Intercept for Version_3
matrix[D_oe,K] gamma_oe; // Coefficients of predictors for Version_3
vector[C] alpha; // Intercept for the choice model
matrix[C,K] beta_att; // Coefficient for Ind_var_1
matrix[C,D_sd] beta_sd; // Coefficients of predictors for sd
cov_matrix[K] Sigma; // Covariance Matrix
matrix[N, K] alpha_scale; // Re-parameterisation scale factor
}
transformed parameters {
matrix[N, K] att; // Regression equation for Ind_var_1
matrix[K,K] L; // Cholesky factor Matrix
matrix[N, K] y_att; // Predicted value for latent attitudes
matrix[N,K] v1; // Value matrix for Version_1
matrix[N,K] v2; // Value matrix for Version_2
matrix[N,K] v3; // Value matrix for Version_3
matrix[N,K] m1; // Matrix of intercepts for Version_1
matrix[N,K] m2; // Matrix of intercepts for Version_2
matrix[N,K] m3; // Matrix of intercepts for Version_3
m1 = rep_matrix(alpha_lkv1, N);
m2 = rep_matrix(alpha_lkv2, N);
m3 = rep_matrix(alpha_oe, N);
v1 = m1 + (X_lk * gamma_lkv1);
v2 = m2 + (X_lk * gamma_lkv2);
v3 = m3 + (X_oe * gamma_oe);
att = (bern1 .* v1) + (bern2 .* v2) + (bern3 .* v3);
L = cholesky_decompose(Sigma); // Decomposing the Covariance matrix into Cholesky components
y_att = att + (alpha_scale * L);
}
model {
alpha_lkv1 ~ normal(0,1); // Prior for the constants for Version_1
alpha_lkv2 ~ normal(0,1); // Prior for the constants for Version_2
alpha_oe ~ normal(0,1); // Prior for the constants for Version_3
to_vector(gamma_lkv1)~ normal(0,1); // Priors for the coefficients for Version_1
to_vector(gamma_lkv2)~ normal(0,1); // Priors for the coefficients for Version_2
to_vector(gamma_oe)~ normal(0,1); // Priors for the coefficients for Version_3
Sigma ~ inv_wishart((K+1), diag_matrix(rep_vector(0.1, K))); // Priors for the Covariance Matrix
to_vector(alpha_scale) ~ normal(0,1); // Priors for the re-parameterisation
alpha ~ normal(0,1); // Prior for the constants for the choice model
to_vector(beta_att) ~ normal(0,1); // Prior for the Ind_var_1 for the choice model
to_vector(beta_sd) ~ normal(0,1); // Priors for the coefficients of sd
for (n in 1:N)
y[n] ~ categorical(softmax(alpha + beta_att * y_att[n]' + beta_sd * X_sd[n]')); // likelihood
}
MODERATOR EDIT: @martinmodrak reformatted code