# How to speed up multivariate mixed model with latent variables?

Hi everyone,

I am trying to fit a multivariate mixed model with latent variables. I regress the three dependent variables AdAwa, BI and Con on a set of seven latent stock variables and regression-specific control variables, for I brands and T calendar weeks:

(1) AdAwa_{it}=\alpha_i + Stock_{it} \beta^{Stock}_i + CV^{AdAwa}_{it} \beta^{CV_{AdAwa}}_i + \epsilon^{AdAwa}
(2) BI_{it}=\alpha_i + Stock_{it} \beta^{Stock}_i + CV^{BI}_{it} \beta^{CV_{BI}}_i + \epsilon^{BI}
(3) Con_{it}=\alpha_i + Stock_{it} \beta^{Stock}_i + CV^{Con}_{it} \beta^{CV_{Con}}_i + \epsilon^{Con}

with

(4) Stock_{it} = \lambda Stock_{it-1} + (1-\lambda) Flow_{it} ,
(5) Stock_{it=1} = Flow_{it=1} and
(6) 0\leq \lambda \leq 0.99
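The stock recursion (4)-(6) is a geometric adstock. As a self-contained Stan function (a sketch of the same recursion, not code from the model below; the name `adstock` is mine), it would look like:

```stan
functions {
  // geometric adstock: stock[1] = flow[1],
  // stock[t] = lambda * stock[t-1] + (1 - lambda) * flow[t]
  vector adstock(vector flow, real lambda) {
    int T = rows(flow);
    vector[T] stock;
    stock[1] = flow[1];
    for (t in 2:T)
      stock[t] = lambda * stock[t - 1] + (1 - lambda) * flow[t];
    return stock;
  }
}
```

Writing it once as a function also avoids repeating the same loop for every media channel.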

I allow for correlation between the three error terms \epsilon^{AdAwa}, \epsilon^{BI} and \epsilon^{Con}. Further, I assume for each equation that the \beta vectors for the stock and equation-specific control variables come from one multivariate normal distribution.

Our data set consists of 20 brands and 200 weeks per brand.

data {

int<lower=0> N; //number of observations
int T; //time points
int<lower=0> N_id; //number of cross-sections
int<lower=0> id[N]; //id of cross-sections
int<lower=0> S; //number of stock variables
int<lower=0> A; //number of coefficients for AdAwa
int<lower=0> B; //number of coefficients for BI
int<lower=0> C; //number of coefficients for Con
int<lower=0> SA; //number of coefficients for Stock + CV for AdAwa
int<lower=0> SB; //number of coefficients for Stock + CV for BI
int<lower=0> SC; //number of coefficients for Stock + CV for Con
matrix[T, N_id] CV_TV_flow_matrix; // flow variable for 1st stock variable
matrix[T, N_id] CV_Print_flow_matrix; // flow variable for 2nd stock variable
matrix[T, N_id] CV_Digital_flow_matrix; // flow variable for 3rd stock variable
matrix[T, N_id] CV_Radio_flow_matrix; // flow variable for 4th stock variable
matrix[T, N_id] CV_Kino_flow_matrix; // flow variable for 5th stock variable
matrix[T, N_id] CV_OOH_flow_matrix; // flow variable for 6th stock variable
matrix[T, N_id] CV_DirectMail_flow_matrix; // flow variable for 7th stock variable
matrix[N, A] X_AdAwa; //CV predictor matrix AdAwa
matrix[N, B] X_BI; //CV predictor matrix BI
matrix[N, C] X_Con; //CV predictor matrix Con
vector[N] AdAwa; //outcome values AdAwa
vector[N] BI; //outcome values BI
vector[N] Con; //outcome values Con
}
transformed data {
vector[3] Y[N];
vector[N] CV_TV_flow = to_vector(CV_TV_flow_matrix);
vector[N] CV_Print_flow = to_vector(CV_Print_flow_matrix);
vector[N] CV_Digital_flow = to_vector(CV_Digital_flow_matrix);
vector[N] CV_Radio_flow = to_vector(CV_Radio_flow_matrix);
vector[N] CV_Kino_flow = to_vector(CV_Kino_flow_matrix);
vector[N] CV_OOH_flow = to_vector(CV_OOH_flow_matrix);
vector[N] CV_DirectMail_flow = to_vector(CV_DirectMail_flow_matrix);
int NC_AdAwa = (SA*(SA-1))/2; //number of correlations for random coefficients in AdAwa
int NC_BI = (SB*(SB-1))/2; //number of correlations for random coefficients in BI
int NC_Con = (SC*(SC-1))/2; //number of correlations for random coefficients in Con

for (n in 1:N) {
Y[n] = [AdAwa[n], BI[n], Con[n]]';
}

}
parameters {
vector[A] Beta_0_AdAwa_raw;
vector[S] Beta_0_AdAwa_stock_raw;
vector[B] Beta_0_BI_raw;
vector[S] Beta_0_BI_stock_raw;
vector[C] Beta_0_Con_raw;
vector[S] Beta_0_Con_stock_raw;
real Alpha_0_AdAwa_raw;
real Alpha_0_BI_raw;
real Alpha_0_Con_raw;

real <lower=0, upper=0.99> lambda_TV;
real <lower=0, upper=0.99> lambda_Print;
real <lower=0, upper=0.99> lambda_Digital;
real <lower=0, upper=0.99> lambda_Radio;
real <lower=0, upper=0.99> lambda_Kino;
real <lower=0, upper=0.99> lambda_OOH;
real <lower=0, upper=0.99> lambda_DirectMail;

real<lower=0> sd_Alpha_AdAwa_a;  // group-level standard deviations
real<lower=0> sd_Alpha_AdAwa_b;  // group-level standard deviations
vector[N_id] z_Alpha_AdAwa; // standardized group-level effects
real<lower=0> sd_Alpha_BI_a;  // group-level standard deviations
real<lower=0> sd_Alpha_BI_b;  // group-level standard deviations
vector[N_id] z_Alpha_BI; // standardized group-level effects
real<lower=0> sd_Alpha_Con_a;  // group-level standard deviations
real<lower=0> sd_Alpha_Con_b;  // group-level standard deviations
vector[N_id] z_Alpha_Con; // standardized group-level effects

vector<lower=0>[SA] sd_Beta_AdAwa_a;  // group-level standard deviations
vector<lower=0>[SA] sd_Beta_AdAwa_b;  // group-level standard deviations
matrix[SA, N_id] z_Beta_AdAwa;  // standardized group-level effects
// cholesky factor of correlation matrix
cholesky_factor_corr[SA] L_Beta_AdAwa;
vector<lower=0>[SB] sd_Beta_BI_a;  // group-level standard deviations
vector<lower=0>[SB] sd_Beta_BI_b;  // group-level standard deviations
matrix[SB, N_id] z_Beta_BI;  // standardized group-level effects
// cholesky factor of correlation matrix
cholesky_factor_corr[SB] L_Beta_BI;
vector<lower=0>[SC] sd_Beta_Con_a;  // group-level standard deviations
vector<lower=0>[SC] sd_Beta_Con_b;  // group-level standard deviations
matrix[SC, N_id] z_Beta_Con;  // standardized group-level effects
// cholesky factor of correlation matrix
cholesky_factor_corr[SC] L_Beta_Con;

vector <lower=0> [3] sigma_a;
vector <lower=0> [3] sigma_b;
cholesky_factor_corr[3] L_Omega;
}

transformed parameters {

matrix[T, N_id] CV_TV_stock_matrix;
matrix[T, N_id] CV_Print_stock_matrix;
matrix[T, N_id] CV_Digital_stock_matrix;
matrix[T, N_id] CV_Radio_stock_matrix;
matrix[T, N_id] CV_Kino_stock_matrix;
matrix[T, N_id] CV_OOH_stock_matrix;
matrix[T, N_id] CV_DirectMail_stock_matrix;
vector[N] CV_TV_stock;
vector[N] CV_Print_stock;
vector[N] CV_Digital_stock;
vector[N] CV_Radio_stock;
vector[N] CV_Kino_stock;
vector[N] CV_OOH_stock;
vector[N] CV_DirectMail_stock;

vector[A] Beta_0_AdAwa = 2 * Beta_0_AdAwa_raw; // implies ~normal(0,2)
vector[S] Beta_0_AdAwa_stock = 2 * Beta_0_AdAwa_stock_raw; // implies ~normal(0,2)
vector[B] Beta_0_BI = 2 * Beta_0_BI_raw; // implies ~normal(0,2)
vector[S] Beta_0_BI_stock = 2 * Beta_0_BI_stock_raw; // implies ~normal(0,2)
vector[C] Beta_0_Con = 2 * Beta_0_Con_raw; // implies ~normal(0,2)
vector[S] Beta_0_Con_stock = 2 * Beta_0_Con_stock_raw; // implies ~normal(0,2)
real Alpha_0_AdAwa = 5 * Alpha_0_AdAwa_raw; // implies ~normal(0,5)
real Alpha_0_BI = 5 * Alpha_0_BI_raw; // implies ~normal(0,5)
real Alpha_0_Con = 5 * Alpha_0_Con_raw; // implies ~normal(0,5)

real<lower=0> sd_Alpha_AdAwa = sd_Alpha_AdAwa_a * sqrt(sd_Alpha_AdAwa_b); // implies ~cauchy(0,1)
real<lower=0> sd_Alpha_BI = sd_Alpha_BI_a * sqrt(sd_Alpha_BI_b); // implies ~cauchy(0,1)
real<lower=0> sd_Alpha_Con = sd_Alpha_Con_a * sqrt(sd_Alpha_Con_b); // implies ~cauchy(0,1)
vector<lower=0>[SA] sd_Beta_AdAwa = sd_Beta_AdAwa_a .* sqrt(sd_Beta_AdAwa_b); // implies ~cauchy(0,1)
vector<lower=0>[SB] sd_Beta_BI = sd_Beta_BI_a .* sqrt(sd_Beta_BI_b); // implies ~cauchy(0,1)
vector<lower=0>[SC] sd_Beta_Con = sd_Beta_Con_a .* sqrt(sd_Beta_Con_b); // implies ~cauchy(0,1)
vector <lower=0> [3] sigma = sigma_a .* sqrt(sigma_b);// implies ~cauchy(0,1)

// actual group-level effects
vector[N_id] Alpha_i_AdAwa = sd_Alpha_AdAwa * z_Alpha_AdAwa;
matrix[N_id, SA] Beta_i_AdAwa = (diag_pre_multiply(sd_Beta_AdAwa, L_Beta_AdAwa) * z_Beta_AdAwa)';

vector[N_id] Alpha_i_BI = sd_Alpha_BI * z_Alpha_BI;
matrix[N_id, SB] Beta_i_BI = (diag_pre_multiply(sd_Beta_BI, L_Beta_BI) * z_Beta_BI)';

vector[N_id] Alpha_i_Con = sd_Alpha_Con * z_Alpha_Con;
matrix[N_id, SC] Beta_i_Con = (diag_pre_multiply(sd_Beta_Con, L_Beta_Con) * z_Beta_Con)';

// state equation TV
for (i in 1:N_id){
CV_TV_stock_matrix[1,i] = CV_TV_flow_matrix[1, i];
for(t in 2:T) {
CV_TV_stock_matrix[t, i] = lambda_TV * CV_TV_stock_matrix[t-1, i] + (1 - lambda_TV) * CV_TV_flow_matrix[t, i];
}
}
CV_TV_stock = to_vector(CV_TV_stock_matrix);

// state equation Print
for (i in 1:N_id){
CV_Print_stock_matrix[1,i] = CV_Print_flow_matrix[1, i];
for(t in 2:T) {
CV_Print_stock_matrix[t, i] = lambda_Print * CV_Print_stock_matrix[t-1, i] + (1 - lambda_Print) * CV_Print_flow_matrix[t, i];
}
}
CV_Print_stock = to_vector(CV_Print_stock_matrix);

// state equation Digital
for (i in 1:N_id){
CV_Digital_stock_matrix[1,i] = CV_Digital_flow_matrix[1, i];
for(t in 2:T) {
CV_Digital_stock_matrix[t, i] = lambda_Digital * CV_Digital_stock_matrix[t-1, i] + (1 - lambda_Digital) * CV_Digital_flow_matrix[t, i];
}
}
CV_Digital_stock = to_vector(CV_Digital_stock_matrix);

// state equation Radio
for (i in 1:N_id){
CV_Radio_stock_matrix[1,i] = CV_Radio_flow_matrix[1, i];
for(t in 2:T) {
CV_Radio_stock_matrix[t, i] = lambda_Radio * CV_Radio_stock_matrix[t-1, i] + (1 - lambda_Radio) * CV_Radio_flow_matrix[t, i];
}
}
CV_Radio_stock = to_vector(CV_Radio_stock_matrix);

// state equation Kino
for (i in 1:N_id){
CV_Kino_stock_matrix[1,i] = CV_Kino_flow_matrix[1, i];
for(t in 2:T) {
CV_Kino_stock_matrix[t, i] = lambda_Kino * CV_Kino_stock_matrix[t-1, i] + (1 - lambda_Kino) * CV_Kino_flow_matrix[t, i];
}
}
CV_Kino_stock = to_vector(CV_Kino_stock_matrix);

// state equation OOH
for (i in 1:N_id){
CV_OOH_stock_matrix[1,i] = CV_OOH_flow_matrix[1, i];
for(t in 2:T) {
CV_OOH_stock_matrix[t, i] = lambda_OOH * CV_OOH_stock_matrix[t-1, i] + (1 - lambda_OOH) * CV_OOH_flow_matrix[t, i];
}
}
CV_OOH_stock = to_vector(CV_OOH_stock_matrix);

// state equation DirectMail
for (i in 1:N_id){
CV_DirectMail_stock_matrix[1,i] = CV_DirectMail_flow_matrix[1, i];
for(t in 2:T) {
CV_DirectMail_stock_matrix[t, i] = lambda_DirectMail * CV_DirectMail_stock_matrix[t-1, i] + (1 - lambda_DirectMail) * CV_DirectMail_flow_matrix[t, i];
}
}
CV_DirectMail_stock = to_vector(CV_DirectMail_stock_matrix);

}

model {
//define parameters
matrix[N, S] AdStock = append_col(CV_TV_stock,
append_col(CV_Print_stock,
append_col(CV_Digital_stock,
append_col(CV_Radio_stock,
append_col(CV_Kino_stock,
append_col(CV_OOH_stock, CV_DirectMail_stock))))));
vector[N] mu_AdAwa = Alpha_0_AdAwa + AdStock * Beta_0_AdAwa_stock + X_AdAwa * Beta_0_AdAwa + Alpha_i_AdAwa[id] + rows_dot_product(append_col(AdStock, X_AdAwa), Beta_i_AdAwa[id]);
vector[N] mu_BI = Alpha_0_BI + AdStock * Beta_0_BI_stock + X_BI * Beta_0_BI + Alpha_i_BI[id] + rows_dot_product(append_col(AdStock, X_BI), Beta_i_BI[id]);
vector[N] mu_Con = Alpha_0_Con + AdStock * Beta_0_Con_stock + X_Con * Beta_0_Con + Alpha_i_Con[id] + rows_dot_product(append_col(AdStock, X_Con), Beta_i_Con[id]);
matrix[3, 3] LSigma = diag_pre_multiply(sigma, L_Omega);
vector[3] mu[N];

for (n in 1:N) {
mu[n] = [mu_AdAwa[n], mu_BI[n], mu_Con[n]]';
}

//priors

//coefficients
Alpha_0_AdAwa_raw ~ normal(0, 1);
Alpha_0_BI_raw ~ normal(0, 1);
Alpha_0_Con_raw ~ normal(0, 1);
Beta_0_AdAwa_raw ~ normal(0, 1);
Beta_0_AdAwa_stock_raw ~ normal(0, 1);
Beta_0_BI_raw ~ normal(0, 1);
Beta_0_BI_stock_raw ~ normal(0, 1);
Beta_0_Con_raw ~ normal(0, 1);
Beta_0_Con_stock_raw ~ normal(0, 1);

z_Alpha_AdAwa ~ normal(0, 1);
z_Alpha_BI ~ normal(0, 1);
z_Alpha_Con ~ normal(0, 1);
to_vector(z_Beta_AdAwa) ~ normal(0, 1);
to_vector(z_Beta_BI) ~ normal(0, 1);
to_vector(z_Beta_Con) ~ normal(0, 1);

//sigmas
sd_Alpha_AdAwa_a ~ normal(0,1);
sd_Alpha_AdAwa_b ~ inv_gamma(0.5, 0.5);
sd_Alpha_BI_a ~ normal(0,1);
sd_Alpha_BI_b ~ inv_gamma(0.5, 0.5);
sd_Alpha_Con_a ~ normal(0,1);
sd_Alpha_Con_b ~ inv_gamma(0.5, 0.5);

sd_Beta_AdAwa_a ~ normal(0,1);
sd_Beta_AdAwa_b ~ inv_gamma(0.5, 0.5);
L_Beta_AdAwa ~ lkj_corr_cholesky(2);
sd_Beta_BI_a ~ normal(0,1);
sd_Beta_BI_b ~ inv_gamma(0.5, 0.5);
L_Beta_BI ~ lkj_corr_cholesky(2);
sd_Beta_Con_a ~ normal(0,1);
sd_Beta_Con_b ~ inv_gamma(0.5, 0.5);
L_Beta_Con ~ lkj_corr_cholesky(2);

sigma_a ~ normal(0,1);
sigma_b ~ inv_gamma(0.5, 0.5);
L_Omega ~ lkj_corr_cholesky(2);

//likelihood
Y ~ multi_normal_cholesky(mu, LSigma);

}

generated quantities {
vector[N_id] Alpha_AdAwa;
vector[N_id] Alpha_BI;
vector[N_id] Alpha_Con;
matrix[N_id, SA] Beta_AdAwa;
matrix[N_id, SB] Beta_BI;
matrix[N_id, SC] Beta_Con;
corr_matrix[SA] Cor_Beta_AdAwa;
vector<lower=-1,upper=1>[NC_AdAwa] cor_Beta_AdAwa;
corr_matrix[SB] Cor_Beta_BI;
vector<lower=-1,upper=1>[NC_BI] cor_Beta_BI;
corr_matrix[SC] Cor_Beta_Con;
vector<lower=-1,upper=1>[NC_Con] cor_Beta_Con;
corr_matrix[3] Rescor = multiply_lower_tri_self_transpose(L_Omega);
vector<lower=-1,upper=1>[3] rescor;

for(i in 1:N_id) {
Alpha_AdAwa[i] = Alpha_0_AdAwa + Alpha_i_AdAwa[i];
Beta_AdAwa[i, ] = append_col(to_row_vector(Beta_0_AdAwa_stock), to_row_vector(Beta_0_AdAwa)) + Beta_i_AdAwa[i, ];
Alpha_BI[i] = Alpha_0_BI + Alpha_i_BI[i];
Beta_BI[i, ] = append_col(to_row_vector(Beta_0_BI_stock), to_row_vector(Beta_0_BI)) + Beta_i_BI[i, ];
Alpha_Con[i] = Alpha_0_Con + Alpha_i_Con[i];
Beta_Con[i, ] = append_col(to_row_vector(Beta_0_Con_stock), to_row_vector(Beta_0_Con)) + Beta_i_Con[i, ];
}
// group-level correlations
// extract upper diagonal of correlation matrix
Cor_Beta_AdAwa = multiply_lower_tri_self_transpose(L_Beta_AdAwa);
for (l in 1:SA) {
for (j in 1:(l - 1)) {
cor_Beta_AdAwa[choose(l - 1, 2) + j] = Cor_Beta_AdAwa[j, l];
}
}
Cor_Beta_BI = multiply_lower_tri_self_transpose(L_Beta_BI);
for (l in 1:SB) {
for (j in 1:(l - 1)) {
cor_Beta_BI[choose(l - 1, 2) + j] = Cor_Beta_BI[j, l];
}
}
Cor_Beta_Con = multiply_lower_tri_self_transpose(L_Beta_Con);
for (l in 1:SC) {
for (j in 1:(l - 1)) {
cor_Beta_Con[choose(l - 1, 2) + j] = Cor_Beta_Con[j, l];
}
}
for (k in 1:3) {
for (j in 1:(k - 1)) {
rescor[choose(k - 1, 2) + j] = Rescor[j, k];
}
}

}


I have already run similar, smaller models on toy data sets and the performance was good. But for this model and my real data, the performance is awful: the model has now been running for 16 hours and has not even finished the first 200 iterations. Is there anything in the code I could try to speed up the model?

Thanks!

I think the generic suggestions would be: (1) turn on GPU support, since you're using lots of matrix operations, and (2) parallelise the likelihood with map_rect. In your case I think point (2) will be ugly and hard because of the current map_rect interface, but point (1) may provide some serious improvements.
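For reference, a map_rect skeleton looks roughly like the following. This is only a sketch: the shard function name `shard_lp` is mine, its body is left as a placeholder, and `phi`, `theta`, `x_r`, `x_i` stand for parameters and data you would have to pack per shard (e.g. one shard per brand) yourself.

```stan
functions {
  // hypothetical per-shard (e.g. per-brand) log likelihood:
  // phi = parameters shared across shards, theta = shard-specific parameters,
  // x_r / x_i = real and integer data packed for this shard
  vector shard_lp(vector phi, vector theta, real[] x_r, int[] x_i) {
    real lp = 0;
    // ... accumulate this shard's log-likelihood contribution into lp ...
    return [lp]';
  }
}
model {
  // theta, x_r, x_i must be rectangular: one row per shard
  target += sum(map_rect(shard_lp, phi, theta, x_r, x_i));
}
```

The awkward part is exactly the packing and unpacking of `theta`, `x_r` and `x_i`, which is why the reply above calls the interface ugly.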

Some info about getting the math library to run with GPU support.

Thanks for your idea! I hope I turned on GPU support correctly, but the model is still really slow. The documentation says GPU support only kicks in for very large matrices (e.g. 1200x1200), so maybe it doesn't help in my case: I have many matrix operations, but the matrices themselves are not that large.
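For anyone following along: with CmdStan, OpenCL (GPU) support is switched on via flags in the math library's `make/local` before rebuilding. The device and platform IDs below are assumptions for a single-GPU machine; check them against your own OpenCL setup.

```
STAN_OPENCL=true
OPENCL_DEVICE_ID=0
OPENCL_PLATFORM_ID=0
```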

I will have a look at map_rect! Maybe I just have to accept that my model is slow, but of course I would be very happy about further ideas!

Ah, sorry to hear that, but I've no doubt there are ways to optimise it, and some of the wizards on this forum doubtless know how.

I think what may help is breaking the model down a bit, so it's easier for people to engage with it without having to spend a lot of time parsing it first. Our best guys are really busy, so anything that makes it easy to dip into a question quickly will help you get a better response.