Adding more highly correlated variables breaks non-centered model

Hello! I am building a multi-level model to predict EEG amplitude per participant. I have 128 sensors that are highly correlated. If I use all 128, the model can't find usable initial values.

If I use 2 sensors, the model works and the results make sense. I get each participant's amplitude per sensor, and the correlation between the two sensors is .95, with most of the posterior between .85 and .99. The sensors are right next to each other, so that correlation makes sense to me. Other sensors farther away show a strong negative correlation, which also makes sense.

If I use 20 sensors, the model runs and all the convergence statistics look great, but the results don't make sense anymore. Now the posterior for the correlation between sensors 1 and 2 is centered near .39. I thought adding more sensors would make the model more confident about correlations, not less, but maybe I am missing something.

The Stan code for the 2-sensor and the 20-sensor analyses is below. I am new to Stan, so I have been using the rethinking package to generate it. I have also tried brms and wouldn't mind a solution with that package if someone has a suggestion (see the brms sketch after the model statement below).

My statistical assumptions:

$$\begin{aligned}
\text{amp} &\sim \text{Normal}(\mu, \sigma) \\
\mu &= \alpha_{\text{participant},\ \text{channel}} \\
\alpha_j &\sim \text{MVNormal}(M, R, S) \\
\sigma, S &\sim \text{Exponential}(0.5) \\
R &\sim \text{LKJcorr}(1)
\end{aligned}$$

where $M$ is the vector of channel means, $R$ the channel correlation matrix, and $S$ the vector of channel scales.
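In brms, something like the following should express the same structure. This is a sketch, not a drop-in equivalent: the population-level chan terms play the role of M, which the generated Stan code below effectively fixes at zero, and it assumes a long-format data frame d with factor columns chan and par.

library(brms)

fit <- brm(
  erp ~ 0 + chan + (0 + chan | par),  # full correlation matrix across channels within participant
  data  = d,
  prior = c(
    prior(exponential(0.5), class = "sd"),     # S
    prior(exponential(0.5), class = "sigma"),  # residual sigma
    prior(lkj(1), class = "cor")               # R (also the brms default)
  )
)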

2-sensor code

data{
    vector[312] erp;          // amplitude observations, long format
    array[312] int chan;      // channel index, 1..2
    array[312] int par;       // participant index, 1..13
}
parameters{
    matrix[2,13] z_par;                  // non-centered (standardized) participant effects
    vector<lower=0>[2] sigma_par;        // per-channel scales (S)
    cholesky_factor_corr[2] L_rho_par;   // Cholesky factor of channel correlations (R)
    real<lower=0> sigma;                 // residual scale
}
transformed parameters{
    matrix[13,2] alpha;
    // re-center: scale and correlate the z-scores, one row per participant
    alpha = (diag_pre_multiply(sigma_par, L_rho_par) * z_par)';
}
model{
    vector[312] mu;
    sigma ~ exponential(0.5);
    L_rho_par ~ lkj_corr_cholesky(1);
    sigma_par ~ exponential(0.5);
    to_vector(z_par) ~ normal(0, 1);
    for (i in 1:312) {
        mu[i] = alpha[par[i], chan[i]];
    }
    erp ~ normal(mu, sigma);
}
generated quantities{
    matrix[2,2] rho_par;
    // recover the correlation matrix R from its Cholesky factor
    rho_par = multiply_lower_tri_self_transpose(L_rho_par);
}

20-sensor code (identical to the 2-sensor model apart from the dimensions)

data{
    vector[3120] erp;
    array[3120] int chan;     // channel index, 1..20
    array[3120] int par;      // participant index, 1..13
}
parameters{
    matrix[20,13] z_par;
    vector<lower=0>[20] sigma_par;
    cholesky_factor_corr[20] L_rho_par;
    real<lower=0> sigma;
}
transformed parameters{
    matrix[13,20] alpha;
    alpha = (diag_pre_multiply(sigma_par, L_rho_par) * z_par)';
}
model{
    vector[3120] mu;
    sigma ~ exponential(0.5);
    L_rho_par ~ lkj_corr_cholesky(1);
    sigma_par ~ exponential(0.5);
    to_vector(z_par) ~ normal(0, 1);
    for (i in 1:3120) {
        mu[i] = alpha[par[i], chan[i]];
    }
    erp ~ normal(mu, sigma);
}
generated quantities{
    matrix[20,20] rho_par;
    rho_par = multiply_lower_tri_self_transpose(L_rho_par);
}

Correlations, particularly high-dimensional ones, are pretty hard to deal with when it comes to priors. There have been some long discussions on this site; a search might turn up an idea that works for you. I remember posts from @mike-lawrence, @spinkney, and myself.
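For intuition on that point: LKJcorr(1) is "uniform" over correlation matrices, but the implied marginal prior on any single pairwise correlation concentrates around zero as the dimension grows, so with 20 channels the prior is already pulling each correlation toward zero. A quick sketch with rethinking's rlkjcorr:

library(rethinking)

# marginal prior on one pairwise correlation implied by LKJcorr(1)
set.seed(1)
r2  <- rlkjcorr(1e4, K = 2,  eta = 1)[, 1, 2]   # 2 dimensions
r20 <- rlkjcorr(1e4, K = 20, eta = 1)[, 1, 2]   # 20 dimensions

dens(r2)               # essentially flat on [-1, 1]
dens(r20, add = TRUE)  # piled up near 0; a true correlation of .95 is far in the tail

Whether the prior alone explains your drop from .95 to .39 depends on how much data you have per pair, but it is the first thing I would check.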