Correlation of estimated SE with covariate leads to correlated parameters

I eventually figured out that the correlation I was getting between my parameters of interest intercept and covar_effect was due to the fact that my binary covariate binary_covar is correlated with the se (higher SE for observations with binary_covar=1).
I’m a bit foggy on why that would lead to correlations in the posterior distributions, and also how bad that is, and also how I could fix it?
Thank you!

my model:


	int <lower=1> N; // tot observed
	int <lower=1> Ngroup; // unique types of 
	vector[N] expvals;
	vector[N] se;
	vector[Ngroup] binary_covar;	
	int<lower=1, upper=Ngroup> group_id[N];	
  int<lower = 0, upper = 1> run_estimation; // a switch to evaluate the likelihood
	vector[Ngroup] intercept_tilde;
	real covar_effect;
	real<lower=0> intercept_sd;
	real intercept;
transformed parameters{
	real Xeff_rep[N];

	vector[Ngroup] intercept_per_group;
	vector[Ngroup] group_mean;

	intercept_per_group = intercept_tilde * intercept_sd  + intercept; 
	group_mean = intercept_per_group + (binary_covar - mean(binary_covar) )* covar_effect; //+ ;
	for (i in 1:N){	
	     Xeff_rep[i] = group_mean[group_id[i]] ;

model {
      covar_effect ~ normal(0, .5); // 
      intercept ~ normal(0, .5); //
      intercept_sd ~ normal(0, .5); //normal(0, 1);
      intercept_tilde ~ std_normal(); //normal(0, 1);
         expvals ~ normal(Xeff_rep, se); // same as theta (transformed theta-tilde) in 8 schools
generated quantities{
	  real eff_rep[N];
	  for (ix in 1:N){
	      eff_rep[ix] = normal_rng(group_mean[group_id[ix]], se[ix]);

Some data:


the correlation is 0.49

1 Like

sorry for not getting to you earlier - did you manage to make some progress in the meantime?

Could you share a bit more of your reasoning? I don’t think I can follow why that could/should be the case.

In many cases correlation in the posterior arises because the data do not inform both parameters individually, but only tell us something about their sum/difference/other combination.

Also - is the correlation a problem? Are your model diagnostics bad? If not then maybe this is the correct posterior?

If you haven’t solved the issue yet, I’ll be glad to discuss it further!

Best of luck with your model!

Thanks for the reply! I did not figure that out via reasoning but by empirical results… if I replace the se vector in the input list with a constant SE ( input$se = rep(.5, length(input$se)) ) then re-run everything I get a different result with no correlations. But maybe it doesn’t matter since the marginal distributions look the same more or less? I guess that was part of my question, if it matters? I can’t think of why the true posterior would be correlated.

I don’t see anything obviously wrong with the diagnostics

Inference for Stan model: simp2.
4 chains, each with iter=5000; warmup=2000; thin=1; 
post-warmup draws per chain=3000, total post-warmup draws=12000.

              mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
intercept    -0.26       0 0.05 -0.35 -0.29 -0.26 -0.22 -0.16  8941    1
covar_effect -0.93       0 0.10 -1.13 -1.00 -0.94 -0.87 -0.73  8613    1

Thanks again!

1 Like

That’s an interesting check!

Looking a bit closer, I think what is happening is that you are not having enough data to learn all the structure you are imposing very well. Looking at an extended pairs plots:

(intercept_sd has been log-transformed)

You can noitce, that when intercept_sd is low (the varying intercept is allowed to explain only small part of the total variation), your intercept and covar_effect are relatively tightly determined. But when intercept_sd is large, they are allowed to take a much wider range of values.

This problem is seen even when setting all the .se to .5:

So what I think is going on is that due to the low number of repetitions per group, the varying intercepts are not well determined and can take in part of the influence attributable to covar_effect (since covar_effect is constant withng group).

Not completely sure why that leads to correlations with the varying SE and not with constant SE. One possibility is that this is linked to the fact that SE is correlated with the covariate (see figure below) - when I set the SE to be constant per binary_covar value, I also get the correlation. This might be further connected to the way you use centering of the binary covariate - if I don’t center I get a negative correlation between intercept and cover_effect, meaning that the data are consistent with either a larger intercept and small difference between covariate values (i.e. the intercept is closer to the overall mean) and with smaller intercept and larger difference between covariate value. Which is once again the situation where your data are unable to differentiate between the various contributions.

But I think the relationship between the intercept_sd and the fixed effects is the most interesting thing happening here.

Does that make sense?