What causes correlation between the log posterior and model parameters?

Hello all,

I’m trying to estimate a simple duration model with random effects. In the pairs() plots, the parameters themselves are uncorrelated, but one parameter is highly correlated with lp__. The draws for this parameter have a low effective sample size and are not very well behaved.

So my question is: what induces correlation between a model parameter and the log posterior? How do you re-parameterize to reduce this correlation? Is that even the right goal?

Here is a snippet of R-code to simulate from the model:

N = 1000                                    # sample size (not given in the post; any moderately large value works)
c = exp(rnorm(5, sd = .1))                  # positive scale parameters, one per stage
lambda = rnorm(N, sd = 1)                   # latent random effect per unit
R_1 = (lambda * c[1] - rnorm(N) > 0) * 1    # stage-1 indicator, coerced to 0/1
R_2 = (lambda * c[2] - rnorm(N) > 0) * R_1  # stage 2 is reachable only if stage 1 occurred
R_3 = (lambda * c[3] - rnorm(N) > 0) * R_2
R_4 = (lambda * c[4] - rnorm(N) > 0) * R_3
R_5 = (lambda * c[5] - rnorm(N) > 0) * R_4
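
As a quick sanity check, the stages should nest (each R_k can only be 1 where the previous stage is 1):

stopifnot(all(R_2 <= R_1), all(R_3 <= R_2), all(R_4 <= R_3), all(R_5 <= R_4))
colMeans(cbind(R_1, R_2, R_3, R_4, R_5))   # share of the sample reaching each stage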

I constrain the c parameters to be positive to set the sign of the latent variable. Now when I try to estimate the model in RStan, I get poor draws for c[1] and c[2] but decent draws for c[3], c[4], and c[5]. c[1] and c[2] are negatively correlated in the posterior, which I suspect to be the source of the problem. However, I haven’t been able to reparameterize the model in such a way that improves performance. Any help would be much appreciated.

Here is the Stan program I wrote:

data {
  int<lower = 1> N;
  int<lower = 0, upper = 1> R_1[N];

  int<lower = 0, upper = N> R_2_N;
  int<lower = 1, upper = N> R_2_ind[R_2_N];
  int<lower = 0, upper = 1> R_2_[R_2_N];

  int<lower = 0, upper = N> R_3_N;
  int<lower = 1, upper = N> R_3_ind[R_3_N];
  int<lower = 0, upper = 1> R_3_[R_3_N];

  int<lower = 0, upper = N> R_4_N;
  int<lower = 1, upper = N> R_4_ind[R_4_N];
  int<lower = 0, upper = 1> R_4_[R_4_N];

  int<lower = 0, upper = N> R_5_N;
  int<lower = 1, upper = N> R_5_ind[R_5_N];
  int<lower = 0, upper = 1> R_5_[R_5_N];
}
parameters {
  vector[N] lambda;
  vector[5] l_c;
}
transformed parameters {
  vector<lower = 0>[5] c = exp(l_c);
}
model {
  lambda ~ normal(0, 1);
  l_c    ~ normal(0, 1);

  R_1  ~ bernoulli(Phi_approx(lambda * c[1]));
  R_2_ ~ bernoulli(Phi_approx(lambda[R_2_ind] * c[2]));
  R_3_ ~ bernoulli(Phi_approx(lambda[R_3_ind] * c[3]));
  R_4_ ~ bernoulli(Phi_approx(lambda[R_4_ind] * c[4]));
  R_5_ ~ bernoulli(Phi_approx(lambda[R_5_ind] * c[5]));
}

The data objects in the Stan program can be generated from the simulated draws using this snippet of R code:

R_2_ind = which(R_1 == 1)   # units still at risk at stage 2 (the same pattern repeats below)
R_2_N   = length(R_2_ind)
R_2_ = R_2[R_2_ind]

R_3_ind = which(R_2 == 1)
R_3_N   = length(R_3_ind)
R_3_ = R_3[R_3_ind]

R_4_ind = which(R_3 == 1)
R_4_N   = length(R_4_ind)
R_4_ = R_4[R_4_ind]

R_5_ind = which(R_4 == 1)
R_5_N   = length(R_5_ind)
R_5_ = R_5[R_5_ind]
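
To actually run it, something like this should work (model.stan is just a placeholder for wherever you save the program above):

library(rstan)

stan_data = list(
  N = N, R_1 = R_1,
  R_2_N = R_2_N, R_2_ind = R_2_ind, R_2_ = R_2_,
  R_3_N = R_3_N, R_3_ind = R_3_ind, R_3_ = R_3_,
  R_4_N = R_4_N, R_4_ind = R_4_ind, R_4_ = R_4_,
  R_5_N = R_5_N, R_5_ind = R_5_ind, R_5_ = R_5_
)

fit = stan(file = "model.stan", data = stan_data)
pairs(fit, pars = c("c", "lp__"))   # shows the lp__ correlation described above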

My two cents: if the log posterior is independent of a parameter, you should get rid of, or fix, that parameter, because it has no influence on the Bayesian inference process…

Usually it’s not so much a direct linear correlation as a hill with a peak at the marginal maximum likelihood parameter value.

:-) The only exception would be when it’s some kind of generated quantity, which Stan will display like a parameter in interfaces.
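
If you want to quantify it, you can correlate every parameter’s draws with lp__ directly (assuming fit is your stanfit object):

draws = as.data.frame(fit)     # one column per scalar parameter, plus lp__
sort(cor(draws)[, "lp__"])     # parameters at the extremes co-move most with lp__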

Hi,

I’m fitting a multi-level zero-inflated binomial model with variance components on the binomial parameter (the specific context is an ecological occupancy model).

I’ve got a strong lp__ correlation (r ~ 0.8; see the attached image) with one of the variance terms, one that varies at the scale of the binomial sample; the binomial index is 6. This term effectively adds extra binomial heterogeneity to the sample. The prior is a half-Cauchy(0, 0.25) and the simulated/known value is 3.

Our log likelihood decreases with increasing standard deviation. This makes sense, as it appears to imply we have increasing information at the sampling scale. Should we be concerned about this correlation?

In the attached pairs plot, gammaSDsite is the parameter of concern. gammaSDyear is another standard-deviation parameter, but one that varies at the scale of the population. gamma2 is included as a representative distribution of the model coefficients.

[attachment: pairs plot of gamma2, gammaSDyear, gammaSDsite, and lp__]

We can post a reproducible example to the forum if it helps. Also, if you’d prefer us to start a new thread, I’m happy to do so!

lp__ is a direct function of the values of your parameters: it’s the (log) posterior density of the parameters, the quantity that Stan calculates in order to sample from the distribution.

lp__(gamma2, gammaSDyear, gammaSDsite) == log(p(gamma2, gammaSDyear, gammaSDsite | data)), up to an additive constant.

lp__ is not a parameter that Stan samples, so this correlation doesn’t cause a problem for sampling. It’s just telling you that as SDsite increases, the overall range of densities that occur decreases, just as when you change the normal(0,1) curve to normal(0,2): the overall height of the curve decreases.

So, no, I don’t think you should be concerned. You should only be concerned if you have correlations between your parameters, and then only if they cause enough problems that you can’t get a reliable sample or you see lots of divergences.

Edit:
Mathematically, what causes this is, taking the normal case as an example, that normal_pdf(x | m, s) is

p(x|m,s) = exp(-((x-m)/s)^2 / 2) / (sqrt(2*pi)*s)

If you notice, s appears in the denominator of the density. After taking the logarithm you get:

-((x-m)/s)^2/2 - log(sqrt(2*pi)) - log(s)

as s increases, -log(s) decreases

So in general, whenever you have a parameter that multiplies or divides the density, it will show this kind of behavior. The same goes for parameters that appear in exponents:

log(a^s) = s * log(a)

so again, as s increases, so does s * log(a) (whenever log(a) is positive), and the like.
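
A quick numerical illustration of that -log(s) penalty, using the same draws under two scales:

x = rnorm(100)                                 # draws from a standard normal
mean(dnorm(x, mean = 0, sd = 1, log = TRUE))   # average log density under sd = 1
mean(dnorm(x, mean = 0, sd = 2, log = TRUE))   # lower on average, driven by -log(s)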
