Correlation in multi_normal posterior distribution

Hello,

I wanted to generate a correlated bivariate normal distribution based on two data points (one for each of the two variables), each with a standard deviation that describes the uncertainty in that value. These are the four values in the data block.

My data are therefore only used to describe the marginal normal distributions, providing no information on the correlation, but I have specified prior models for the parameters in the covariance matrix: the standard deviations and the correlation coefficient (the latter based on a separate analysis).

I expected that the posterior bivariate distribution would show correlated samples, because of the inclusion of a correlation coefficient. The posterior for the correlation coefficient looks good (essentially unchanged from my prior), but the posterior samples of the bivariate normal look like samples of two independent variables.

Below is the model I am describing. If you can give me any advice on what I am overlooking, that would be much appreciated, but please bear in mind that I am not a statistician.


data {
  real <lower = 0> x1Meas; // Observed value of first parameter
  real <lower = 0> x1Error; // Std dev of uncertainty of measurement of first parameter
  real <lower = 0> x2Meas; // Observed value of second parameter
  real <lower = 0> x2Error; // Std dev of uncertainty of measurement of second parameter
}

parameters {
  vector <lower = 0> [2] mu; // Means of multi-normal distribution
  real <lower = 0> sigma[2]; // Std devs of multi-normal distribution 
  real <lower = -1, upper = 1> rho;  // Correlation coefficient
  vector <lower = 0> [2] y;  // Parameter assigned to multi-normal distribution
}

transformed parameters {
  cov_matrix[2] cov = [[      sigma[1] ^ 2       , sigma[1] * sigma[2] * rho],
                       [sigma[1] * sigma[2] * rho,       sigma[2] ^ 2       ]];
}

model {
  // Likelihood
  y ~ multi_normal(mu, cov);
  
  // Priors
  rho ~ normal (0.7, 0.2) T[-1, 1];
  sigma ~ exponential(1);

  mu[1] ~ normal(x1Meas, x1Error) T[0,];
  mu[2] ~ normal(x2Meas, x2Error) T[0,];
  
  x1Meas ~ normal (4, 1) T[0,];
  x1Error ~ exponential(1);
  x2Meas ~ normal (20, 5) T[0,];
  x2Error ~ exponential(0.5);
 
}

In summary, I expected correlated samples (from parameter y), but they appear to be independent.

Many thanks.

I don’t think y should be a parameter; nothing else in the model informs it, so it probably belongs in the generated quantities block:

generated quantities {
  vector [2] y;
  y = multi_normal_rng(mu, cov);
}

Also, this section is slightly strange. At the moment these statements do not influence the posterior distribution at all; since the left-hand sides are data, each statement just adds a constant to the log density:

  x1Meas ~ normal (4, 1) T[0,];
  x1Error ~ exponential(1);
  x2Meas ~ normal (20, 5) T[0,];
  x2Error ~ exponential(0.5);
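One way to see this: because x1Meas is fixed as data, the term that statement adds to the log density is the same number at every iteration, so it cannot move the sampler. For example, in R (with 4.2 standing in for whatever value x1Meas takes):

# Log-density term added by `x1Meas ~ normal(4, 1)` when x1Meas is data,
# e.g. 4.2: the same constant at every draw (the T[0,] adjustment is
# another constant)
dnorm(4.2, mean = 4, sd = 1, log = TRUE)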

If x1 and x2 are measured with uncertainty, then you need a measurement error model: https://mc-stan.org/docs/2_18/stan-users-guide/bayesian-measurement-error-model.html

Also, I do not think it is that surprising that the posterior distribution has a different level of correlation than the prior: sigma and rho can interact in interesting ways to produce distributions for y that have more or less correlation than you might anticipate.
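As a rough sketch of the kind of interaction I mean (in R, with the scales drawn from your exponential(1) priors and rho held at 0.7 for simplicity), mixing a fixed correlation over random scales dilutes the marginal correlation:

# Marginal correlation after mixing rho = 0.7 over random scales:
# E[rho * s1 * s2] / sqrt(E[s1^2] * E[s2^2])
s1 <- rexp(1e5)
s2 <- rexp(1e5)
0.7 * mean(s1 * s2) / sqrt(mean(s1^2) * mean(s2^2))  # about 0.35, not 0.7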

Thanks @hhau for your response.

I’ve added a y_gen variable to the generated quantities block.
I was trying to account for measurement error, and I think I’m now doing so correctly, based on the guidance in the manual (thanks for the link).

Regarding the possible interaction between sigma and rho:
I still do not see any correlation in the posterior samples of y_gen, but the posterior for rho suggests that there should be (it is pretty much unchanged from the prior, with a mean of approximately 0.7).

Could you explain a little further?
Or do you think there could be another mistake in the model?


...

transformed parameters {
  // Covariance matrix
  cov_matrix[2] cov = [[      sigma[1] ^ 2       , sigma[1] * sigma[2] * rho],
                       [sigma[1] * sigma[2] * rho,       sigma[2] ^ 2       ]];
  
  vector <lower = 0> [2] mu;  // Means of multi-normal distribution
  mu[1] = x1;
  mu[2] = x2;
}

model {
  // Likelihood
  y ~ multi_normal(mu, cov);
  
  // Priors
  rho ~ normal (0.7, 0.2) T[-1, 1];
  sigma ~ exponential(1);

  x1Meas ~ normal (x1, x1Error) T[0,];
  x2Meas ~ normal (x2, x2Error) T[0,];
  
  x1 ~ normal(4, 1) T[0,];
  x2 ~ normal(20, 5) T[0,];
}

generated quantities{
  // Sampling from the estimated posterior distribution
  vector[2] y_gen;
  y_gen = multi_normal_rng(mu, cov);  
}

I would thoroughly recommend that you look at the prior predictive distribution of the non-observable part of your model, i.e. the rho/sigma part. If you simulate from the prior distribution (here’s some R code):

library(dplyr)
library(mvtnorm)
library(ggplot2)

ideal_n_mc <- 50000

# Draw rho from its prior, truncating to [-1, 1] by rejection
rho_samples <- data.frame(x = rnorm(ideal_n_mc, mean = 0.7, sd = 0.2)) %>% 
  filter(between(x, -1, 1))
rho_samples <- as.numeric(rho_samples$x)

# Draw the standard deviations from their exponential(1) priors
n_mc <- length(rho_samples)
sigma_1 <- rexp(n_mc)
sigma_2 <- rexp(n_mc)

# For each prior draw of (rho, sigma), simulate one observation from the
# implied zero-mean bivariate normal
x_samples <- array(NA, dim = c(n_mc, 2))
for (ii in 1:n_mc) {
  x_samples[ii, ] <- rmvnorm(
    n = 1,
    sigma = matrix(
      c(sigma_1[ii]^2, rho_samples[ii] * sigma_1[ii] * sigma_2[ii],
        rho_samples[ii] * sigma_1[ii] * sigma_2[ii], sigma_2[ii]^2),
      nrow = 2
    )
  )
} 

plot_df <- data.frame(
  x = x_samples[, 1],
  y = x_samples[, 2]
)

ggplot(plot_df, aes(x = x, y = y)) + 
  geom_point(alpha = 0.2) +
  theme_bw()
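Computing the empirical correlation of these draws makes the dilution concrete:

cor(x_samples[, 1], x_samples[, 2])  # only about 0.33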

you can see that the prior predictive distribution is not the distribution I think you think it is. Additionally, the correlation of the prior distribution for x_samples is only ~0.33, and I would expect it to shrink towards zero in the presence of nominally independent data (although I still think it’s slightly incorrect for the model to inform any aspects of the prior structure). If you want to continue with this model I would write it like this:

data {
  real <lower = 0> x1Meas; // Observed value of first parameter
  real <lower = 0> x1Error; // Std dev of uncertainty of measurement of first parameter
  real <lower = 0> x2Meas; // Observed value of second parameter
  real <lower = 0> x2Error; // Std dev of uncertainty of measurement of second parameter
}

parameters {
  real <lower = 0> sigma[2]; // Std devs of multi-normal distribution 
  real <lower = -1, upper = 1> rho;  // Correlation coefficient
  vector [2] x_true;
}

transformed parameters {
  cov_matrix[2] cov = [[      sigma[1] ^ 2       , sigma[1] * sigma[2] * rho],
                       [sigma[1] * sigma[2] * rho,       sigma[2] ^ 2       ]];
}

model {
  // Priors
  rho ~ normal(0.7, 0.2) T[-1, 1];
  sigma ~ exponential(1);
  
  x_true[1] ~ normal(x1Meas, x1Error);
  x_true[2] ~ normal(x2Meas, x2Error);
   
}

generated quantities {
  vector [2] x_sim;
  x_sim = multi_normal_rng(x_true, cov);
}

although I’m not convinced you actually want sigma, I think perhaps you want x1Error and x2Error to control the scale of the distribution of x_true, whilst you induce some correlation in the posterior via rho in cov. If this is the case, the model is:

data {
  real <lower = 0> x1Meas; // Observed value of first parameter
  real <lower = 0> x1Error; // Std dev of uncertainty of measurement of first parameter
  real <lower = 0> x2Meas; // Observed value of second parameter
  real <lower = 0> x2Error; // Std dev of uncertainty of measurement of second parameter
}

parameters {
  // real <lower = 0> sigma[2]; // Std devs of multi-normal distribution 
  real <lower = -1, upper = 1> rho;  // Correlation coefficient
  vector [2] x_true;
}

transformed parameters {
  cov_matrix[2] cov = [[      x1Meas ^ 2       , x1Meas * x2Meas * rho],
                       [x1Meas * x2Meas * rho,       x2Meas ^ 2       ]];
}

model {
  // Priors
  rho ~ normal(0.7, 0.2) T[-1, 1];
  // sigma ~ exponential(1);
  
  x_true[1] ~ normal(x1Meas, x1Error);
  x_true[2] ~ normal(x2Meas, x2Error);
   
}

generated quantities {
  vector [2] x_sim;
  x_sim = multi_normal_rng(x_true, cov);
}
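After fitting, you can check how much correlation rho actually induces by computing the correlation across the posterior draws of x_sim. A sketch in R, assuming an rstan fit object called fit (the object name is my assumption about your setup):

# Correlation across the posterior draws of x_sim (iterations x 2 array)
draws <- rstan::extract(fit, pars = "x_sim")$x_sim
cor(draws[, 1], draws[, 2])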

Hi @hhau,

You’ve solved my problem. Thank you for your detailed and clear explanations!

Simulating from the prior confirms that I was making an incorrect assumption, and I do not need the sigma parameter.
I’ve defined the covariance matrix as you suggest in your second example (…but did you mean to use the xError terms and not the xMeas terms in cov?)

Yes.

Nice! No worries.