Help setting informative prior for bivariate correlations

Hi everyone,

I’m having trouble figuring out how to “properly” set an informative prior for a regular bivariate correlation model. I have a proposed solution, but I’m not sure whether it makes sense, and I would greatly appreciate any input.

To get going, let’s simulate 50 data points with a known correlation of 0.2:

library(MASS)
library(tidyverse)
library(brms)

set.seed(2021)
# Simulate 50 observations from a bivariate normal; empirical = TRUE forces
# the *sample* means, variances, and correlation to match exactly
data <- as.data.frame(mvrnorm(50, mu = c(0, 0),
                              Sigma = matrix(c(1, 0.2, 0.2, 1), ncol = 2),
                              empirical = TRUE)) %>%
  rename(x = V1, y = V2)

cor(data$x, data$y)  # exactly 0.2

Ordinarily, I would approach this using mvbind and an LKJ prior:

m0 <- brm(data = data, 
          family = gaussian, 
          bf(mvbind(x, y) ~ 0) + set_rescor(TRUE),
          prior = c(prior(normal(0, 1), class = sigma, resp = x),
                    prior(normal(0, 1), class = sigma, resp = y),
                    prior(lkj(1), class = rescor)),
          sample_prior = "only",
          control = list(adapt_delta = 0.9),
          iter = 4000, warmup = 500, chains = 4, cores = 4, seed = 2021)
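
Since this is sampled with sample_prior = "only", the draws from m0 are draws from the prior, so the implied prior on the correlation can be inspected directly. A quick sketch (the parameter name rescor__x__y is from memory; parnames(m0) shows the exact naming):

# Draws from the LKJ(1)-implied prior on the residual correlation;
# for a 2x2 correlation matrix this should look uniform on [-1, 1]
prior_draws_m0 <- posterior_samples(m0)
hist(prior_draws_m0$rescor__x__y, breaks = 50, freq = FALSE,
     main = "Implied prior on the correlation under LKJ(1)")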

But since the data are standardized, a linear regression with a suppressed intercept should yield an equivalent estimate of the correlation, right? And if so, a uniform(-1, 1) prior on the slope should be equivalent, or at least very similar, to the mvbind + LKJ(1) approach, like this:

m1 <- brm(data = data, 
          family = gaussian,
          y ~ 0 + x,
          prior = c(prior(uniform(-1, 1), class = b, lb = -1, ub = 1),
                    prior(normal(0, 1), class = sigma)),
          sample_prior = "only",
          control = list(adapt_delta = 0.9),
          iter = 4000, warmup = 500, chains = 4, cores = 4, seed = 2021)
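
As a quick sanity check of the slope-equals-correlation claim (using the simulated data from above, where empirical = TRUE fixes the sample moments exactly):

# With both variables standardized and exactly centered, the
# no-intercept least-squares slope coincides with the sample correlation
coef(lm(y ~ 0 + x, data = data))  # 0.2
cor(data$x, data$y)               # 0.2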

In practice, however, I usually use an LKJ(2) prior to rule out extreme values:

m2 <- brm(data = data, 
          family = gaussian, 
          bf(mvbind(x, y) ~ 0) + set_rescor(TRUE),
          prior = c(prior(normal(0, 1), class = sigma, resp = x),
                    prior(normal(0, 1), class = sigma, resp = y),
                    prior(lkj(2), class = rescor)),
          sample_prior = "only",
          control = list(adapt_delta = 0.9),
          iter = 4000, warmup = 500, chains = 4, cores = 4, seed = 2021)
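
As a side note, for a 2x2 correlation matrix the LKJ(eta) prior implies a marginal density on the correlation proportional to (1 - rho^2)^(eta - 1), which makes it easy to quantify how much LKJ(2) discounts extreme values relative to LKJ(1). A quick sketch:

# Marginal prior density of the correlation under LKJ(eta), 2x2 case:
# p(rho) = (1 - rho^2)^(eta - 1) / (2^(2 * eta - 1) * beta(eta, eta))
lkj_marginal <- function(rho, eta) {
  (1 - rho^2)^(eta - 1) / (2^(2 * eta - 1) * beta(eta, eta))
}

# Prior mass on |rho| > 0.5 under each prior
2 * integrate(lkj_marginal, 0.5, 1, eta = 1)$value  # 0.50 (LKJ(1) is uniform)
2 * integrate(lkj_marginal, 0.5, 1, eta = 2)$value  # ~0.31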

Here the problems start, since I cannot put an LKJ(2) prior on a regression coefficient. The closest would be something like a beta distribution, but that’s bounded between 0 and 1. I experimented with different truncated normals, such as a normal(0, 0.5):

m3 <- brm(data = data, 
          family = gaussian,
          y ~ 0 + x,
          prior = c(prior(normal(0, 0.5), class = b, lb = -1, ub = 1),
                    prior(normal(0, 1), class = sigma)),
          sample_prior = "only",
          control = list(adapt_delta = 0.9),
          iter = 4000, warmup = 500, chains = 4, cores = 4, seed = 2021)
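
To see where the truncated normal(0, 0.5) and the LKJ(2) marginal actually differ, the two densities can be overlaid (reusing lkj_marginal from above; the truncated normal just needs renormalizing over [-1, 1]):

rho_grid <- seq(-1, 1, length.out = 200)
tnorm_dens <- dnorm(rho_grid, 0, 0.5) /
  (pnorm(1, 0, 0.5) - pnorm(-1, 0, 0.5))

plot(rho_grid, lkj_marginal(rho_grid, eta = 2), type = "l",
     ylim = c(0, 1), xlab = "rho", ylab = "density")
lines(rho_grid, tnorm_dens, lty = 2)
legend("topright", c("LKJ(2) marginal", "truncated normal(0, 0.5)"),
       lty = 1:2, bty = "n")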

It’s fairly close, but there are slight differences in where most of the probability mass lands. Finally, I extended this into an informative prior by shifting the location, using a normal(0.2, 0.5):

m4 <- brm(data = data, 
          family = gaussian,
          y ~ 0 + x,
          prior = c(prior(normal(0.2, 0.5), lb = -1, ub = 1, class = b),
                    prior(normal(0, 1), class = sigma)),
          sample_prior = "only",
          control = list(adapt_delta = 0.9),
          iter = 4000, warmup = 500, chains = 4, cores = 4, seed = 2021)
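
One way to see what this shifted prior encodes (same renormalization trick as before):

# Under the normal(0.2, 0.5) prior truncated to [-1, 1], how much prior
# probability sits on a positive correlation?
Z <- pnorm(1, 0.2, 0.5) - pnorm(-1, 0.2, 0.5)   # truncation constant
(pnorm(1, 0.2, 0.5) - pnorm(0, 0.2, 0.5)) / Z   # ~0.64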

Here’s how the prior distributions look:

[Figure: prior distributions for m0–m4]

And here’s how the posterior distributions look:

[Figure: posterior distributions for m0–m4]

Overall, very similar (at least for this simulated, well-behaved data set).

My questions boil down to:

  1. Am I correct in assuming equivalence between the “regular” mvbind(x, y) approach and the linear regression y ~ 0 + x approach, as long as data are standardized?
  2. If the above is correct, then is the use of truncated normals (as a substitute for the LKJ prior and as a way of injecting prior knowledge) reasonable?
  3. Are there alternative approaches that might be better suited?
  • Operating System: Debian 11
  • brms Version: 2.16.1

Update after spending two days Googling (I also realized this would probably have been better suited under the “modeling” tag, but alas).

I found this post by Adrian Baez-Ortega in which he specifies a robust correlation model using Stan directly. He mentions that he received some ideas for improving the model from @avehtari (I’m not sure about the rules for tagging people here; please let me know if it’s inappropriate), including removing the uniform(-1, 1) prior on rho, since “it’s implicit”.

I don’t really know what that means (I’m hoping that keeping the prior doesn’t break things), but I figured it might be possible to transform rho so that a beta prior on the transformed variable would work. Here’s what I came up with, based on the blog post mentioned above:

data {
  // number of observations
  int<lower=1> N; 
  
  // input data: rows must be observations, columns are x and y variables
  vector[2] x[N]; 
}

parameters {
  // locations of the marginal t distributions
  vector[2] mu;
  
  // scales of the marginal t distributions
  real<lower=0> sigma[2];
  
  // degrees of freedom of the marginal t distributions
  real<lower=1> nu;
  
  // correlation coefficient
  real<lower=-1, upper=1> rho;
}

transformed parameters {
  // covariance matrix implied by the scales and the correlation
  cov_matrix[2] cov = [[sigma[1] ^ 2,
                        sigma[1] * sigma[2] * rho],
                       [sigma[1] * sigma[2] * rho,
                        sigma[2] ^ 2]];

  // map rho from [-1, 1] to [0, 1] so a beta prior can be used
  real<lower=0, upper=1> rho_transformed = (rho + 1) / 2;
}

model {
  // likelihood: bivariate Student's t distribution
  x ~ multi_student_t(nu, mu, cov);

  // weakly informative priors on mu, sigma, and nu
  sigma ~ normal(0, 100);
  mu ~ normal(0, 100);
  nu ~ gamma(2, 0.1);

  // beta(2, 2) prior on the transformed correlation; the transform is
  // linear, so the Jacobian is constant and can safely be omitted
  target += beta_lpdf(rho_transformed | 2, 2);
}
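
For completeness, a minimal sketch of how this can be fit with rstan (the file name robust_correlation.stan is just a placeholder; the data list matches the data block above):

library(rstan)

# Assumes the Stan program above is saved as "robust_correlation.stan"
stan_data <- list(N = nrow(data), x = as.matrix(data[, c("x", "y")]))

stan_robust <- stan(file = "robust_correlation.stan", data = stan_data,
                    chains = 4, iter = 4000, warmup = 1000, seed = 2021)
print(stan_robust, pars = c("rho", "mu", "sigma", "nu"))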

In this example I used a beta(2, 2) prior on rho_transformed. We can compare this approach with how I would have done it using brms, with an LKJ(2) prior:

brms_robust <- 
  brm(data = data, 
      family = student,
      bf(mvbind(x, y) ~ 1) + set_rescor(TRUE),
      prior = c(prior(gamma(2, .1), class = nu),
                prior(normal(0, 100), class = Intercept, resp = x),
                prior(normal(0, 100), class = Intercept, resp = y),
                prior(normal(0, 100), class = sigma, resp = x),
                prior(normal(0, 100), class = sigma, resp = y),
                prior(lkj(2), class = rescor)),
      chains = 4, iter = 4000, warmup = 1000, cores = 1, seed = 2021, refresh = 0)
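
To put the two fits side by side on the correlation itself (parameter naming assumed as above: rho in the Stan fit, rescor__x__y in brms):

# Correlation posterior from the raw Stan fit
print(stan_robust, pars = "rho")

# Residual correlation posterior from the brms fit
posterior_summary(brms_robust, pars = "rescor")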

Both approaches produce more or less equivalent results for simulated data, which leads me to believe that the Stan approach is reasonable (with the caveat that this is all very new to me). If that holds, it boils down to setting a beta prior on rho_transformed that’s in line with my prior knowledge, and my issue would be solved?
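
In fact the match should be exact rather than approximate: a beta(eta, eta) prior on (rho + 1) / 2 is precisely the LKJ(eta) marginal for a 2x2 matrix. A quick numerical check for eta = 2 (reusing lkj_marginal from earlier):

# dbeta((rho + 1) / 2, 2, 2) / 2 equals the LKJ(2) marginal density;
# the extra 1/2 is the Jacobian of the linear map rho -> (rho + 1) / 2
rho_grid <- seq(-0.99, 0.99, length.out = 9)
all.equal(dbeta((rho_grid + 1) / 2, 2, 2) / 2,
          lkj_marginal(rho_grid, eta = 2))  # TRUE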

Two additional, Stan-related questions in light of this:

  • When I extract posterior samples from the Stan model, the rho parameter is already on the correct [-1, 1] scale. Is something in the model redundant? At first, I “back-transformed” the rho_transformed parameter after extracting posterior samples to get it on the correct scale, but that doesn’t seem necessary.
  • I specified the prior on rho_transformed the same way as the other parameters (i.e., using rho_transformed ~ beta(2, 2)), but got scary warnings about Jacobian determinants. Switching to target += beta_lpdf(rho_transformed | 2, 2) got rid of the warning, but I’m unsure whether it’s correct or whether I still need to do something?

See 23.3 Changes of variables | Stan User’s Guide
Stan can’t automatically determine when a Jacobian adjustment is needed, or whether the Jacobian is constant, so that’s why it gives the warning. Switching to target += doesn’t remove the need to check whether a Jacobian is required. In this case the transformation is linear, so the Jacobian is the constant 1/2 (since rho_transformed = (rho + 1) / 2), and leaving it out doesn’t affect the shape of the posterior.
