 # Help setting informative prior for bivariate correlations

Hi everyone,

I’m having trouble figuring out how to “properly” set an informative prior for a regular bivariate correlation model. I have a proposed solution, but I’m not sure whether it makes sense or not, and would greatly appreciate any input.

To get going, let’s simulate 50 data points with a known correlation of 0.2:

``````library(MASS)
library(tidyverse)
library(brms)

set.seed(2021)
data <- as.data.frame(mvrnorm(50, mu = c(0,0),
Sigma = matrix(c(1, 0.2, 0.2, 1), ncol = 2),
empirical = TRUE)) %>%
mutate(x = V1, y = V2) %>%
dplyr::select(x, y)

cor(data\$x, data\$y)
``````

Ordinarily, I would approach this using `mvbind` and an `LKJ` prior:

``````m0 <- brm(data = data,
family = gaussian,
bf(mvbind(x, y) ~ 0) + set_rescor(TRUE),
prior = c(prior(normal(0, 1), class = sigma, resp = x),
prior(normal(0, 1), class = sigma, resp = y),
prior(lkj(1), class = rescor)),
sample_prior = "only",
iter = 4000, warmup = 500, chains = 4, cores = 4, seed = seed)
``````

But since the data is standardized, using a linear regression model with a suppressed intercept should yield an equivalent estimate of the correlation, right? And if so, a `uniform(-1, 1)` should be equivalent or at least very similar to the `mvbind` + `LKJ(1)` prior approach, like this:

``````m1 <- brm(data = data,
family = gaussian,
y ~ 0 + x,
prior = c(prior(uniform(-1, 1), class = b, lb = -1, ub = 1),
prior(normal(0, 1), class = sigma)),
sample_prior = "only",
iter = 4000, warmup = 500, chains = 4, cores = 4, seed = seed)
``````

In practice however I usually use the `LKJ(2)` prior to rule out extreme values:

``````m2 <- brm(data = data,
family = gaussian,
bf(mvbind(x, y) ~ 0) + set_rescor(TRUE),
prior = c(prior(normal(0, 1), class = sigma, resp = x),
prior(normal(0, 1), class = sigma, resp = y),
prior(lkj(2), class = rescor)),
sample_prior = "only",
iter = 4000, warmup = 500, chains = 4, cores = 4, seed = seed)
``````

Here the problems start, since I cannot use an `LKJ(2)` in the linear regression approach. The closest would be something like a `Beta`, but that’s bounded between 0 and 1. I experimented with different truncated normals, such as a `normal(0, 0.5)`:

``````m3 <- brm(data = data,
family = gaussian,
y ~ 0 + x,
prior = c(prior(normal(0, 0.5), class = b, lb = -1, ub = 1),
prior(normal(0, 1), class = sigma)),
sample_prior = "only",
iter = 4000, warmup = 500, chains = 4, cores = 4, seed = seed)
``````

It’s fairly close, but there are slight differences in where most of the probability mass lands. Finally, I extended this to a form of informative prior, using a `normal(0.2, 0.5)`:

``````m4 <- brm(data = data,
family = gaussian,
y ~ 0 + x,
prior = c(prior(normal(0.2, 0.5), lb = -1, ub = 1, class = b),
prior(normal(0, 1), class = sigma)),
sample_prior = "only",
iter = 4000, warmup = 500, chains = 4, cores = 4, seed = seed)
``````

Here’s how the prior distributions look: And here’s how the posterior distributions look:

Overall, very similar (at least for this simulated, well-behaved data set).

My questions boil down to:

1. Am I correct in assuming equivalence between the “regular” `mvbind(x, y)` approach and the linear regression `y ~ 0 + x` approach, as long as data are standardized?
2. If the above is correct, then is the use of truncated normals (as a substitute for the `LKJ` prior and as a way of injecting prior knowledge) reasonable?
3. Are there alternative approaches that might be better suited?
• Operating System: Debian 11
• brms Version: 2.16.1

Update after spending two days Googling (also realized this would probably be better suited under the “modeling” tag, but alas).

I found this post by Adrian Baez-Ortega in which he specifies a robust correlation model using Stan directly. He mentions that he received some ideas for improving the model from @avehtari (I’m not sure about the rules for tagging people here; please let me know if it’s inappropriate), including removing the `uniform(-1, 1)` prior on `rho`, since “it’s implicit”.

I don’t really know what that means - I’m hoping keeping the prior doesn’t break things - but I figured it might be possible to transform `rho` such that a `beta` prior on the transformed variable would work. Here’s what I came up with, based on the blog post mentioned above:

``````data {
// number of observations
int<lower=1> N;

// input data: rows must be observations, columns are x and y variables
vector x[N];
}

parameters {
// locations of the marginal t distributions
vector mu;

// scales of the marginal t distributions
real<lower=0> sigma;

// degrees of freedom of the marginal t distributions
real<lower=1> nu;

// correlation coefficient
real<lower=-1, upper=1> rho;
}

transformed parameters {
// covariance matrix
cov_matrix cov = [[sigma ^ 2,
sigma * sigma * rho],
[sigma * sigma * rho,
sigma ^ 2]];

// transform rho in order to use beta distribution as prior
real<lower=0> rho_transformed;
rho_transformed = (rho + 1) / 2;
}

model {
// likelihood: Student's t-distribution
x ~ multi_student_t(nu, mu, cov);

// weakly informative priors on mu, sigma, and beta
sigma ~ normal(0, 100);
mu ~ normal(0, 100);
nu ~ gamma(2, 0.1);
target += beta_lpdf(rho_transformed | 2, 2);
}
``````

In this example I used a `beta(2, 2)` prior on `rho_transformed`. We can compare this approach with how I would have done it using brms, with an `LKJ(2)` prior:

``````brms_robust <-
brm(data = df,
family = student,
bf(mvbind(x, y) ~ 1) + set_rescor(TRUE),
prior = c(prior(gamma(2, .1), class = nu),
prior(normal(0, 100), class = Intercept, resp = x),
prior(normal(0, 100), class = Intercept, resp = y),
prior(normal(0, 100), class = sigma, resp = x),
prior(normal(0, 100), class = sigma, resp = y),
prior(lkj(2), class = rescor)),
chains = 4, iter = 4000, warmup = 1000, cores = 1, seed = 2021, refresh = 0)
``````

Both approaches produce more or less equivalent results for simulated data, which leads me to believe that the Stan approach is reasonable (with the caveat that this is all very new to me) and that it boils down to setting a `beta` prior on `rho_transformed` that’s in line with my prior knowledge - and then my issue would be solved?

Two additional, Stan-related questions in light of this:

• When I extract posterior samples from the Stan model, the `rho` parameter is already on the correct [-1, 1] scale. Is something in the model redundant? At first, I “back-transformed” the `rho_transformed` parameter after extracting posterior samples to get it on the correct scale, but that doesn’t seem necessary.
• I specified the prior on `rho_transformed` the same way as the other parameters (i.e., using `rho_transformed ~ beta(2, 2)`, but got scary warnings about something with Jacobian determinants. Switching to `target += beta_lpdf(rho_transformed | 2, 2)` got rid of the warning, but I’m unsure whether it’s correct or if I still need to do something?

See 23.3 Changes of variables | Stan User’s Guide
Stan can’t automatically determine when Jacobian needs to be added or whether the Jacobian is constant, so that’s why it is giving the warning. Switching to += doesn’t remove the fact that you should check whether Jacobian is needed. In this case, the transformation is linear and thus the Jacobian is constant and not including it doesn’t affect the shape of the posterior.

1 Like